R - parse unaligned XML attributes to a data frame
I have an XML file with the following structure.
<?xml version="1.0" encoding="utf-8"?>
<b>
  <c name="foo" stuff="89" attr="first line
second line"/>
  <c name="bar" id="ontime" stuff="23" attr="blahs"/>
  <c id="delay" name="dog" newattr="clahs"/>
  ...
</b>
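As you can see, the attributes are quite messy: some values are missing and the attributes are not aligned from one element to the next. I would like to convert this into the following data frame (or another table-like structure) in R for further analysis.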
╔═══╦══════╦═══════╦══════════════╦══════════╦═════════╗
║   ║ name ║ stuff ║ attr         ║ id       ║ newattr ║
╠═══╬══════╬═══════╬══════════════╬══════════╬═════════╣
║ 1 ║ foo  ║ 89    ║ "first line  ║ NA       ║ NA      ║
║   ║      ║       ║ second line" ║          ║         ║
║ 2 ║ bar  ║ 23    ║ "blahs"      ║ "ontime" ║ NA      ║
║ 3 ║ dog  ║ NA    ║ NA           ║ "delay"  ║ "clahs" ║
╚═══╩══════╩═══════╩══════════════╩══════════╩═════════╝
I have failed miserably due to my limited R and parsing experience. I have a feeling that xpathApply or sapply may work, but I couldn't figure out how to set the path.
Another technique I would like to explore is having the code identify new attributes by itself. In other words, no attribute name should be hard-coded. For example, when it sees line 3, it should automatically add a new column to the data frame and name it "newattr".
Thank you for your help.
------------------- Added on July 18, 2015 -----------------------
Here is a brute-force approach. There must be a better way, since it's super slow (6 hours to handle a single ~250 MB XML file on a modern personal laptop).
library(XML)

myxmltodataframe2 <- function(file) {
  xl <- xmlToList(xmlParse(file))
  xl <- unname(xl)
  # Initialize the data frame from the first element
  df <- data.frame(t(xl[[1]]), stringsAsFactors = FALSE)
  number_of_attribute <- length(df)
  number_of_row <- length(xl)
  for (i in 2:number_of_row) {
    # Examine each element in the new row; assigning to a column name
    # that does not exist yet adds that column to the data frame
    for (j in 1:length(xl[[i]])) {
      df[i, attributes(xl[[i]])$names[j]] <- xl[[i]][[j]]
    }
  }
  df
}
We need a more complete example. The NA data will be problematic to fill.
Here's something to get you started:
library(XML)

xml <- '<b>
  <c name="foo" stuff="89" attr="first line
second line"/>
  <c name="bar" id="ontime" stuff="23" attr="blahs"/>
  <c id="delay" name="dog" attr="clahs"/>
</b>'

# Parse the string once and reuse the parsed document
doc <- xmlParse(xml, asText = TRUE)

# Each query returns the values of one attribute across all <c> nodes;
# nodes that lack the attribute are simply skipped
attr_vals  <- unlist(xpathApply(doc, "//b/c/@attr"))
stuff_vals <- unlist(xpathApply(doc, "//b/c/@stuff"))
ids_vals   <- unlist(xpathApply(doc, "//b/c/@id"))
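Querying one attribute at a time loses track of which node each value came from, which is why the NA cells are hard to fill. One possible way around that, sketched below, is to pull the full attribute vector of every <c> node with xmlAttrs, take the union of the attribute names, and fill one row per node. The names xml_text, attr_list, all_names and rows are made up for this illustration, and the sample uses newattr in the third element as in the question. Treat it as a sketch rather than a tuned solution: I have not timed it on a ~250 MB file, although building all rows first and binding them once usually avoids the cost of growing a data frame cell by cell.

```r
library(XML)

# Sample document from the question (hypothetical variable name)
xml_text <- '<b>
  <c name="foo" stuff="89" attr="first line
second line"/>
  <c name="bar" id="ontime" stuff="23" attr="blahs"/>
  <c id="delay" name="dog" newattr="clahs"/>
</b>'

doc <- xmlParse(xml_text, asText = TRUE)

# One named character vector of attributes per <c> node
attr_list <- xpathApply(doc, "//b/c", xmlAttrs)

# Union of attribute names in order of first appearance; nothing hard-coded
all_names <- unique(unlist(lapply(attr_list, names)))

# Build one NA-padded row per node, then bind all rows in a single step
rows <- lapply(attr_list, function(a) {
  row <- setNames(rep(NA_character_, length(all_names)), all_names)
  if (length(a) > 0) row[names(a)] <- a
  row
})
df <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)

print(df)
```

All columns come back as character, so something like as.numeric(df$stuff) is still needed afterwards. Also note that a conforming XML parser normalizes a literal line break inside an attribute value to a space, so the attr value of the first row should not be expected to keep its newline.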