R - parse unaligned XML attributes to a data frame


I have an XML file with the following structure.

<?xml version="1.0" encoding="utf-8"?>
<b>
    <c name="foo" stuff="89" attr="first line&#xa;second line"/>
    <c name="bar" id="ontime" stuff="23" attr="blahs"/>
    <c id="delay" name="dog"  newattr="clahs"/>
    ...
</b>

As you can see, the attributes are quite messy: some values are missing and the attributes are not aligned. I would like to convert this into the following data frame (or another table-like structure) in R for further analysis.

╔══════════╦══════════════╦══════════════════════════════════╦════════════════╦═════════╗
║   name   ║ stuff        ║ attr                             ║ id             ║ newattr ║
╠══════════╬══════════════╬══════════════════════════════════╬════════════════╬═════════╣
║ 1 foo    ║  89          ║ "first line&#xa;second line"     ║ NA             ║  NA     ║
║ 2 bar    ║  23          ║ "blahs"                          ║ "ontime"       ║  NA     ║
║ 3 dog    ║  NA          ║      NA                          ║ "delay"        ║ "clahs" ║
╚══════════╩══════════════╩══════════════════════════════════╩════════════════╩═════════╝

I have failed miserably due to my limited R and parsing experience. I have a feeling that xpathSApply may work, but I couldn't figure out how to set the path.

Another technique I would like to explore is having the code identify new attributes by itself. In other words, no attribute name is hard-coded in the code. For example, when it sees line 3, it automatically adds a new column to the data frame and names it "newattr".

Thank you for your help.

------------------- Added on July 18, 2015 -----------------------

Here is a brute-force approach. There must be a better way, since it's super slow (6 hours to handle a single ~250 MB XML file on a modern personal laptop).

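library(XML)

myxmltodataframe2 <- function(file) {
  xl <- xmlToList(xmlParse(file))
  xl <- unname(xl)

  # Initialize the data frame from the attributes of the first element
  df <- data.frame(t(xl[[1]]), stringsAsFactors = FALSE)

  number_of_attribute <- length(df)
  number_of_row <- length(xl)

  # seq_len(n)[-1] is empty when there is only one row, unlike 2:n
  for (i in seq_len(number_of_row)[-1]) {
    # Examine each attribute in the new row; an unseen name creates a new column
    for (j in seq_along(xl[[i]])) {
      df[i, attributes(xl[[i]])$names[j]] <- xl[[i]][[j]]
    }
  }
  df
}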

We need a complete example. The NA data will be problematic to fill in.

Here's something to get you started:

library(XML)

xml <- '<b>
<c name="foo" stuff="89" attr="first line&#xa;second line"/>
<c name="bar" id="ontime" stuff="23" attr="blahs"/>
<c id="delay" name="dog"  attr="clahs"/>
</b>'

# Parse the string once and reuse the parsed document
doc <- xmlParse(xml, asText = TRUE)

# Each query returns only the values that are present,
# so the resulting vectors can differ in length
attr_vals <- unlist(xpathApply(doc, "//b/c/@attr"))
stuff_vals <- unlist(xpathApply(doc, "//b/c/@stuff"))
ids_vals <- unlist(xpathApply(doc, "//b/c/@id"))
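Building on the snippet above, one possible way to avoid hard-coding attribute names is to pull all attributes of each node at once with xmlAttrs, collect the union of the names, and let name-based indexing fill the gaps with NA. This is only a minimal sketch, assuming every row is a <c> element directly under <b> as in the question's sample; the names attr_list, cols, rows and df are just illustrative, and it has not been timed on a 250 MB file.

```r
library(XML)

# Sample input from the question (third row uses "newattr")
xml <- '<b>
<c name="foo" stuff="89" attr="first line&#xa;second line"/>
<c name="bar" id="ontime" stuff="23" attr="blahs"/>
<c id="delay" name="dog" newattr="clahs"/>
</b>'

doc <- xmlParse(xml, asText = TRUE)

# One named character vector of attributes per <c> node
attr_list <- xpathApply(doc, "//b/c", xmlAttrs)

# Column names are discovered from the data, in order of first appearance
cols <- unique(unlist(lapply(attr_list, names)))

# Indexing a named vector by a name it lacks yields NA, which fills the gaps;
# a node without any attributes (xmlAttrs gives NULL) becomes a row of NA
rows <- lapply(attr_list, function(a) {
  if (is.null(a)) rep(NA_character_, length(cols)) else unname(a[cols])
})

# Bind all rows in one step instead of growing the data frame cell by cell
mat <- do.call(rbind, rows)
colnames(mat) <- cols
df <- as.data.frame(mat, stringsAsFactors = FALSE)
```

All columns come out as character here, so something like df$stuff <- as.integer(df$stuff) would still be needed for numeric work. Assembling the rows in a single rbind call avoids the repeated cell-by-cell assignment of the brute-force version, which is usually the expensive part in R.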
