Conditional join in r -


i conditionally join 2 data tables together:

library(data.table) set.seed(1)  key.table <-    data.table(     out = (0:10)/10,     keyz = sort(runif(11))   )  large.tbl <-    data.table(     ab = rnorm(1e6),     cd = runif(1e6)   ) 

according following rule: match smallest value of out in key.table keyz value larger cd. have following:

library(dplyr) large.tbl %>%   rowwise %>%   mutate(out = min(key.table$out[key.table$keyz > cd])) 

which provides correct output. problem have rowwise operation seems expensive large.tbl using, crashing unless on particular computer. there less memory-expensive operations? following seems faster, not enough problem have.

large.tbl %>%     group_by(cd) %>%     mutate(out = min(key.table$out[key.table$keyz > cd])) 

this smells problem data.table answer, answer not have use package.

what want is:

setkey(large.tbl, cd) setkey(key.table, keyz) key.table[large.tbl, roll = -inf] 

see ?data.table>roll:

applies last join column, date can ordered variable, irregular , including gaps. if roll=true , i's row matches last x join column, , value in last i join column falls in gap (including after last observation in x group), prevailing value in x rolled forward. operation particularly fast using modified binary search. operation known last observation carried forward (locf). usually, there should no duplicates in x's key, last key column date (or time, or datetime) , columns of x's key joined to. common idiom select contemporaneous regular time series (dts) across set of identifiers (ids): dt[cj(ids,dts),roll=true] dt has 2-column key (id,date) , cj stands cross join. when roll positive number, limits how far values carried forward. roll=true equivalent roll=+inf. when roll negative number, values rolled backwards; i.e., next observation carried backwards (nocb). use -inf unlimited roll back. when roll "nearest", nearest value joined to.

(to fair think go elucidation, it's pretty dense)


Comments

Popular posts from this blog

qt - Using float or double for own QML classes -

Create Outlook appointment via C# .Net -

ios - Swift Array Resetting Itself -