Adding a base year index to an R data frame with multiple groups
I have a yearly time series data frame with a few grouping variables, and I need to add an index column based on a particular year.
df <- data.frame(year = c(2000,2001,2002,2000,2001,2002), grp = c("a","a","a","b","b","b"), val = sample(6))
I want to make a simple index of the variable val, defined as the value divided by the value of the base year, say 2000:
df$val.ind <- df$val/df$val[df$year == 2000]
This is not right, as it does not respect the grouping variable grp. I tried with plyr but could not make it work.
In my actual problem I have several grouping variables with varying time series, so I'm looking for a fairly general solution.
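A small sketch of why the ungrouped division above goes wrong, using made-up fixed values in place of sample(6) so the numbers are reproducible. The names df_demo, base_vals and naive are only illustrative. The subset df$val[df$year == 2000] holds one value per group, and R silently recycles that short vector across all rows:

```r
# Made-up values standing in for sample(6)
df_demo <- data.frame(
  year = rep(2000:2002, times = 2),
  grp  = rep(c("a", "b"), each = 3),
  val  = c(2, 4, 6, 10, 20, 30)
)

# One base value per group: 2 (group "a") and 10 (group "b")
base_vals <- df_demo$val[df_demo$year == 2000]

# The length-2 vector is recycled over 6 rows, so the base values
# alternate by row position instead of following grp
naive <- df_demo$val / base_vals
naive
```

A correct index would be 1, 2, 3 within each group, whereas the recycled division mixes the two base values across rows.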
We can create 'val.ind' after doing the calculation within the grouping variable ('grp'). This can be done in many ways.
One option is data.table. We convert the 'data.frame' to a 'data.table' (setDT(df)) and, grouped by 'grp', divide 'val' by the 'val' that corresponds to the 'year' value of 2000.
library(data.table)
setDT(df)[, val.ind := val/val[year==2000], by = grp]
NOTE: The base year is a bit confusing with respect to the expected result. In the example, both the 'a' and 'b' groups have 'year' 2000. If the OP meant to use the minimum year value in each group (assuming 'year' is a numeric column), val/val[year==2000] in the above code can be replaced with val/val[which.min(year)].
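As a minimal sketch of the which.min variant, assuming the data.table package is installed: the data below are made up so that group "b" starts in 2001, where a fixed year == 2000 test would match no row in that group.

```r
library(data.table)

# Made-up data: the two groups start in different years
dt <- data.table(
  year = c(2000, 2001, 2002, 2001, 2002, 2003),
  grp  = c("a", "a", "a", "b", "b", "b"),
  val  = c(2, 4, 6, 10, 20, 30)
)

# Within each group, divide by the value at that group's earliest year
dt[, val.ind := val / val[which.min(year)], by = grp]
dt
```

which.min returns the position of the first minimum, so this does not depend on the rows being sorted by year.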
Or we can use similar code with dplyr. We group by 'grp' and use mutate to create 'val.ind'.
library(dplyr)
df %>% group_by(grp) %>% mutate(val.ind = val/val[year==2000])
Here also, if needed, replace val/val[year==2000] with val/val[which.min(year)].
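Since the question mentions several grouping variables, here is a sketch of how the dplyr approach might extend to that case, assuming dplyr is installed. The data frame df2 and its column 'region' are made up for illustration; the only change is listing more columns in group_by.

```r
library(dplyr)

# Made-up data with two grouping variables, region and grp
df2 <- data.frame(
  year   = rep(c(2000, 2001), times = 4),
  region = rep(c("north", "south"), each = 4),
  grp    = rep(rep(c("a", "b"), each = 2), times = 2),
  val    = c(2, 4, 10, 5, 8, 8, 1, 3)
)

res <- df2 %>%
  group_by(region, grp) %>%                    # one group per region/grp pair
  mutate(val.ind = val / val[year == 2000]) %>%
  ungroup()                                    # drop grouping for later steps
res
```

The same idea should carry over to data.table by passing several columns to by, for example by = .(region, grp).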
A base R option is split/unsplit. We split the dataset by the 'grp' column to convert the data.frame into a list of data frames, loop through the list with lapply, create the new column using transform (or within), and convert the list with the added column back to a single data.frame with unsplit.
unsplit(lapply(split(df, df$grp), function(x) transform(x, val.ind= val/val[year==2000])), df$grp)
Note that we can use do.call(rbind, ...) instead of unsplit. But I prefer unsplit, as it returns the rows in the same order as the original dataset.
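The row-order difference can be sketched with a small made-up data frame whose groups are interleaved (the names df3, parts, via_unsplit and via_rbind are only illustrative):

```r
# Made-up data with the two groups interleaved
df3 <- data.frame(
  year = c(2000, 2000, 2001, 2001),
  grp  = c("b", "a", "b", "a"),
  val  = c(10, 4, 20, 6)
)

# Add the index within each group
parts <- lapply(split(df3, df3$grp), function(x)
  transform(x, val.ind = val / val[year == 2000]))

via_unsplit <- unsplit(parts, df3$grp)  # rows back in the original order
via_rbind   <- do.call(rbind, parts)    # rows stacked group by group

as.character(via_unsplit$grp)  # "b" "a" "b" "a"
as.character(via_rbind$grp)    # "a" "a" "b" "b"
```

If do.call(rbind, ...) is used and the original order matters, the result would need to be reordered afterwards.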