pandas - alternative to recursive sql / dedupe columns -

- April 15, 2012

hi have data set 1 many relations. want reduce 1-1 concatenating distinct values each column

here example:

i have data set called customer , product affiliation (pa). 1 customer can have multiple pa different times.

here input:

╔════════╦═══════╦══════╦══════╗ ║ cust   ║pa1    ║ pa2  ║ pa3  ║ ╠════════╬═══════╬══════╬══════╣ ║      ║ h     ║ m    ║ l    ║ ║      ║ h     ║ l    ║ m    ║ ║      ║ h     ║ m    ║ h    ║ ╚════════╩═══════╩══════╩══════╝

desired output:

╔════════╦══════╦══════════╦═══════╗ ║  cust  ║  pa1 ║   pa2    ║  pa3  ║ ╠════════╬══════╬══════════╬═══════╣ ║      ║  h   ║   m&l    ║ l&m&h ║ ╚════════╩══════╩══════════╩═══════╝

i have multiple pa columns. want know if there generic python panda code can apply thank minc

here's expanded version of sample data make more clear how code operates on groups rather whole dataframe:

df = pd.dataframe({ 'cust':list('aaabbb'),'pa1':list('hhhmmh'),                     'pa2': list('mlmmmm'),'pa3':[1,2,3,3,3,1] })    cust pa1 pa2 pa3 0      h   m   1 1      h   l   2 2      h   m   3 3    b   m   m   3 4    b   m   m   3 5    b   h   m   1

the first step create new dataframe 1 row per customer. quick , dirty way fine example @ hand might want differently depending on want other columns.

df2 = df.groupby('cust').sum()        pa1  pa2  pa3 cust                    hhh  mlm  123 b     mmh  mmm  331

with template desired output, can loop on relevant columns of original dataframe , use groupby/apply join/unique replace values in existing columns:

for col in ['pa1','pa2','pa3']:     df2[col] = df.groupby('cust')[col].apply( lambda x:                                                '&'.join( x.astype(str).unique() ))        pa1  pa2    pa3 cust                        h  m&l  1&2&3 b     m&h    m    3&1

Search This Blog

Chrom

pandas - alternative to recursive sql / dedupe columns -

Comments

Post a Comment

Popular posts from this blog

qt - Using float or double for own QML classes -

json - ORA-06502: PL/SQL: numeric or value error: character string buffer too small - Convert Clob to varchar2 -

ios - Swift Array Resetting Itself -