pandas - alternative to recursive sql / dedupe columns -


hi have data set 1 many relations. want reduce 1-1 concatenating distinct values each column

here example:

i have data set called customer , product affiliation (pa). 1 customer can have multiple pa different times.

here input:

╔════════╦═══════╦══════╦══════╗ ║ cust   ║pa1    ║ pa2  ║ pa3  ║ ╠════════╬═══════╬══════╬══════╣ ║      ║ h     ║ m    ║ l    ║ ║      ║ h     ║ l    ║ m    ║ ║      ║ h     ║ m    ║ h    ║ ╚════════╩═══════╩══════╩══════╝ 

desired output:

╔════════╦══════╦══════════╦═══════╗ ║  cust  ║  pa1 ║   pa2    ║  pa3  ║ ╠════════╬══════╬══════════╬═══════╣ ║      ║  h   ║   m&l    ║ l&m&h ║ ╚════════╩══════╩══════════╩═══════╝ 

i have multiple pa columns. want know if there generic python panda code can apply thank minc

here's expanded version of sample data make more clear how code operates on groups rather whole dataframe:

df = pd.dataframe({ 'cust':list('aaabbb'),'pa1':list('hhhmmh'),                     'pa2': list('mlmmmm'),'pa3':[1,2,3,3,3,1] })    cust pa1 pa2 pa3 0      h   m   1 1      h   l   2 2      h   m   3 3    b   m   m   3 4    b   m   m   3 5    b   h   m   1 

the first step create new dataframe 1 row per customer. quick , dirty way fine example @ hand might want differently depending on want other columns.

df2 = df.groupby('cust').sum()        pa1  pa2  pa3 cust                    hhh  mlm  123 b     mmh  mmm  331 

with template desired output, can loop on relevant columns of original dataframe , use groupby/apply join/unique replace values in existing columns:

for col in ['pa1','pa2','pa3']:     df2[col] = df.groupby('cust')[col].apply( lambda x:                                                '&'.join( x.astype(str).unique() ))        pa1  pa2    pa3 cust                        h  m&l  1&2&3 b     m&h    m    3&1 

Comments

Popular posts from this blog

qt - Using float or double for own QML classes -

Create Outlook appointment via C# .Net -

ios - Swift Array Resetting Itself -