python - convert categorical variables into integers using pandas


I am trying to convert categorical variables to integers. However, I want them to use the same key (for example, a gets converted to 1 across all fields). The code below does not use the same keys.

    import pandas as pd

    df1 = pd.DataFrame({'a': ['a', 'a', 'c', 'd', 'b']})
    df2 = pd.DataFrame({'a': ['d', 'd', 'b', 'a', 'a']})

    # factorize numbers values in order of first appearance, separately per call
    df1_int = pd.factorize(df1['a'])[0]
    print(df1_int)

    df2_int = pd.factorize(df2['a'])[0]
    print(df2_int)

This is the output I get:

    [0 0 1 2 3]
    [0 0 1 2 2]

Since you're trying to learn the categories from one dataframe and apply them to a different dataframe, using scikit-learn might provide a more elegant solution:

    from sklearn import preprocessing
    import pandas as pd

    df1 = pd.DataFrame({'a': ['a', 'a', 'c', 'd', 'b'],
                        'b': ['one', 'one', 'two', 'three', 'four']})

    df2 = pd.DataFrame({'a': ['d', 'd', 'b', 'a', 'a'],
                        'b': ['one', 'five', 'two', 'three', 'four']})

    # Fit the encoder on df1 so it learns the label-to-integer mapping
    le = preprocessing.LabelEncoder()
    df1_int = le.fit_transform(df1['a'])
    print(df1_int)

    # Reuse the fitted mapping on df2 (transform only, no refit)
    df2_int = le.transform(df2['a'])
    print(df2_int)

This results in:

    [0 0 2 3 1]
    [3 3 1 0 0]
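If you would rather stay within pandas, one possible alternative is to encode both columns against a shared category list with `pd.Categorical`. This is only a minimal sketch, not part of the original answer: it assumes the category set is taken from `df1` and sorted, and the names `categories`, `df1_codes`, and `df2_codes` are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({'a': ['a', 'a', 'c', 'd', 'b']})
df2 = pd.DataFrame({'a': ['d', 'd', 'b', 'a', 'a']})

# Assumption: learn the category set from df1 and sort it,
# so each value's code is its position in the sorted list.
categories = sorted(df1['a'].unique())

# Encode both columns against the same category list.
# A value that is not in `categories` would receive the code -1.
df1_codes = pd.Categorical(df1['a'], categories=categories).codes
df2_codes = pd.Categorical(df2['a'], categories=categories).codes

print(df1_codes.tolist())
print(df2_codes.tolist())
```

Because both calls share one explicit `categories` list, a given value maps to the same integer in either dataframe, which is the behavior the question asks for.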
