Python - convert categorical variables into integers using pandas
I am trying to convert categorical variables into integers. However, I want them to use the same key across fields (for example, "a" gets converted to 1 in every field). The code below does not use the same keys.
import pandas as pd

df1 = pd.DataFrame({'a': ['a', 'a', 'c', 'd', 'b']})
df2 = pd.DataFrame({'a': ['d', 'd', 'b', 'a', 'a']})

# factorize numbers each Series independently, so keys differ between frames
df1_int = pd.factorize(df1['a'])[0]
print(df1_int)
df2_int = pd.factorize(df2['a'])[0]
print(df2_int)
This is the output I get:
[0 0 1 2 3]
[0 0 1 2 2]
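pd.factorize assigns codes in order of first appearance within each Series, which is why "a" becomes 0 in df1 but 2 in df2. A minimal pandas-only sketch, assuming the categories learned from df1 should be reused for df2 (the name `cats` is illustrative), pins the mapping with pd.Categorical:

```python
import pandas as pd

df1 = pd.DataFrame({'a': ['a', 'a', 'c', 'd', 'b']})
df2 = pd.DataFrame({'a': ['d', 'd', 'b', 'a', 'a']})

# Illustrative name: a sorted list of the categories seen in df1
cats = sorted(df1['a'].unique())

# Encode both columns against the same categories; any label
# not in cats would receive the code -1
df1_int = pd.Categorical(df1['a'], categories=cats).codes
df2_int = pd.Categorical(df2['a'], categories=cats).codes
print(df1_int)  # [0 0 2 3 1]
print(df2_int)  # [3 3 1 0 0]
```

If 1-based keys are wanted, adding 1 to the codes is enough.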
As you're trying to learn the categories from one dataframe and apply them to a different dataframe, using scikit-learn might provide a more elegant solution:
from sklearn import preprocessing
import pandas as pd

df1 = pd.DataFrame({'a': ['a', 'a', 'c', 'd', 'b'],
                    'b': ['one', 'one', 'two', 'three', 'four']})
df2 = pd.DataFrame({'a': ['d', 'd', 'b', 'a', 'a'],
                    'b': ['one', 'five', 'two', 'three', 'four']})

le = preprocessing.LabelEncoder()
# Learn the label-to-integer mapping from df1 and encode df1 in one step
df1_int = le.fit_transform(df1['a'])
print(df1_int)
# Reuse the same mapping for df2
df2_int = le.transform(df2['a'])
print(df2_int)
This results in:
[0 0 2 3 1]
[3 3 1 0 0]
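One caveat: LabelEncoder.transform only accepts labels seen during fit and raises a ValueError otherwise. In the example, df2['b'] contains 'five', which df1['b'] lacks, so encoding column b the same way would fail. A minimal sketch, assuming you want to detect such labels before transforming (the helper `unseen_labels` is hypothetical, not part of scikit-learn):

```python
import pandas as pd
from sklearn import preprocessing

df1 = pd.DataFrame({'b': ['one', 'one', 'two', 'three', 'four']})
df2 = pd.DataFrame({'b': ['one', 'five', 'two', 'three', 'four']})

le_b = preprocessing.LabelEncoder()
le_b.fit(df1['b'])  # classes_ is stored in sorted order

def unseen_labels(encoder, values):
    """Hypothetical helper: labels in values that the encoder never saw."""
    return sorted(set(values) - set(encoder.classes_))

missing = unseen_labels(le_b, df2['b'])
print(missing)  # ['five']

try:
    le_b.transform(df2['b'])
except ValueError as err:
    # transform rejects labels outside le_b.classes_
    print('ValueError:', err)
```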