CSV - scikit-learn - converting features stored as strings into numbers
I'm dipping my toes into machine learning using the scikit-learn Python library, and I'm trying to use .csv data in this format:
date        name                  average_price_sa
1995-01-01  Barking and Dagenham  70885.331285935
1995-01-01  Barnet                99567.4268042005
1995-01-01  Barnsley              49608.33494746
....        ....                  ....
2005-01-01  Barking and Dagenham  13294.12321312
I have read the data in with pandas using this line:

import pandas as pd
data = pd.read_csv('data.csv')
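To see why the 'name' column needs converting, the read step can be reproduced without the file. This is a minimal sketch, assuming data.csv is comma-separated with the header shown above; CSV_TEXT is a hypothetical inline copy of the sample rows, and parse_dates is an optional extra not used in the question.

```python
import io

import pandas as pd

# Hypothetical inline copy of the question's sample rows; the real data.csv
# is assumed to be comma-separated with this header.
CSV_TEXT = """date,name,average_price_sa
1995-01-01,Barking and Dagenham,70885.331285935
1995-01-01,Barnet,99567.4268042005
1995-01-01,Barnsley,49608.33494746
2005-01-01,Barking and Dagenham,13294.12321312
"""

# parse_dates converts the 'date' column to datetime values.
data = pd.read_csv(io.StringIO(CSV_TEXT), parse_dates=["date"])

print(data.dtypes)

# 'name' is read as text rather than numbers, which is why it needs encoding.
print(pd.api.types.is_numeric_dtype(data["name"]))  # False
```

The price column comes back as floats, while 'name' stays a non-numeric (string/object) column that most scikit-learn estimators will not accept directly.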
From what I have learned so far, I think I'm supposed to convert the 'name' category strings into floats so they can be accepted by the model.
I'm not sure how to go about this. Any help is appreciated.
Thanks
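You can use scikit-learn's LabelBinarizer to convert the strings into one-hot vectors. Each vector has n components (where n is the number of unique strings): all of them are 0 except for a single 1 in the component that identifies that string.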
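To make the "all zeros except a single 1" description concrete, here is a plain-Python sketch of one-hot encoding. The helper one_hot is hypothetical and not part of scikit-learn; it only mimics LabelBinarizer's habit of sorting the unique classes so the column order is deterministic.

```python
def one_hot(values):
    """Map each string to a 0/1 list containing a single 1.

    Classes are sorted so the column order is deterministic
    (LabelBinarizer also stores its classes_ in sorted order).
    """
    classes = sorted(set(values))
    index = {label: i for i, label in enumerate(classes)}
    vectors = []
    for value in values:
        vector = [0] * len(classes)  # one slot per unique string, all 0
        vector[index[value]] = 1     # a single 1 at this string's slot
        vectors.append(vector)
    return classes, vectors


names = ["Barking and Dagenham", "Barnet", "Barnsley", "Barnet"]
classes, vectors = one_hot(names)
for name, vector in zip(names, vectors):
    print(name, "=>", vector)
```

Repeated names get identical vectors. One difference from the real class: with exactly two unique strings, LabelBinarizer returns a single 0/1 column instead of two columns, which this sketch does not reproduce.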
from __future__ import print_function
from sklearn import preprocessing

names = ["Barking and Dagenham", "Barnet", "Barnsley"]

# LabelBinarizer learns the sorted unique names and maps each one to a one-hot vector
lb = preprocessing.LabelBinarizer()
vectors = lb.fit_transform(names)

for name, vector in zip(names, vectors):
    print("%s => %s" % (name, str(vector)))

Output:

Barking and Dagenham => [1 0 0]
Barnet => [0 1 0]
Barnsley => [0 0 1]
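Applied to the DataFrame from the question, the same transformer turns the 'name' column into a numeric matrix that an estimator will accept. This is a sketch under several assumptions: data.csv is comma-separated (CSV_TEXT is a hypothetical stand-in for it), average_price_sa is treated as the target purely for illustration since the question does not say what is being predicted, and LinearRegression is just an example estimator.

```python
import io

import pandas as pd
from sklearn import preprocessing
from sklearn.linear_model import LinearRegression

# Hypothetical stand-in for data.csv: comma-separated, with the header and
# sample rows shown in the question.
CSV_TEXT = """date,name,average_price_sa
1995-01-01,Barking and Dagenham,70885.331285935
1995-01-01,Barnet,99567.4268042005
1995-01-01,Barnsley,49608.33494746
2005-01-01,Barking and Dagenham,13294.12321312
"""
data = pd.read_csv(io.StringIO(CSV_TEXT))

lb = preprocessing.LabelBinarizer()
# One row of 0/1 indicators per DataFrame row; the columns follow
# lb.classes_ (the sorted unique names).
X = lb.fit_transform(data["name"].to_numpy())

# Assumption for illustration only: predict average_price_sa from the name.
y = data["average_price_sa"].to_numpy()

# The estimator accepts X because it is now purely numeric.
model = LinearRegression().fit(X, y)

print(lb.classes_)
print(X)

# inverse_transform maps indicator rows back to the original strings.
print(lb.inverse_transform(X))
```

Keeping the fitted lb object matters: new data should go through lb.transform (not a fresh fit_transform) so that each name lands in the same column it was trained with.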