csv - scikit-learn - converting features stored as strings into numbers


I'm dipping my toes into machine learning using the scikit-learn Python library, and I'm trying to use .csv data in the following format:

date        name                    average_price_sa
1995-01-01  barking , dagenham      70885.331285935
1995-01-01  barnet                  99567.4268042005
1995-01-01  barnsley                49608.33494746
....        ....                    ....
2005-01-01  barking , dagenham      13294.12321312

I have read the data in with pandas using this line:

import pandas as pd
data = pd.read_csv('data.csv')
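A minimal, self-contained sketch of that read, assuming data.csv is comma-separated with a header row and that names containing a comma are quoted (the question doesn't show the actual delimiter). The inline csv_text is a hypothetical stand-in for the file; printing the dtypes shows that 'name' comes back as a string column rather than a number.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv: assumes comma-separated values with a
# header row, and quotes around any name that itself contains a comma.
csv_text = """date,name,average_price_sa
1995-01-01,"barking , dagenham",70885.331285935
1995-01-01,barnet,99567.4268042005
1995-01-01,barnsley,49608.33494746
"""

# parse_dates converts the 'date' column to datetime values
data = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])

# 'name' is a string column (object dtype in pandas 2.x), so it has to be
# turned into numbers before most scikit-learn estimators will accept it
print(data.dtypes)
```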

From what I have learned so far, I think I'm supposed to convert the 'name' category strings into floats so they can be accepted by the model.

I'm not sure how to go about this. Any help is appreciated.

Thanks

You can use scikit-learn's LabelBinarizer to convert the strings into one-hot vectors. Each vector has n components (where n is the number of unique strings), with a 1 in a single component and 0 everywhere else.

from __future__ import print_function
from sklearn import preprocessing

names = ["barking , dagenham", "barnet", "barnsley"]
lb = preprocessing.LabelBinarizer()
# fit learns the unique names; transform turns each name into a one-hot row
vectors = lb.fit_transform(names)
for name, vector in zip(names, vectors):
    print("%s => %s" % (name, str(vector)))

Output:

barking , dagenham => [1 0 0]
barnet => [0 1 0]
barnsley => [0 0 1]
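Applied to the whole 'name' column, the same approach gives one 0/1 column per borough that can be stacked next to other numeric features. This sketch assumes, hypothetically, that the goal is to predict average_price_sa from the year and the name; the rows below and the names X, y and name_vectors are illustrative, not from the question.

```python
import numpy as np
import pandas as pd
from sklearn import preprocessing

# Hypothetical rows shaped like the question's data
data = pd.DataFrame({
    "date": pd.to_datetime(["1995-01-01", "1995-01-01", "1995-01-01", "2005-01-01"]),
    "name": ["barking , dagenham", "barnet", "barnsley", "barking , dagenham"],
    "average_price_sa": [70885.331285935, 99567.4268042005,
                         49608.33494746, 13294.12321312],
})

# One-hot encode the names: one column per unique name
lb = preprocessing.LabelBinarizer()
name_vectors = lb.fit_transform(data["name"])  # shape (n_rows, n_unique_names)

# Assumed feature set: the year as a number, followed by the one-hot columns
years = data["date"].dt.year.to_numpy()
X = np.column_stack([years, name_vectors])
# Assumed target: the price column
y = data["average_price_sa"].to_numpy()

print(lb.classes_)  # the name each one-hot column stands for
print(X.shape, y.shape)
# inverse_transform maps one-hot rows back to the original strings
print(lb.inverse_transform(name_vectors[:2]))
```

Two caveats: with exactly two distinct strings, LabelBinarizer returns a single 0/1 column rather than two; and scikit-learn documents LabelBinarizer as a tool for target labels, so for input features OneHotEncoder (or pandas.get_dummies on the DataFrame) is the more usual choice.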
