scala - inferSchema in spark-csv package


When a CSV is read as a DataFrame in Spark, all columns are read as strings. Is there a way to get the actual type of each column?

I have the following CSV file:

    name,department,years_of_experience,dob
    sam,software,5,1990-10-10
    alex,data analytics,3,1992-10-10

I've read the CSV using the code below:

    val df = sqlContext.
      read.
      format("com.databricks.spark.csv").
      option("header", "true").
      option("inferSchema", "true").
      load(sampleaddatas3location)
    df.schema

All columns are read as strings. I expect the column years_of_experience to be read as int and dob to be read as date.

Please note that I've set the option inferSchema to true.

I am using the latest version (1.0.3) of the spark-csv package.

Am I missing something here?

2015-07-30

The latest version is actually 1.1.0, but it doesn't really matter, since it looks like inferSchema is not included in the latest release.

2015-08-17

The latest version of the package is now 1.2.0 (published on 2015-08-06), and schema inference works as expected:

    scala> df.printSchema
    root
     |-- name: string (nullable = true)
     |-- department: string (nullable = true)
     |-- years_of_experience: integer (nullable = true)
     |-- dob: string (nullable = true)

Regarding automatic date parsing, I doubt it will ever happen, or at least not without providing additional metadata.

Even if all fields follow some date-like format, it is impossible to say whether a given field should be interpreted as a date. So the choice is between no automatic date inference and a spreadsheet-like mess, not to mention issues with timezones, for example.

Finally, you can easily parse the date string manually:
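To make the ambiguity concrete, here is a minimal sketch in plain Scala (no Spark involved). The object name `DateAmbiguity`, the helper `parseBoth`, and the two patterns are assumptions chosen for illustration, not anything spark-csv uses. The same raw string parses cleanly under both patterns and yields two different dates, so a type inferrer has no basis for picking one.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object DateAmbiguity {
  // Two equally plausible patterns for the same raw text (illustrative assumptions).
  val dayFirst: DateTimeFormatter = DateTimeFormatter.ofPattern("dd-MM-uuuu")
  val monthFirst: DateTimeFormatter = DateTimeFormatter.ofPattern("MM-dd-uuuu")

  // Parse the same string under both patterns and return both results.
  def parseBoth(raw: String): (LocalDate, LocalDate) =
    (LocalDate.parse(raw, dayFirst), LocalDate.parse(raw, monthFirst))

  def main(args: Array[String]): Unit = {
    val (asDayFirst, asMonthFirst) = parseBoth("03-04-2015")
    println(s"day-first:   $asDayFirst")   // 2015-04-03
    println(s"month-first: $asMonthFirst") // 2015-03-04
  }
}
```

Both parses succeed, and only knowledge of the data source tells you which one is right.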

    // assumes df has been registered as a temporary table named "df"
    sqlContext
      .sql("select *, date(dob) as dob_d from df")
      .drop("dob")
      .printSchema

    root
     |-- name: string (nullable = true)
     |-- department: string (nullable = true)
     |-- years_of_experience: integer (nullable = true)
     |-- dob_d: date (nullable = true)

So it is really not a serious issue.
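If the column types are known in advance, the "additional metadata" mentioned earlier could also be supplied as an explicit schema instead of relying on inference. The sketch below is only that: `ExplicitSchemaSketch`, `customSchema`, and `readTyped` are hypothetical names, and it assumes that the spark-csv version in use can cast `yyyy-MM-dd` strings to `DateType` when a schema is given, which is worth verifying against the package documentation for your version.

```scala
import org.apache.spark.sql.{DataFrame, SQLContext}
import org.apache.spark.sql.types.{DateType, IntegerType, StringType, StructField, StructType}

object ExplicitSchemaSketch {
  // Column types for the sample file, declared by hand (assumed, not inferred).
  val customSchema: StructType = StructType(Seq(
    StructField("name", StringType, nullable = true),
    StructField("department", StringType, nullable = true),
    StructField("years_of_experience", IntegerType, nullable = true),
    StructField("dob", DateType, nullable = true)
  ))

  // Read the CSV with the declared schema; inferSchema is not needed here.
  def readTyped(sqlContext: SQLContext, path: String): DataFrame =
    sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .schema(customSchema)
      .load(path)
}
```

Calling `ExplicitSchemaSketch.readTyped(sqlContext, path).printSchema` should then report the declared types, at the cost of keeping the schema in sync with the file by hand.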

