How to retrieve more than 100000 rows from Redshift using R and dplyr -

- April 15, 2012

I'm analyzing data in a Redshift database, working in R with a connection made through dplyr. This works:

my_db <- src_postgres(host = 'my-cluster-blahblah.redshift.amazonaws.com', port = '5439', dbname = 'dev', user = 'me', password = 'mypw')
mytable <- tbl(my_db, "mytable")
viewstation <- mytable %>%
    filter(stationname == "something")

When I try to turn the output into a data frame, like so:

thisdata <- data.frame(viewstation)

I get this warning message:

Only first 100,000 results retrieved. Use n = -1 to retrieve all.

Where am I supposed to set n?

Instead of using

thisdata <- data.frame(viewstation)

use

thisdata <- collect(viewstation)

collect() will pull the data from the database into R. As mentioned in the dplyr::databases vignette:

When working with databases, dplyr tries to be as lazy as possible. It's lazy in two ways:

It never pulls data back into R unless you explicitly ask for it.

It delays doing any work until the last possible minute, collecting together everything you want to do and then sending that to the database in one step.
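To see this laziness without a Redshift cluster, here is a minimal sketch that swaps in an in-memory SQLite table. It assumes the dplyr, dbplyr and RSQLite packages are installed. The table contents and the value column are made up for illustration; only stationname and the filter come from the question.

```r
library(dplyr)
library(dbplyr)

# Stand-in for the Redshift table: a small in-memory SQLite table.
# The rows and the `value` column are hypothetical.
mytable <- memdb_frame(
  stationname = c("something", "other", "something"),
  value = c(1, 2, 3)
)

# Building the query does not pull any rows into R yet.
viewstation <- mytable %>%
  filter(stationname == "something")

# Print the SQL that dplyr has prepared for the database.
show_query(viewstation)

# collect() runs the query and returns the matching rows as a local data frame.
thisdata <- collect(viewstation)
nrow(thisdata)
```

Against the real cluster, the same collect(viewstation) call should be all that changes; the connection and table setup stay as in the question.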
