How to retrieve more than 100000 rows from Redshift using R and dplyr
I'm analyzing data from a Redshift database, working in R using a connection through dplyr. This works:
my_db <- src_postgres(host='my-cluster-blahblah.redshift.amazonaws.com', port='5439', dbname='dev', user='me', password='mypw')
mytable <- tbl(my_db, "mytable")
viewstation <- mytable %>% filter(stationname=="something")
But when I try to turn that output into a data frame, like so:
thisdata<-data.frame(viewstation)
I get a warning message:
Only first 100,000 results retrieved. Use n = -1 to retrieve all.
Where am I supposed to set n?
Instead of using
thisdata <- data.frame(viewstation)
use
thisdata <- collect(viewstation)
collect() pulls all of the query's rows from the database into R. This is mentioned in the dplyr databases vignette:
When working with databases, dplyr tries to be as lazy as possible. It's lazy in two ways:
It never pulls data back to R unless you explicitly ask for it.
It delays doing any work until the last possible minute, collecting together everything you want to do and then sending that to the database in one step.
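To see the laziness described in the vignette without a Redshift cluster, here is a minimal sketch that swaps in an in-memory SQLite database. The SQLite connection, the `stations` data frame, and the `views` column are assumptions made up for illustration; only `mytable`, `viewstation`, `stationname`, and `thisdata` come from the question. It assumes the dplyr, dbplyr, DBI, and RSQLite packages are installed.

```r
library(dplyr)  # dbplyr, DBI and RSQLite must be installed as well

# Assumption: an in-memory SQLite database stands in for the Redshift cluster.
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical sample data, copied into the database as "mytable".
stations <- data.frame(
  stationname = c("something", "other", "something"),
  views = c(10, 20, 30),
  stringsAsFactors = FALSE
)
copy_to(con, stations, "mytable")

# Same steps as in the question: this only builds a query, no rows are fetched.
mytable <- tbl(con, "mytable")
viewstation <- mytable %>% filter(stationname == "something")

# viewstation is still a lazy table that describes a query.
is_lazy <- inherits(viewstation, "tbl_lazy")

# collect() runs the query and pulls every matching row into a local data frame.
thisdata <- collect(viewstation)

DBI::dbDisconnect(con)
```

After this runs, `thisdata` is an ordinary local data frame, so later steps no longer touch the database.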