How to retrieve more than 100000 rows from Redshift using R and dplyr


I'm analyzing data from a Redshift database, working in R with a connection made through dplyr. This works:

my_db <- src_postgres(host = 'my-cluster-blahblah.redshift.amazonaws.com', port = '5439', dbname = 'dev', user = 'me', password = 'mypw')
mytable <- tbl(my_db, "mytable")
viewstation <- mytable %>%
    filter(stationname == "something")

When I try to turn the output into a data frame, like so:

thisdata<-data.frame(viewstation) 

I get this warning message:

Only first 100,000 results retrieved. Use n = -1 to retrieve all.

Where am I supposed to set n?

Instead of using

thisdata<-data.frame(viewstation) 

use

thisdata <- collect(viewstation) 

collect() will pull all the data from the database back into R. This is mentioned in the dplyr databases vignette:

When working with databases, dplyr tries to be as lazy as possible. It's lazy in two ways:

It never pulls data into R unless you explicitly ask for it.

It delays doing any work until the last possible minute, collecting together everything you want to do and then sending it to the database in one step.
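To see the lazy behaviour and collect() together, here is a minimal sketch. It assumes the DBI, RSQLite and dbplyr packages are installed, and it uses an in-memory SQLite database as a stand-in for Redshift, so the table contents and the `value` column are made up for illustration. Only the table and column names `mytable` and `stationname` come from the question.

```r
library(dplyr)
library(DBI)

# In-memory SQLite stands in for the Redshift cluster (assumption for illustration)
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

# Hypothetical data: 150,000 rows match the filter, which is more than 100,000
DBI::dbWriteTable(con, "mytable", data.frame(
  stationname = rep(c("something", "other"), times = c(150000, 10)),
  value = seq_len(150010)
))

mytable <- tbl(con, "mytable")

# Still lazy: this only builds a query, no rows are fetched yet
viewstation <- mytable %>%
  filter(stationname == "something")

# collect() runs the query and brings every matching row into R
thisdata <- collect(viewstation)
nrow(thisdata)  # 150000

DBI::dbDisconnect(con)
```

As far as I know, recent dplyr/dbplyr versions also let you pass n to collect() directly, for example collect(viewstation, n = Inf), but the plain collect(viewstation) call above should already return everything.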

