hadoop - How to query data from AmazonS3 using Apache Drill? -


after using spark saveastextfile s3 format hadoop. format of file in bucket that.

enter image description here

format year/month/date/timestamp , data in part-0000 file json format.

i config drill , point bucket name

{   "type": "file",   "enabled": true,   "connection": "s3://com.giaosudau.win-bid",   "workspaces": {     "root": {       "location": "/",       "writable": false,       "defaultinputformat": "json"     },     "tmp": {       "location": "/tmp",       "writable": true,       "defaultinputformat": null     }   },   "formats": {     "psv": {       "type": "text",       "extensions": [         "tbl"       ],       "delimiter": "|"     },     "csv": {       "type": "text",       "extensions": [         "csv"       ],       "delimiter": ","     },     "tsv": {       "type": "text",       "extensions": [         "tsv"       ],       "delimiter": "\t"     },     "parquet": {       "type": "parquet"     },     "json": {       "type": "json"     },     "avro": {       "type": "avro"     }   } } 

i select storage ok don't know select , data date or hour or file.

enter image description here

i try thing select file not success enter image description here

file sample here

{"auctionid":"xx","bidrequeststring":{"id":"xx","timestamp":"2015-07-29t08:31:00.413z","istest":false,"url":"http://www.222.3232.com/","useragent":"mozilla/5.0 (x11; u; linux i686; en-us; rv:1.8.1.7pre) gecko/20070815 firefox/2.0.0.6 navigator/9.0b3","protocolversion":"google protocol buffer","exchange":"adx","provider":"google","location":{"dma":-1,"metro":-1,"timezoneoffsetminutes":-1},"segments":{"adxdetectedverticals":["23:0.571243","355:0.409098","380:0.339647","474:0.415936","540:0.079871"]},"userids":{"prov":"bgzqawdsa2dmbmrubwxtamdoags","xchg":"bgzqawdsa2dmbmrubwxtamdoags"},"imp":[{"id":"99","banner":{"w":[220,300],"h":[600,450],"id":"99","pos":0,"topframe":0},"pmp":{"ext":{"adgroup_id":"17490739393"}},"formats":["220x600","300x450"],"position":0},{"id":"199","banner":{"w":120,"h":600,"id":"199","pos":0,"topframe":0},"pmp":{"ext":{"adgroup_id":"17490739393"}},"formats":["120x600"],"position":0}],"spots":[{"id":"99","banner":{"w":[220,300],"h":[600,450],"id":"99","pos":0,"topframe":0},"pmp":{"ext":{"adgroup_id":"17490739393"}},"formats":["220x600","300x450"],"position":0},{"id":"199","banner":{"w":120,"h":600,"id":"199","pos":0,"topframe":0},"pmp":{"ext":{"adgroup_id":"17490739393"}},"formats":["120x600"],"position":0}],"site":{"id":"502","publisher":{"id":"502"},"page":"http://www.blog.34343.com/"},"device":{"ua":"mozilla/5.0 (x11; u; linux i686; en-us; rv:1.8.1.7pre) gecko/20070815 firefox/2.0.0.6 navigator/9.0b3","ip":"109.103.101.0","geo":{"zip":" 2600"},"language":"en","ext":{"geo_criteria_id":1000142}},"user":{"id":"bgzqawdsa2dmbmrubwxtamdoags","ext":{"cookie_age_seconds":2080089}}},"bidresponsecreative":{"itemid":"fsknxfwe34235235","campaignid":"332","htmlsnippet":"\u003ciframe frameborder=0 scrolling=no width=\"300\" height=\"250\" src=\"//sv.brand-display.com/adedge/api/bd/serving/simple/frame?aukey=34343\u0026_=%cachebuster%\u0026winning_price=%winning_price%\u0026google_click_url=%click_url_esc%\u0026encrypt_value=%encrypt_value%\"\u003e\u003c/iframe\u003e","name":"expandable web","formatcode":"billboard","bd":"","status":1,"deleted":false,"destinationurl":"https://ex.sg","tagging":[1,2,3],"expandingdirection":14,"bdurl":"","format":{"code":"billboard","name":"billboard","publisher":"default","type":"standard","width":120,"height":600,"expanded_width":0,"expanded_height":0,"collapsed_width":0,"collapsed_height":0,"aspratio":3.76,"expandable":true,"expand_first":true}},"bidresponsecreativeid":"1","bidresponsecreativename":"","bidresponsedata":{"bids":[{"creative":0,"ext":null,"price":"6071usd/1m","priority":1.0,"spotindex":0},{"creative":1,"ext":null,"price":"6071usd/1m","priority":1.0,"spotindex":1}]},"bidresponsemeta":"null","bidwinmeta":"null","biddingagentname":"iface.http","biddingfullaccount":"23848834:strategy","biddingmainaccount":"1212312","biddingmaxprice":"6071usd/1m","biddingrequestformattype":"datacratic","biddingsubaccount":"strategy","impindex":"1","impressionid":"199","pricepriority":"1.000000","rawwinprice":"100usd/1m","timestamp":"2015-jul-29 08:31:00.53899","userids":{"prov":"123232","xchg":"9849839"},"winprice":"100usd/1m"} {"auctionid":"343","bidrequeststring":{"id":"344","timestamp":"2015-07-28t08:31:00.413z","istest":false,"url":"http://www.xx.xx.com/","useragent":"mozilla/5.0 (x11; u; linux i686; en-us; rv:1.8.1.7pre) gecko/20070815 firefox/2.0.0.6 navigator/9.0b3","protocolversion":"google protocol buffer","exchange":"adx","provider":"google","location":{"dma":-1,"metro":-1,"timezoneoffsetminutes":-1},"segments":{"adxdetectedverticals":["23:0.571243","355:0.409098","380:0.339647","474:0.415936","540:0.079871"]},"userids":{"prov":"bgzqawdsa2dmbmrubwxtamdoags","xchg":"bgzqawdsa2dmbmrubwxtamdoags"},"imp":[{"id":"99","banner":{"w":[220,300],"h":[600,450],"id":"99","pos":0,"topframe":0},"pmp":{"ext":{"adgroup_id":"17490739393"}},"formats":["220x600","300x450"],"position":0},{"id":"199","banner":{"w":120,"h":600,"id":"199","pos":0,"topframe":0},"pmp":{"ext":{"adgroup_id":"17490739393"}},"formats":["120x600"],"position":0}],"spots":[{"id":"99","banner":{"w":[220,300],"h":[600,450],"id":"99","pos":0,"topframe":0},"pmp":{"ext":{"adgroup_id":"17490739393"}},"formats":["220x600","300x450"],"position":0},{"id":"199","banner":{"w":120,"h":600,"id":"199","pos":0,"topframe":0},"pmp":{"ext":{"adgroup_id":"17490739393"}},"formats":["120x600"],"position":0}],"site":{"id":"502","publisher":{"id":"502"},"page":"http://www.blog.343.com/"},"device":{"ua":"mozilla/5.0 (x11; u; linux i686; en-us; rv:1.8.1.7pre) gecko/20070815 firefox/2.0.0.6 navigator/9.0b3","ip":"109.103.101.0","geo":{"zip":" 2600"},"language":"en","ext":{"geo_criteria_id":1000142}},"user":{"id":"bgzqawdsa2dmbmrubwxtamdoags","ext":{"cookie_age_seconds":2080089}}},"bidresponsecreative":{"itemid":"fsknxfwe34235235","campaignid":"342342342","htmlsnippet":"\u003ciframe frameborder=0 scrolling=no width=\"300\" height=\"250\" src=\"//sv.brand-display.com/adedge/api/bd/serving/simple/frame?aukey=xxx\u0026_=%cachebuster%\u0026winning_price=%winning_price%\u0026google_click_url=%click_url_esc%\u0026encrypt_value=%encrypt_value%\"\u003e\u003c/iframe\u003e","name":"expandable web","formatcode":"billboard","bd":"","status":1,"deleted":false,"destinationurl":"https://ex.sg","tagging":[1,2,3],"expandingdirection":14,"bdurl":"","format":{"code":"billboard","name":"billboard","publisher":"default","type":"standard","width":120,"height":600,"expanded_width":0,"expanded_height":0,"collapsed_width":0,"collapsed_height":0,"aspratio":3.76,"expandable":true,"expand_first":true}},"bidresponsecreativeid":"1","bidresponsecreativename":"","bidresponsedata":{"bids":[{"creative":0,"ext":null,"price":"6071usd/1m","priority":1.0,"spotindex":0},{"creative":1,"ext":null,"price":"6071usd/1m","priority":1.0,"spotindex":1}]},"bidresponsemeta":"null","bidwinmeta":"null","biddingagentname":"iface.http","biddingfullaccount":"829838828928932:strategy","biddingmainaccount":"3434","biddingmaxprice":"6071usd/1m","biddingrequestformattype":"datacratic","biddingsubaccount":"strategy","impindex":"1","impressionid":"199","pricepriority":"1.000000","rawwinprice":"100usd/1m","timestamp":"2015-jul-29 08:31:00.53899","userids":{"prov":"cc","xchg":"cc"},"winprice":"100usd/1m"} 

how data that? thanks.

here examples using data , local file system instead of s3. think files not have extension, if not true, if files have .json extension, can query them (use table alias t resolve ambiguity):

use dfs.`root`; +-------+---------------------------------------+ |  ok   |                summary                | +-------+---------------------------------------+ | true  | default schema changed [dfs.root]  | +-------+---------------------------------------+ 1 row selected (0.079 seconds) select t.auctionid dfs.`/users/khahn/drill/apache-drill-1.1.0/part-00000` t; +------------+ | auctionid  | +------------+ | xx         | | 343        | +------------+ 2 rows selected (0.093 seconds) 

if files not have extension, modify storage plugin configuration set defaultinputformat "json"; otherwise, drill assumes file of parquet format default. example, set defaultinputformat in root workspace:

{   "type": "file",   "enabled": true,   "connection": "file:///",   "workspaces": {     "root": {       "location": "/",       "writable": false,       "defaultinputformat": "json"     },     "tmp": {       "location": "/tmp",       "writable": true,       "defaultinputformat": null     }   }, . . . 

when query files having no extension, need specify workspace in defined defaultinputformat, root in example:

select t.auctionid dfs.`root`.`/users/khahn/drill/apache-drill-1.1.0/part-00000` t; +------------+ | auctionid  | +------------+ | xx         | | 343        | +------------+ 2 rows selected (0.095 seconds) 

please see docs handling type differences: https://drill.apache.org/docs/json-data-model/#handling-type-differences


Comments

Popular posts from this blog

qt - Using float or double for own QML classes -

Create Outlook appointment via C# .Net -

ios - Swift Array Resetting Itself -