hadoop - How to query data from AmazonS3 using Apache Drill? -
after using spark saveastextfile s3 format hadoop. format of file in bucket that.
format year/month/date/timestamp , data in part-0000 file json format.
i config drill , point bucket name
{ "type": "file", "enabled": true, "connection": "s3://com.giaosudau.win-bid", "workspaces": { "root": { "location": "/", "writable": false, "defaultinputformat": "json" }, "tmp": { "location": "/tmp", "writable": true, "defaultinputformat": null } }, "formats": { "psv": { "type": "text", "extensions": [ "tbl" ], "delimiter": "|" }, "csv": { "type": "text", "extensions": [ "csv" ], "delimiter": "," }, "tsv": { "type": "text", "extensions": [ "tsv" ], "delimiter": "\t" }, "parquet": { "type": "parquet" }, "json": { "type": "json" }, "avro": { "type": "avro" } } }
i select storage ok don't know select , data date or hour or file.
i try thing select file not success
file sample here
{"auctionid":"xx","bidrequeststring":{"id":"xx","timestamp":"2015-07-29t08:31:00.413z","istest":false,"url":"http://www.222.3232.com/","useragent":"mozilla/5.0 (x11; u; linux i686; en-us; rv:1.8.1.7pre) gecko/20070815 firefox/2.0.0.6 navigator/9.0b3","protocolversion":"google protocol buffer","exchange":"adx","provider":"google","location":{"dma":-1,"metro":-1,"timezoneoffsetminutes":-1},"segments":{"adxdetectedverticals":["23:0.571243","355:0.409098","380:0.339647","474:0.415936","540:0.079871"]},"userids":{"prov":"bgzqawdsa2dmbmrubwxtamdoags","xchg":"bgzqawdsa2dmbmrubwxtamdoags"},"imp":[{"id":"99","banner":{"w":[220,300],"h":[600,450],"id":"99","pos":0,"topframe":0},"pmp":{"ext":{"adgroup_id":"17490739393"}},"formats":["220x600","300x450"],"position":0},{"id":"199","banner":{"w":120,"h":600,"id":"199","pos":0,"topframe":0},"pmp":{"ext":{"adgroup_id":"17490739393"}},"formats":["120x600"],"position":0}],"spots":[{"id":"99","banner":{"w":[220,300],"h":[600,450],"id":"99","pos":0,"topframe":0},"pmp":{"ext":{"adgroup_id":"17490739393"}},"formats":["220x600","300x450"],"position":0},{"id":"199","banner":{"w":120,"h":600,"id":"199","pos":0,"topframe":0},"pmp":{"ext":{"adgroup_id":"17490739393"}},"formats":["120x600"],"position":0}],"site":{"id":"502","publisher":{"id":"502"},"page":"http://www.blog.34343.com/"},"device":{"ua":"mozilla/5.0 (x11; u; linux i686; en-us; rv:1.8.1.7pre) gecko/20070815 firefox/2.0.0.6 navigator/9.0b3","ip":"109.103.101.0","geo":{"zip":" 2600"},"language":"en","ext":{"geo_criteria_id":1000142}},"user":{"id":"bgzqawdsa2dmbmrubwxtamdoags","ext":{"cookie_age_seconds":2080089}}},"bidresponsecreative":{"itemid":"fsknxfwe34235235","campaignid":"332","htmlsnippet":"\u003ciframe frameborder=0 scrolling=no width=\"300\" height=\"250\" src=\"//sv.brand-display.com/adedge/api/bd/serving/simple/frame?aukey=34343\u0026_=%cachebuster%\u0026winning_price=%winning_price%\u0026google_click_url=%click_url_esc%\u0026encrypt_value=%encrypt_value%\"\u003e\u003c/iframe\u003e","name":"expandable web","formatcode":"billboard","bd":"","status":1,"deleted":false,"destinationurl":"https://ex.sg","tagging":[1,2,3],"expandingdirection":14,"bdurl":"","format":{"code":"billboard","name":"billboard","publisher":"default","type":"standard","width":120,"height":600,"expanded_width":0,"expanded_height":0,"collapsed_width":0,"collapsed_height":0,"aspratio":3.76,"expandable":true,"expand_first":true}},"bidresponsecreativeid":"1","bidresponsecreativename":"","bidresponsedata":{"bids":[{"creative":0,"ext":null,"price":"6071usd/1m","priority":1.0,"spotindex":0},{"creative":1,"ext":null,"price":"6071usd/1m","priority":1.0,"spotindex":1}]},"bidresponsemeta":"null","bidwinmeta":"null","biddingagentname":"iface.http","biddingfullaccount":"23848834:strategy","biddingmainaccount":"1212312","biddingmaxprice":"6071usd/1m","biddingrequestformattype":"datacratic","biddingsubaccount":"strategy","impindex":"1","impressionid":"199","pricepriority":"1.000000","rawwinprice":"100usd/1m","timestamp":"2015-jul-29 08:31:00.53899","userids":{"prov":"123232","xchg":"9849839"},"winprice":"100usd/1m"} {"auctionid":"343","bidrequeststring":{"id":"344","timestamp":"2015-07-28t08:31:00.413z","istest":false,"url":"http://www.xx.xx.com/","useragent":"mozilla/5.0 (x11; u; linux i686; en-us; rv:1.8.1.7pre) gecko/20070815 firefox/2.0.0.6 navigator/9.0b3","protocolversion":"google protocol buffer","exchange":"adx","provider":"google","location":{"dma":-1,"metro":-1,"timezoneoffsetminutes":-1},"segments":{"adxdetectedverticals":["23:0.571243","355:0.409098","380:0.339647","474:0.415936","540:0.079871"]},"userids":{"prov":"bgzqawdsa2dmbmrubwxtamdoags","xchg":"bgzqawdsa2dmbmrubwxtamdoags"},"imp":[{"id":"99","banner":{"w":[220,300],"h":[600,450],"id":"99","pos":0,"topframe":0},"pmp":{"ext":{"adgroup_id":"17490739393"}},"formats":["220x600","300x450"],"position":0},{"id":"199","banner":{"w":120,"h":600,"id":"199","pos":0,"topframe":0},"pmp":{"ext":{"adgroup_id":"17490739393"}},"formats":["120x600"],"position":0}],"spots":[{"id":"99","banner":{"w":[220,300],"h":[600,450],"id":"99","pos":0,"topframe":0},"pmp":{"ext":{"adgroup_id":"17490739393"}},"formats":["220x600","300x450"],"position":0},{"id":"199","banner":{"w":120,"h":600,"id":"199","pos":0,"topframe":0},"pmp":{"ext":{"adgroup_id":"17490739393"}},"formats":["120x600"],"position":0}],"site":{"id":"502","publisher":{"id":"502"},"page":"http://www.blog.343.com/"},"device":{"ua":"mozilla/5.0 (x11; u; linux i686; en-us; rv:1.8.1.7pre) gecko/20070815 firefox/2.0.0.6 navigator/9.0b3","ip":"109.103.101.0","geo":{"zip":" 2600"},"language":"en","ext":{"geo_criteria_id":1000142}},"user":{"id":"bgzqawdsa2dmbmrubwxtamdoags","ext":{"cookie_age_seconds":2080089}}},"bidresponsecreative":{"itemid":"fsknxfwe34235235","campaignid":"342342342","htmlsnippet":"\u003ciframe frameborder=0 scrolling=no width=\"300\" height=\"250\" src=\"//sv.brand-display.com/adedge/api/bd/serving/simple/frame?aukey=xxx\u0026_=%cachebuster%\u0026winning_price=%winning_price%\u0026google_click_url=%click_url_esc%\u0026encrypt_value=%encrypt_value%\"\u003e\u003c/iframe\u003e","name":"expandable web","formatcode":"billboard","bd":"","status":1,"deleted":false,"destinationurl":"https://ex.sg","tagging":[1,2,3],"expandingdirection":14,"bdurl":"","format":{"code":"billboard","name":"billboard","publisher":"default","type":"standard","width":120,"height":600,"expanded_width":0,"expanded_height":0,"collapsed_width":0,"collapsed_height":0,"aspratio":3.76,"expandable":true,"expand_first":true}},"bidresponsecreativeid":"1","bidresponsecreativename":"","bidresponsedata":{"bids":[{"creative":0,"ext":null,"price":"6071usd/1m","priority":1.0,"spotindex":0},{"creative":1,"ext":null,"price":"6071usd/1m","priority":1.0,"spotindex":1}]},"bidresponsemeta":"null","bidwinmeta":"null","biddingagentname":"iface.http","biddingfullaccount":"829838828928932:strategy","biddingmainaccount":"3434","biddingmaxprice":"6071usd/1m","biddingrequestformattype":"datacratic","biddingsubaccount":"strategy","impindex":"1","impressionid":"199","pricepriority":"1.000000","rawwinprice":"100usd/1m","timestamp":"2015-jul-29 08:31:00.53899","userids":{"prov":"cc","xchg":"cc"},"winprice":"100usd/1m"}
how data that? thanks.
here examples using data , local file system instead of s3. think files not have extension, if not true, if files have .json extension, can query them (use table alias t resolve ambiguity):
use dfs.`root`; +-------+---------------------------------------+ | ok | summary | +-------+---------------------------------------+ | true | default schema changed [dfs.root] | +-------+---------------------------------------+ 1 row selected (0.079 seconds) select t.auctionid dfs.`/users/khahn/drill/apache-drill-1.1.0/part-00000` t; +------------+ | auctionid | +------------+ | xx | | 343 | +------------+ 2 rows selected (0.093 seconds)
if files not have extension, modify storage plugin configuration set defaultinputformat "json"; otherwise, drill assumes file of parquet format default. example, set defaultinputformat in root workspace:
{ "type": "file", "enabled": true, "connection": "file:///", "workspaces": { "root": { "location": "/", "writable": false, "defaultinputformat": "json" }, "tmp": { "location": "/tmp", "writable": true, "defaultinputformat": null } }, . . .
when query files having no extension, need specify workspace in defined defaultinputformat, root in example:
select t.auctionid dfs.`root`.`/users/khahn/drill/apache-drill-1.1.0/part-00000` t; +------------+ | auctionid | +------------+ | xx | | 343 | +------------+ 2 rows selected (0.095 seconds)
please see docs handling type differences: https://drill.apache.org/docs/json-data-model/#handling-type-differences
Comments
Post a Comment