Is it possible for Parquet to compress the summary file (_metadata) in the MR job?
Right now I'm using a MapReduce job to convert data and store the result in Parquet format.
The summary file (_metadata) is generated correctly, but the problem is that it is big (over 5 GB). Is there a way to reduce its size?
Credits to Alex Levenson and Ryan Blue:
Alex Levenson:
You can push the reading of the summary file to the mappers instead of reading it on the submitter node:

ParquetInputFormat.setTaskSideMetadata(conf, true);
(Ryan Blue: this is the default from 1.6.0 forward.)
Or set "parquet.task.side.metadata" to true in the configuration. We had a similar issue: by default, the client reads the summary file on the submitter node, which takes a lot of time and memory. This flag fixes the issue by instead reading each individual file's metadata from the file footer in the mappers, so each mapper reads only the metadata it needs.
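The two equivalent ways of enabling task-side metadata described above might look like this in a driver (a sketch, not a full job; depending on your parquet-mr version the package may be `parquet.hadoop` rather than `org.apache.parquet.hadoop`):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.parquet.hadoop.ParquetInputFormat;

Configuration conf = new Configuration();

// Option 1: via the ParquetInputFormat helper.
ParquetInputFormat.setTaskSideMetadata(conf, true);

// Option 2: set the property directly (equivalent).
conf.setBoolean("parquet.task.side.metadata", true);
```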
Another option, which we've talked about in the past, is to disable creating the metadata file at all. We've seen that creating it can be expensive too, and if you use the task-side metadata approach, it's never used.
(Ryan Blue: There's an option to suppress the files, which I recommend. File metadata is handled on the tasks, so there's no need for the summary files.)
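Suppressing the summary files on the write side is a configuration setting; a minimal sketch, assuming parquet-mr (older versions expose the boolean key "parquet.enable.summary-metadata", while newer versions replace it with the level-based "parquet.summary.metadata.level"):

```java
import org.apache.hadoop.conf.Configuration;

Configuration conf = new Configuration();

// Older parquet-mr: disable writing the _metadata summary files entirely.
conf.setBoolean("parquet.enable.summary-metadata", false);

// Newer parquet-mr: the equivalent level-based setting.
// conf.set("parquet.summary.metadata.level", "NONE");
```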