How to get MapReduce output in a single file instead of multiple files on a Hadoop cluster on Google Cloud?


When I run my jar on a local Hadoop multi-node cluster, I can see the reducer output: a single file for every job.

When I run the same jar on Google Cloud, I get multiple output files (part-r-0000*). I need the output written to a single file instead. How can I do that?

Well, one simple solution is to configure the job to run with exactly one reducer; it seems the default setting on Google Cloud is different. See here for how to do that: setting number of reducers in mapreduce job in oozie workflow. If your driver goes through ToolRunner, the same thing can be done from the command line, as in the sketch below.
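A minimal sketch, assuming the driver's main class uses ToolRunner (or otherwise parses generic options); the jar name, driver class, and input/output paths here are placeholders:

# force a single reduce task so the job emits one part-r-00000 file
hadoop jar myjob.jar com.example.MyDriver -D mapreduce.job.reduces=1 input/ output/

Alternatively, calling job.setNumReduceTasks(1) in the driver code has the same effect.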

Another way to deal with this is to have a concatenation script run at the end of the MapReduce job that pieces the part-r files together, i.e.

cat *part-r* >> alloutput

This may be a bit more complex if the files have headers, and you will need to copy them from HDFS to the local filesystem first; see the sketch below for that step.
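If the goal is simply to end up with one local file, a minimal sketch using hadoop fs -getmerge combines the copy-to-local and concatenation steps (assuming the job wrote to an HDFS directory called output/; both paths are placeholders):

# fetch every part file under output/ and concatenate them into one local file
hadoop fs -getmerge output/ alloutput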

