How to get MapReduce output in a single file instead of multiple files in a Hadoop cluster on Google Cloud?
When I run my jar on a local multi-node Hadoop cluster, each job produces a single reducer output file.
When I run the same jar on Google Cloud, the output is split across multiple files (part-r-0000*). I need the output written to a single file instead. How can I do that?
Well, one simple solution is to configure the job to run with exactly one reducer. It seems the default setting on Google Cloud is different. See here for how to do that: setting number of reducers in mapreduce job in oozie workflow
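A minimal sketch of forcing a single reducer at submit time. The jar name, driver class, and input/output paths below are placeholders; this assumes your driver uses ToolRunner/GenericOptionsParser so that -D options are picked up:

```shell
# Submit the job with a single reduce task, so only one
# part-r-00000 file is produced (paths/classes are hypothetical):
hadoop jar myjob.jar com.example.MyDriver \
    -D mapreduce.job.reduces=1 \
    /input/dir /output/dir
```

Equivalently, the driver code can call job.setNumReduceTasks(1) before submitting. Note that one reducer means all reduce work runs on a single node, which can be slow for large outputs.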
Another way to deal with it is to have a concatenation script run at the end of your MapReduce job that stitches the part-r files together, i.e.
cat part-r-* >> alloutput
This may be a bit more complex if the files have headers, and you need to copy them to the local filesystem first.