dataproc notes

copy a GCP bucket's contents to a GCP instance:

 gcloud storage cp -r gs://bucket_name/* .

command to run the mapper and reducer in hadoop:

hadoop jar /usr/lib/hadoop/hadoop-streaming.jar -D mapreduce.job.reduces=1 -file *****.py -mapper "python ****.py" -file *****.py -reducer "python *****.py" -input ********* -output **********

example arguments for the same kind of job, one per line (the layout suggests the arguments list of a Dataproc job submission):

-files
gs://bucket/map1.py,gs://databucketwasim/reducer1.py
-input
gs://bucket/words200.txt
-output
gs://bucket/output_wordcount4
-mapper
python3 map1.py
-reducer
python3 reducer1.py
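
The notes do not show what map1.py and reducer1.py contain. The file names (words200.txt, output_wordcount4) suggest a word count, so the following is only a minimal sketch of what such a mapper and reducer might look like, assuming tab-separated "word<TAB>count" records. The function names map_lines, reduce_lines and simulate are hypothetical, and both steps are kept in one file so the pipeline can be tried locally without a cluster.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper step (assumed logic): emit 'word<TAB>1' for every word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(sorted_records):
    """Reducer step (assumed logic): sum counts of adjacent equal keys.

    Hadoop streaming hands the reducer its records sorted by key, so all
    records for one word arrive next to each other and groupby can
    collect them without holding everything in memory.
    """
    pairs = (record.rstrip("\n").split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def simulate(lines):
    """Local stand-in for the job: map, sort (the shuffle), then reduce."""
    return list(reduce_lines(sorted(map_lines(lines))))


if __name__ == "__main__":
    for record in simulate(["the cat", "The dog the"]):
        print(record)
```

In a real streaming job each step would live in its own script that loops over sys.stdin and prints each record, for example `for record in map_lines(sys.stdin): print(record)` in the mapper script.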