Wednesday, April 23, 2014

Day #3 MapReduce: WordCount Example

https://www.cloudera.com/content/cloudera-content/cloudera-docs/HadoopTutorial/CDH4/Hadoop-Tutorial/ht_usage.html

1) Compile the Java code and build a JAR file using the Hadoop classpath


$cd $HOME
$mkdir sourcecode
$cd sourcecode
$mkdir wordcount_classes  #subdirectory under sourcecode
$vi WordCount.java
$javac -cp /usr/local/hadoop/hadoop-core-1.2.1.jar:/usr/local/hadoop/lib/commons-cli-1.2.jar -d wordcount_classes WordCount.java
$jar -cvf wordcount.jar -C wordcount_classes/ . 
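The `WordCount.java` edited above is the classic example from the linked Cloudera tutorial. A sketch of it for Hadoop 1.2.1, using the old `org.apache.hadoop.mapred` API and the `org.myorg.WordCount` class name that step 4 expects (it needs the Hadoop JARs on the classpath to compile, as in the `javac` command above):

```java
package org.myorg;

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class WordCount {

    // Mapper: emit (word, 1) for every whitespace-separated token in a line.
    public static class Map extends MapReduceBase
            implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        public void map(LongWritable key, Text value,
                        OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            StringTokenizer tokenizer = new StringTokenizer(value.toString());
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    // Reducer (also used as combiner): sum the counts for each word.
    public static class Reduce extends MapReduceBase
            implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values,
                           OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        // args[0] = HDFS input path, args[1] = HDFS output path (step 4)
        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }
}
```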

2) Get the dataset from the Internet (Project Gutenberg ebook #4300, the full text of Ulysses)
 
$wget http://www.gutenberg.org/files/4300/4300.txt

3) Create the HDFS input directory and copy the file into HDFS

$hadoop fs -mkdir /user/butik/wordcount/input
$hadoop fs -put 4300.txt /user/butik/wordcount/input/  


$hadoop fs -cat /user/butik/wordcount/input/4300.txt or
check the file in the HDFS NameNode web UI at localhost:50070

4) Run the MapReduce program

$hadoop jar wordcount.jar org.myorg.WordCount /user/butik/wordcount/input/4300*  /user/butik/wordcount/output

5) Check the MapReduce output
$hadoop fs -cat /user/butik/wordcount/output/part-00000 or go to localhost:50070
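Each line of `part-00000` is a word and its count, tab-separated and sorted by key, because `TextOutputFormat` writes `key<TAB>value` and the shuffle sorts keys. A hypothetical local sketch (the `LocalWordCount` class is for illustration only, not part of the tutorial) that reproduces the same map-then-reduce logic and output layout on a small string:

```java
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class LocalWordCount {

    // "Map" tokenizes on whitespace; "reduce" sums counts per word.
    // TreeMap keeps keys sorted, like the keys in part-00000.
    public static Map<String, Integer> countWords(String text) {
        Map<String, Integer> counts = new TreeMap<>();
        StringTokenizer tok = new StringTokenizer(text);
        while (tok.hasMoreTokens()) {
            counts.merge(tok.nextToken(), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Print in the tab-separated "word<TAB>count" layout TextOutputFormat uses.
        for (Map.Entry<String, Integer> e
                : countWords("to be or not to be").entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```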

6) Check MapReduce jobs in the JobTracker web UI at localhost:50030
