Reference: https://www.cloudera.com/content/cloudera-content/cloudera-docs/HadoopTutorial/CDH4/Hadoop-Tutorial/ht_usage.html
1) Compile the Java code and build the jar file using the Hadoop classpath
$cd $HOME
$mkdir sourcecode
$cd sourcecode
$mkdir wordcount_classes #subdirectory under sourcecode
$vi WordCount.java #WordCount source - a sketch is shown after this step
$javac -cp /usr/local/hadoop/hadoop-core-1.2.1.jar:/usr/local/hadoop/lib/commons-cli-1.2.jar -d wordcount_classes WordCount.java
$jar -cvf wordcount.jar -C wordcount_classes/ .
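The WordCount.java edited above isn't shown in the post. Below is a sketch of the classic WordCount written against the old org.apache.hadoop.mapred API (the version walked through in the Cloudera tutorial linked at the top), which compiles against hadoop-core-1.2.1 as in the javac command above. The package name org.myorg is assumed so it matches the class name passed to hadoop jar in step 4.

// Classic WordCount sketch (old mapred API); package assumed to match step 4
package org.myorg;

import java.io.IOException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.*;

public class WordCount {

  // Mapper: emit (word, 1) for every token in the input line
  public static class Map extends MapReduceBase
      implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      StringTokenizer tokenizer = new StringTokenizer(value.toString());
      while (tokenizer.hasMoreTokens()) {
        word.set(tokenizer.nextToken());
        output.collect(word, one);
      }
    }
  }

  // Reducer (also used as combiner): sum the counts for each word
  public static class Reduce extends MapReduceBase
      implements Reducer<Text, IntWritable, Text, IntWritable> {
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output, Reporter reporter)
        throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += values.next().get();
      }
      output.collect(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    JobConf conf = new JobConf(WordCount.class);
    conf.setJobName("wordcount");

    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    conf.setMapperClass(Map.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);

    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);

    // args[0] = HDFS input path, args[1] = HDFS output path (see step 4)
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
  }
}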
2) Get the dataset from the Internet
$wget http://www.gutenberg.org/files/4300/4300.txt
3) Create the HDFS input directory and transfer the file to HDFS
$hadoop fs -mkdir /user/butik/wordcount/input
$hadoop fs -put 4300.txt /user/butik/wordcount/input/
$hadoop fs -cat /user/butik/wordcount/input/4300.txt
or check the file in the HDFS NameNode browser at localhost:50070
4) Run the MapReduce program
$hadoop jar wordcount.jar org.myorg.WordCount /user/butik/wordcount/input/4300* /user/butik/wordcount/output
5) Check the MapReduce output
$hadoop fs -cat /user/butik/wordcount/output/part-00000
or browse the output directory at localhost:50070
6) Check MapReduce jobs in the JobTracker browser at localhost:50030