Wednesday, January 15, 2014

Day#3 Hive: Difference between managed and external tables



Let us create two directories in HDFS named weather and citydata and copy our local files into them. Then we can point our Hive tables at these locations so that we do not have to load the files into them.
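Something along these lines (the local file names here are illustrative; the actual ones were in the screenshots):

    hadoop fs -mkdir /user/hive/data/weather
    hadoop fs -mkdir /user/hive/data/citydata
    hadoop fs -copyFromLocal weather.txt /user/hive/data/weather
    hadoop fs -copyFromLocal citydata.txt /user/hive/data/citydata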

Browse the Hadoop file system. As my Hadoop installation is in standalone mode, my NameNode also resides on localhost. Browse to localhost:50070 (or 127.0.0.1:50070) and check the files.

Open a Hive shell and create a table (without the EXTERNAL keyword, it is treated as internal, i.e. managed). The fields are separated by a colon. By specifying the LOCATION here, we point the table directly at the HDFS location so that we do not need to load the file.
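A minimal sketch of the DDL, with two made-up columns (the real schema was in the screenshot); the colon delimiter and the HDFS location come from the text:

    CREATE TABLE weather (station STRING, temperature INT)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ':'
    LOCATION '/user/hive/data/weather';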

Run a SELECT * on the table to see the data.
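For example, with the sketch table above:

    SELECT * FROM weather;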
   
Now let us add one more file to the same location. Its rows appear appended to the table. We can also load additional files into the same table.
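Either way works; a sketch (the file names are assumed):

    hadoop fs -copyFromLocal weather2.txt /user/hive/data/weather

or, from the Hive shell:

    LOAD DATA LOCAL INPATH '/home/user/weather2.txt' INTO TABLE weather;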

Now the table should have more rows than we had earlier. How do we know? Let's do a COUNT(*). It runs a MapReduce job to get the result. We have 80 records now.
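Using the sketch table name:

    SELECT COUNT(*) FROM weather;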

Now let us drop the table.  
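Again with the sketch table name:

    DROP TABLE weather;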

Now all the data inside the location '/user/hive/data/weather', including the directory itself, has been deleted. Check for yourself, I am not lying!!
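A quick check from the command line (the exact error text varies by Hadoop version):

    hadoop fs -ls /user/hive/data/weather
    ls: Cannot access /user/hive/data/weather: No such file or directory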

Now let us create an external table.
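The same sketch as before with the EXTERNAL keyword added (the columns are again made up, and the colon delimiter is assumed to match):

    CREATE EXTERNAL TABLE citydata (city STRING, population INT)
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ':'
    LOCATION '/user/hive/data/citydata';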

SELECT * from citydata. The table is now loaded with 202 records.
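For example:

    SELECT * FROM citydata;
    SELECT COUNT(*) FROM citydata;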

Now go back and check the file in HDFS.
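Assuming the same location as in the sketch:

    hadoop fs -ls /user/hive/data/citydata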

Drop the external table.
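The syntax is the same as for the managed table:

    DROP TABLE citydata;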

Now check again: the file is still there. Dropping an external table removes only the table metadata; the underlying data stays in HDFS. You can check the HDFS file system through the web browser as well.

On the Day#2 Flume blog, we collected weblogs from an Apache web server into Hadoop. Let's get those weblogs into a Hive table.


Tuesday, January 14, 2014

Day#3 Hive: Install Hive on Ubuntu and load a file into Hive




Hive installation is simple. I installed Hadoop in standalone mode on an Ubuntu Linux machine. Download Hive from http://hive.apache.org/releases.html#Download and find the latest release. I used http://mirror.reverse.net/pub/apache/hive/hive-0.11.0/
 

Download Hive from the Apache site into your local Ubuntu folder.
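A sketch with wget (check the mirror for the exact tarball name):

    wget http://mirror.reverse.net/pub/apache/hive/hive-0.11.0/hive-0.11.0-bin.tar.gz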




Untar Hive.
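Assuming the -bin tarball from the sketch above:

    tar -xzvf hive-0.11.0-bin.tar.gz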



Move the extracted files into the Hive home directory /usr/local/hive. Use sudo if you don't have permission.
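Assuming the extracted directory name from the previous step:

    sudo mv hive-0.11.0-bin /usr/local/hive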



Export the environment variables and save them in .bashrc.
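Append lines like these to ~/.bashrc and reload it with source ~/.bashrc:

    export HIVE_HOME=/usr/local/hive
    export PATH=$PATH:$HIVE_HOME/bin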



Here you go. In standalone mode, the Hive shell will automatically detect the local HDFS and MapReduce nodes. Fire up the Hive shell.


Start the Hive shell. Use SET -v to see all configuration variables.
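From the terminal:

    hive
    hive> SET -v;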




Create a table book with a column named string of data type string.

Create table syntax
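A sketch of the DDL; the post names the column string, which I quote with backticks in case your Hive version treats it as a reserved type keyword:

    CREATE TABLE book (`string` STRING);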

From another terminal window, use the hadoop fs -ls command to see that the Hive table is a directory in HDFS.

hive table is a directory in HDFS
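Hive's default warehouse directory is /user/hive/warehouse, so the new table shows up as a subdirectory there:

    hadoop fs -ls /user/hive/warehouse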

Let us load a file into Hive. Copy the file book into HDFS using hadoop fs -copyFromLocal <local file system source> <HDFS destination>.

Copy the file named book to HDFS using hadoop fs -copyFromLocal command
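A sketch, assuming the file book sits in the current local directory:

    hadoop fs -copyFromLocal book /user/hive/warehouse/book

After the copy, a SELECT * FROM book; in the Hive shell should show the file's contents.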