Wednesday, January 15, 2014

Day#3 Hive: Difference between managed and external tables

Let us create two directories in HDFS named weather and citydata and copy our local files into them. Then we can point our Hive tables at these locations so that we do not have to load the files explicitly.
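The commands for this step can be sketched as follows (the local file names and the citydata path are assumptions for illustration; adjust them to your setup):

```shell
# create the two directories in HDFS
hadoop fs -mkdir /user/hive/data/weather
hadoop fs -mkdir /user/hive/data/citydata

# copy the local data files into them
hadoop fs -put weather.txt /user/hive/data/weather/
hadoop fs -put city.txt /user/hive/data/citydata/
```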

Browse the Hadoop file system. Since my Hadoop installation is in standalone mode, my NameNode also resides on localhost. Browse to localhost:50070 (or 127.0.0.1:50070) and check the files.

Open a Hive shell and create a table (without the EXTERNAL keyword, it is considered internal, i.e. managed). The fields are separated by a colon. By specifying the LOCATION here, we point the table directly at the HDFS location so that we do not need to load the file.
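A sketch of the DDL — the colon delimiter and the LOCATION idea come from the post; the table and column names are illustrative assumptions:

```sql
-- no EXTERNAL keyword, so this is a managed (internal) table
CREATE TABLE weather (
  city        STRING,     -- assumed column
  temperature INT         -- assumed column
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ':'
LOCATION '/user/hive/data/weather';
```

Because there is no EXTERNAL keyword, Hive takes ownership of the directory given in LOCATION — which matters later when we drop the table.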

Run SELECT * on the table to see the data.
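For example, assuming the table is named weather:

```sql
SELECT * FROM weather;
```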
   
Now let us add one more file to the same location. Its rows will appear in the table as well. We can also load additional files into the same table.
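Since the table points directly at the HDFS directory, dropping another file there is enough (the file name is an assumption):

```shell
# Hive picks up any file placed in the table's directory
hadoop fs -put weather2.txt /user/hive/data/weather/
```

Alternatively, `LOAD DATA LOCAL INPATH 'weather2.txt' INTO TABLE weather;` from the Hive shell appends an additional file into the same table.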

Now the table should have more rows than we had earlier. How do we know? Let's do a COUNT(*). It will run a MapReduce job to get the result: we have 80 records now.
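The count query, assuming the table name weather:

```sql
-- launches a MapReduce job; the post reports 80 rows at this point
SELECT COUNT(*) FROM weather;
```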

Now let us drop the table.  
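Dropping the managed table, assuming it is named weather:

```sql
DROP TABLE weather;
```

For a managed (internal) table, DROP TABLE removes both the metastore entry and the underlying data directory.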

Now all the data inside the location 'user/hive/data/weather', including the directory itself, has been deleted. Check for yourself; I am not lying!
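One way to verify from the command line (path as above):

```shell
hadoop fs -ls /user/hive/data/weather
# the listing fails: the directory was deleted along with the managed table
```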

Now let us create an external table .
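A sketch of the external-table DDL, assuming the same colon-delimited format as before (column names are illustrative):

```sql
-- EXTERNAL: Hive tracks only metadata; the data stays under our control
CREATE EXTERNAL TABLE citydata (
  city       STRING,     -- assumed column
  population INT         -- assumed column
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ':'
LOCATION '/user/hive/data/citydata';
```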

SELECT * FROM citydata shows the table is now loaded with 202 records.

Now go back and check the file in HDFS.

Drop the external table.
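Dropping it looks the same as for a managed table:

```sql
DROP TABLE citydata;
```

But for an external table, only the metastore entry is removed; the files under LOCATION are untouched.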

Now check again: the file is still there. You can also verify this in the HDFS file system through the web browser.
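Verifying from the command line (path as assumed earlier):

```shell
hadoop fs -ls /user/hive/data/citydata
# the data file is still listed: dropping an external table leaves the data alone
```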

In the Day#2 Flume blog, we collected weblogs from an Apache web server into Hadoop. Let's now get those weblogs into a Hive table.

