Wednesday, January 15, 2014

Day#3 Hive: Difference between managed and external tables

Let us create two directories in HDFS named weather and citydata and copy our local files into them. Then we can point our Hive tables at these locations so that we do not have to load the files explicitly.
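The commands for this step can be sketched as follows (the local file names and the citydata path are assumptions for illustration; adjust them to your setup):

```shell
# create the two directories in HDFS
hadoop fs -mkdir /user/hive/data/weather
hadoop fs -mkdir /user/hive/data/citydata

# copy the local data files into them
hadoop fs -put weather.txt /user/hive/data/weather/
hadoop fs -put city.txt /user/hive/data/citydata/
```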

Browse the Hadoop file system. Since my Hadoop installation is in standalone mode, my NameNode also resides on localhost. Browse to localhost:50070 (or 127.0.0.1:50070) and check the files.

Open a Hive shell and create a table (without the EXTERNAL keyword, it is considered internal, i.e. managed). The fields are separated by a colon. By specifying the LOCATION here, we point the table directly at the HDFS location so that we do not need to load the file.
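A sketch of the DDL — the colon delimiter and the LOCATION idea come from the post; the table and column names are illustrative assumptions:

```sql
-- no EXTERNAL keyword, so this is a managed (internal) table
CREATE TABLE weather (
  city        STRING,     -- assumed column
  temperature INT         -- assumed column
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ':'
LOCATION '/user/hive/data/weather';
```

Because there is no EXTERNAL keyword, Hive takes ownership of the directory given in LOCATION — which matters later when we drop the table.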

Run SELECT * on the table to see the data.
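For example, assuming the table is named weather:

```sql
SELECT * FROM weather;
```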
   
Now let us add one more file to the same location. Its rows will appear in the table as well. We can also load additional files into the same table.
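Since the table points directly at the HDFS directory, dropping another file there is enough (the file name is an assumption):

```shell
# Hive picks up any file placed in the table's directory
hadoop fs -put weather2.txt /user/hive/data/weather/
```

Alternatively, `LOAD DATA LOCAL INPATH 'weather2.txt' INTO TABLE weather;` from the Hive shell appends an additional file into the same table.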

Now the table should have more rows than we had earlier. How do we know? Let's do a COUNT(*). It will run a MapReduce job to get the result: we have 80 records now.
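The count query, assuming the table name weather:

```sql
-- launches a MapReduce job; the post reports 80 rows at this point
SELECT COUNT(*) FROM weather;
```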

Now let us drop the table.  
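Dropping the managed table, assuming it is named weather:

```sql
DROP TABLE weather;
```

For a managed (internal) table, DROP TABLE removes both the metastore entry and the underlying data directory.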

Now all the data inside the location 'user/hive/data/weather', including the directory itself, has been deleted. Check for yourself; I am not lying!
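One way to verify from the command line (path as above):

```shell
hadoop fs -ls /user/hive/data/weather
# the listing fails: the directory was deleted along with the managed table
```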

Now let us create an external table .
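A sketch of the external-table DDL, assuming the same colon-delimited format as before (column names are illustrative):

```sql
-- EXTERNAL: Hive tracks only metadata; the data stays under our control
CREATE EXTERNAL TABLE citydata (
  city       STRING,     -- assumed column
  population INT         -- assumed column
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ':'
LOCATION '/user/hive/data/citydata';
```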

SELECT * FROM citydata shows the table is now loaded with 202 records.

Now go back and check the file in HDFS.

Drop the external table.
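Dropping it looks the same as for a managed table:

```sql
DROP TABLE citydata;
```

But for an external table, only the metastore entry is removed; the files under LOCATION are untouched.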

Now check again: the file is still there. You can also verify this in the HDFS file system through the web browser.
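Verifying from the command line (path as assumed earlier):

```shell
hadoop fs -ls /user/hive/data/citydata
# the data file is still listed: dropping an external table leaves the data alone
```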

In the Day#2 Flume blog, we collected weblogs from an Apache web server into Hadoop. Let's now get those weblogs into a Hive table.

