Tips: Reading Hive Tables from Spark

A collection of useful tips for working with big data tools, including Hadoop, Hive, and Spark.

Tip#1. Solving the Access Permission Conundrum

After creating a new database in Hive, a Hive Ranger policy is all you need to allow reading the tables in the new database from Hive/Beeline/Beeline-Ranger.

When the same Hive table is read from Spark, however, an HDFS permission policy is also needed, in addition to the Hive Ranger policy above.

Tip#2. Running Queries on Hive Tables

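For internal/managed ACID tables, use executeQuery() on the Hive session object (named hive here, presumably a Hive Warehouse Connector session):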

hive.executeQuery("SELECT * FROM DB.TABLE")

For external non-ACID tables, use the following instead of hive.executeQuery() to get a 10x performance increase:

spark.sql("SELECT * FROM DB.TABLE")
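The rule of thumb above can be wrapped in a small helper. This is only a minimal sketch: run_query and its parameters are hypothetical names, the two query functions are passed in as callables (for example hive.executeQuery and spark.sql), and it assumes the table type is given as the MANAGED_TABLE / EXTERNAL_TABLE string that Hive reports. Anything that is not clearly external falls back to the Hive path, which is an assumption rather than something stated in the tip.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_query(
    table_type: str,
    query: str,
    hive_query: Callable[[str], T],
    spark_query: Callable[[str], T],
) -> T:
    """Route a query by table type (hypothetical helper, not a library API).

    EXTERNAL_TABLE -> spark_query (the faster path for external non-ACID tables)
    anything else  -> hive_query  (assumed fallback, covers managed ACID tables)
    """
    if table_type.strip().upper() == "EXTERNAL_TABLE":
        return spark_query(query)
    return hive_query(query)


# Intended usage inside a Spark job, assuming `hive` and `spark` already exist:
# df = run_query("EXTERNAL_TABLE", "SELECT * FROM DB.TABLE",
#                hive.executeQuery, spark.sql)
```

Passing the two functions in as arguments keeps the helper independent of how the Spark and Hive sessions were created.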

Tip#3. Check if a Table is Managed or External

To check whether a table is managed or external, run the following in Hive/Beeline/Beeline-Ranger:

DESCRIBE FORMATTED db.table_name;

This shows detailed information about the table. Check the Table Type value, which should be either MANAGED_TABLE or EXTERNAL_TABLE.
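If the DESCRIBE FORMATTED result is fetched programmatically, the Table Type value can be picked out with a few lines of Python. This is a rough sketch that assumes the rows arrive as tuples with the label in the first column and the value in the second, with the padding and trailing colon that Hive typically adds. The function name and the sample rows are made up for illustration.

```python
from typing import Iterable, Optional, Sequence


def table_type_from_describe(
    rows: Iterable[Sequence[Optional[str]]],
) -> Optional[str]:
    """Return the Table Type value from DESCRIBE FORMATTED rows, or None."""
    for row in rows:
        if not row or row[0] is None:
            continue
        # Normalize a padded label such as "Table Type:         ".
        label = row[0].strip().rstrip(":").strip().lower()
        if label == "table type" and len(row) > 1 and row[1] is not None:
            return row[1].strip().upper()
    return None


# Illustrative rows only, shaped like the assumed three-column output.
sample_rows = [
    ("# Detailed Table Information", None, None),
    ("Database:           ", "db                  ", None),
    ("Table Type:         ", "EXTERNAL_TABLE      ", None),
]
print(table_type_from_describe(sample_rows))  # prints EXTERNAL_TABLE
```

The function returns None when no Table Type row is found, so the caller can decide how to handle an unexpected output format.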
