Integrating Apache Hive with Spark and BI

Hive Warehouse Connector for accessing Apache Spark data

Using the Hive Warehouse Connector, you can read and write Apache Spark DataFrames and Streaming DataFrames to and from Apache Hive using low-latency analytical processing (LLAP).

Apache Ranger and the HiveWarehouseConnector library provide fine-grained, row- and column-level access control over Spark data in Hive.

The Hive Warehouse Connector supports the following applications:
  • Spark shell
  • PySpark
  • The spark-submit script
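Each of these applications picks up the connector the same way: the HWC assembly jar is added to the classpath and a few Spark properties point the connector at HiveServer2 and the LLAP daemons. The fragment below is a sketch of a spark-shell launch; the jar path, version, and host names are placeholders that vary by installation, and the property names shown are the standard HWC settings.

```shell
# Sketch: launch spark-shell with the Hive Warehouse Connector.
# Jar version and host names below are illustrative placeholders.
spark-shell \
  --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-<version>.jar \
  --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://hiveserver2-host:10000/" \
  --conf spark.datasource.hive.warehouse.metastoreUri="thrift://metastore-host:9083" \
  --conf spark.datasource.hive.warehouse.load.staging.dir="/tmp" \
  --conf spark.hadoop.hive.llap.daemon.service.hosts="@llap0" \
  --conf spark.hadoop.hive.zookeeper.quorum="zk-host1:2181,zk-host2:2181,zk-host3:2181"
```

For PySpark, substitute `pyspark` for `spark-shell` with the same options; for spark-submit, pass your application jar or script after the same `--jars` and `--conf` arguments.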
The following list describes a few of the operations supported by the Hive Warehouse Connector:
  • Describing a table
  • Creating a table for ORC-formatted data
  • Selecting Hive data and retrieving a DataFrame
  • Writing a DataFrame to Hive in batch
  • Executing a Hive update statement
  • Reading table data from Hive, transforming it in Spark, and writing it to a new Hive table
  • Writing a DataFrame or Spark stream to Hive using HiveStreaming
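The operations above can be sketched from spark-shell as follows. This assumes a session launched with the HWC jar and configuration in place; the table and column names (`web_sales`, `newTable`, `ws_sold_time_sk`) are illustrative, not part of any shipped schema.

```scala
// Sketch of common HWC operations from spark-shell.
// Assumes the HWC assembly jar and Spark configuration are already set up.
import com.hortonworks.hwc.HiveWarehouseSession
import com.hortonworks.hwc.HiveWarehouseSession._

val hive = HiveWarehouseSession.session(spark).build()

// Describe a table
hive.describeTable("web_sales").show()

// Create a table for ORC-formatted data
hive.createTable("newTable")
  .ifNotExists()
  .column("ws_sold_time_sk", "bigint")
  .column("ws_sold_date_sk", "bigint")
  .create()

// Select Hive data and retrieve a DataFrame
val df = hive.executeQuery("SELECT ws_sold_time_sk FROM web_sales")

// Write a DataFrame to Hive in batch (executed through LLAP)
df.write
  .format(HIVE_WAREHOUSE_CONNECTOR)
  .option("table", "newTable")
  .save()

// Execute a Hive update statement
hive.executeUpdate("ALTER TABLE newTable SET TBLPROPERTIES ('comment' = 'demo')")
```

A streaming DataFrame is written the same way through `writeStream` with the `STREAM_TO_STREAM` format constant, which routes records through HiveStreaming rather than a batch load.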