Migrating data

HiveWarehouseConnector for accessing Apache Spark data

HiveWarehouseConnector is a library for reading and writing Apache Spark DataFrames and Streaming DataFrames to and from Apache Hive using low-latency analytical processing (LLAP).

Apache Ranger and the HiveWarehouseConnector library provide fine-grained access control, at the row and column level, to Spark data in Hive.

HiveWarehouseConnector supports the following applications:
  • Spark shell
  • PySpark
  • The spark-submit script
The following list describes a few of the operations supported by HiveWarehouseConnector:
  • Describing a table
  • Creating a table for ORC-formatted data
  • Selecting Hive data and retrieving a DataFrame
  • Writing a DataFrame to Hive in batch
  • Executing a Hive update statement
  • Reading table data from Hive, transforming it in Spark, and writing it to a new Hive table
  • Writing a DataFrame or Spark stream to Hive using HiveStreaming
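
Several of the operations listed above can be sketched in PySpark roughly as follows. This is a minimal illustration, not a definitive implementation: it assumes the connector jar and its pyspark_llap Python package are available to the Spark session, that the data source name stored in HWC_FORMAT matches your connector version, and that the helper names, table names, and two-column schema are hypothetical placeholders.

```python
# Sketch only: assumes Spark was launched with the HiveWarehouseConnector jar
# and its Python package on the path. Helper and column names are hypothetical.

# Data source name used for batch writes (assumed; verify for your version).
HWC_FORMAT = "com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector"


def open_session(spark):
    """Build a HiveWarehouseSession from an existing SparkSession."""
    # Imported lazily because pyspark_llap ships with the connector, not PySpark.
    from pyspark_llap import HiveWarehouseSession
    return HiveWarehouseSession.session(spark).build()


def create_orc_table(hive, table):
    """Create a table for ORC-formatted data (hypothetical two-column schema)."""
    hive.createTable(table).ifNotExists() \
        .column("id", "bigint") \
        .column("amount", "double") \
        .create()


def copy_rows(hive, source, target, condition):
    """Read a Hive table, filter it in Spark, and write the result to Hive."""
    df = hive.executeQuery("SELECT * FROM " + source)  # Hive data as a DataFrame
    filtered = df.filter(condition)                    # transformation in Spark
    filtered.write.format(HWC_FORMAT).option("table", target).save()
    return filtered


def rename_table(hive, old_name, new_name):
    """Execute a Hive update statement."""
    return hive.executeUpdate(
        "ALTER TABLE {} RENAME TO {}".format(old_name, new_name))
```

In a PySpark shell, you might call hive = open_session(spark), inspect a table with hive.describeTable("sales").show(), and then run copy_rows(hive, "sales", "big_sales", "amount > 100"). Whether the target table must already exist, and which save mode applies, can vary by connector version, so check the behavior in your environment.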