Apache Spark access to Apache Hive

From Apache Spark, you access ACID v2 tables and external tables in Apache Hive 3 using the Hive Warehouse Connector.

The HiveWarehouseConnector (HWC) library is a Spark library, built on top of Apache Arrow, that lets Spark read from and write to Hive ACID tables and external tables.

The Hive Warehouse Connector is optimized for fast transmission of data from low-latency analytical processing (LLAP) to Spark and is designed to leverage the LLAP cache. The connector orchestrates a distributed read from the LLAP daemons. The read from the cache occurs after security rules and ACID transformations are applied.
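As an illustration of the read path described above, the following is a minimal PySpark sketch, not a definitive implementation. It assumes that the HWC jar and the `pyspark_llap` Python package that ships with the connector are available to the Spark application, and that the cluster's connector settings are already configured. The function name `read_managed_table` is hypothetical and introduced only for this example.

```python
def read_managed_table(spark, query):
    """Run a Hive query through HWC and return the result as a Spark DataFrame.

    Assumption: `spark` is an existing SparkSession on a cluster where
    the Hive Warehouse Connector is installed and configured.
    """
    # Deferred import: pyspark_llap is distributed with HWC, not with PySpark,
    # so it is only importable where the connector is installed.
    from pyspark_llap import HiveWarehouseSession

    # Build a connector session on top of the existing SparkSession.
    hive = HiveWarehouseSession.session(spark).build()

    # The query is executed by Hive; the result comes back as a DataFrame.
    return hive.executeQuery(query)
```

A call might look like `df = read_managed_table(spark, "SELECT * FROM sales_db.orders")`, where `sales_db.orders` is a placeholder table name.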

You need LLAP to read ACID tables, or other Hive-managed tables, from Spark. You do not need LLAP to write to ACID tables, or other managed tables, from Spark, and you do not need LLAP to access external tables from Spark. To write the data, the HWC library internally uses the Hive Streaming API and Hive LOAD DATA commands.
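The LLAP requirements above reduce to a single rule: only reads of managed tables need LLAP. The sketch below encodes that rule as a small lookup. The names `TableKind`, `Operation`, and `needs_llap` are hypothetical and are not part of the HWC API; they exist only to make the rule explicit.

```python
from enum import Enum


class TableKind(Enum):
    MANAGED = "managed"    # ACID tables and other Hive-managed tables
    EXTERNAL = "external"  # external tables


class Operation(Enum):
    READ = "read"
    WRITE = "write"


def needs_llap(kind: TableKind, op: Operation) -> bool:
    """Return True when the access described in the text requires LLAP.

    Per the rules above: reading managed tables needs LLAP; writing
    managed tables and any access to external tables does not.
    """
    return kind is TableKind.MANAGED and op is Operation.READ
```

For example, `needs_llap(TableKind.MANAGED, Operation.WRITE)` evaluates to `False`, which matches the statement that writes to managed tables do not require LLAP.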