Integrating Apache Hive with Spark and BI

Apache Spark-Apache Hive connection configuration

You can configure Spark properties in Ambari to use the Hive Warehouse Connector for accessing data in Hive.

Prerequisites

You need to use the following software to connect Spark and Hive using the HiveWarehouseConnector library.

  • HDP 3.0
  • Hive with HiveServer Interactive
  • Spark2

Required properties

You must add several Spark properties through spark-2-defaults in Ambari to use the Hive Warehouse Connector for accessing data in Hive. Alternatively, you can provide the configuration for each job using --conf, as shown in the example after the property list.
  • Property: spark.sql.hive.hiveserver2.jdbc.url
    Description: URL for HiveServer2 Interactive.
    Comment: In Ambari, copy the value from Hive Summary > HIVESERVER2 INTERACTIVE JDBC URL.
  • Property: spark.datasource.hive.warehouse.metastoreUri
    Description: URI for the Hive metastore.
    Comment: Copy the value from hive.metastore.uris.
  • Property: spark.datasource.hive.warehouse.load.staging.dir
    Description: HDFS temporary directory for batch writes to Hive.
    Comment: For example, /tmp.
  • Property: spark.hadoop.hive.llap.daemon.service.hosts
    Description: Application name for the LLAP service.
    Comment: Copy the value from Advanced hive-interactive-site > hive.llap.daemon.service.hosts.
  • Property: spark.hadoop.hive.zookeeper.quorum
    Description: ZooKeeper hosts used by LLAP.
    Comment: Copy the value from Advanced hive-site > hive.zookeeper.quorum.
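
For example, a spark-shell launch that supplies these settings per job with --conf might look like the following sketch. All values shown (the JDBC URL, metastore URI, LLAP application name, ZooKeeper quorum, and the connector assembly JAR version) are placeholders; substitute the values you copied from your cluster as described above. Passing the connector assembly JAR with --jars reflects the typical HDP installation layout; adjust the path for your environment.

  spark-shell --master yarn \
    --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-<version>.jar \
    --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://host1.example.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive" \
    --conf spark.datasource.hive.warehouse.metastoreUri="thrift://host1.example.com:9083" \
    --conf spark.datasource.hive.warehouse.load.staging.dir="/tmp" \
    --conf spark.hadoop.hive.llap.daemon.service.hosts="@llap0" \
    --conf spark.hadoop.hive.zookeeper.quorum="host1.example.com:2181,host2.example.com:2181,host3.example.com:2181"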

Spark on a Kerberized YARN cluster

In Spark client mode on a Kerberized YARN cluster, set the following property (see the example after the list):
  • Property: spark.sql.hive.hiveserver2.jdbc.url.principal
  • Description: Must be equal to hive.server2.authentication.kerberos.principal.
  • Comment: Copy from Advanced hive-site > hive.server2.authentication.kerberos.principal.
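
A minimal client-mode launch sketch with this property added. The principal shown is a placeholder in the usual hive/_HOST@REALM form; copy your actual value from hive-site as described above. The other required properties from the previous section still apply and are abbreviated here.

  spark-shell --master yarn --deploy-mode client \
    --jars /usr/hdp/current/hive_warehouse_connector/hive-warehouse-connector-assembly-<version>.jar \
    --conf spark.sql.hive.hiveserver2.jdbc.url="jdbc:hive2://host1.example.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2-interactive" \
    --conf spark.sql.hive.hiveserver2.jdbc.url.principal="hive/_HOST@EXAMPLE.COM"
  # ...plus the metastoreUri, staging directory, LLAP, and ZooKeeper properties shown earlier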