Developing Apache Spark Applications

Access Spark SQL through JDBC or ODBC: prerequisites

This section describes prerequisites for accessing Spark SQL through JDBC or ODBC.

Using the Spark Thrift server, you can remotely access Spark SQL over JDBC (using the JDBC Beeline client) or ODBC (using the Simba driver).

The following prerequisites must be met before accessing Spark SQL through JDBC or ODBC:

  • The Spark Thrift server must be deployed on the cluster. See "Installing and Configuring Spark Over Ambari" in this guide for more information.

  • Ensure that SPARK_HOME is defined as your Spark directory:

    export SPARK_HOME=/usr/hdp/current/spark-client
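A quick sanity check of the SPARK_HOME setting can catch path mistakes early. The following is a minimal sketch, not part of the product: the function name check_spark_home is hypothetical, and it assumes the standard Spark layout in which start-thriftserver.sh lives under the sbin directory of the Spark directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not shipped with Spark or HDP): checks that a directory
# looks like a Spark installation that includes the Thrift server scripts.
# Assumption: the standard layout with sbin/start-thriftserver.sh.
check_spark_home() {
  # Use the first argument if given, otherwise fall back to $SPARK_HOME.
  local dir="${1:-${SPARK_HOME:-}}"
  if [ -z "$dir" ]; then
    echo "SPARK_HOME is not set"
    return 1
  fi
  if [ ! -f "$dir/sbin/start-thriftserver.sh" ]; then
    echo "start-thriftserver.sh not found under $dir/sbin"
    return 1
  fi
  echo "SPARK_HOME looks usable: $dir"
}
```

After exporting SPARK_HOME, calling check_spark_home with no arguments checks the exported value; passing a path checks that path instead.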

If you want to enable user impersonation for the Spark Thrift server, so that the Thrift server runs Spark SQL jobs as the submitting user, see "Configuring the Spark Thrift server" in this guide.

Before accessing Spark SQL through JDBC or ODBC, note the following caveats:

  • The Spark Thrift server works in YARN client mode only.

  • ODBC and JDBC client configurations must match the Spark Thrift server configuration parameters. For example, if the Thrift server is configured to listen in binary mode, the client must send binary requests; if the Thrift server is configured over HTTP, the client must use HTTP mode.

  • All client requests coming to the Spark Thrift server share a SparkContext.
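To illustrate how the client transport must match the server, the sketch below builds a Beeline JDBC URL for either binary or HTTP mode. It is an illustration under stated assumptions rather than a definitive configuration: the function name thrift_jdbc_url is hypothetical, the host and port are placeholders that you must take from your own cluster, and the HTTP path cliservice is assumed to be left at the HiveServer2 default.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints a HiveServer2-style JDBC URL for the given
# transport mode. Host, port, database, and httpPath are assumptions; read the
# real values from your Spark Thrift server configuration.
thrift_jdbc_url() {
  local mode="$1" host="$2" port="$3"
  case "$mode" in
    binary)
      # Binary (TCP) transport: no extra URL parameters are needed.
      echo "jdbc:hive2://${host}:${port}/default"
      ;;
    http)
      # HTTP transport: the client must state the mode and the HTTP path.
      echo "jdbc:hive2://${host}:${port}/default;transportMode=http;httpPath=cliservice"
      ;;
    *)
      echo "unknown transport mode: ${mode}" >&2
      return 1
      ;;
  esac
}
```

For example, with a placeholder host and port, beeline -u "$(thrift_jdbc_url binary thrift-host.example.com 10015)" would connect in binary mode; substitute http and the HTTP port when the Thrift server is configured over HTTP.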

Additional Spark Thrift Server Commands

To list available Thrift server options, run ./sbin/start-thriftserver.sh --help from the Spark directory ($SPARK_HOME).

To manually stop the Spark Thrift server, run the following commands from the Spark directory:

su spark
./sbin/stop-thriftserver.sh