Apache Spark Component Guide

Accessing Spark SQL through JDBC or ODBC: Prerequisites

Using the Spark Thrift server, you can remotely access Spark SQL over JDBC (using the Beeline JDBC client) or ODBC (using the Simba ODBC driver).
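For example, once the Thrift server is running, you can connect with Beeline. The host, port, and user below are placeholders; the default Spark Thrift server port on HDP is typically 10015, so verify the port configured for your cluster:

beeline -u "jdbc:hive2://localhost:10015/default" -n spark

At the Beeline prompt, a simple statement such as SHOW TABLES; confirms that the connection is working.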

The following prerequisites must be met before accessing Spark SQL through JDBC or ODBC:

  • The Spark Thrift server must be deployed on the cluster.

  • Ensure that SPARK_HOME is defined and points to your Spark installation directory:

    export SPARK_HOME=/usr/hdp/current/spark-client

If you want to enable user impersonation for the Spark Thrift server, so that the Thrift server runs Spark SQL jobs as the submitting user, see Configuring the Spark Thrift server.
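As a rough sketch of what that configuration involves, impersonation is typically toggled through the standard Hive doAs property in the Thrift server's Spark configuration; the property name is standard, but the location shown below is an assumption, so follow Configuring the Spark Thrift server for the authoritative steps on your cluster:

# assumed location: the spark-thrift-sparkconf settings (managed by Ambari on HDP clusters)
hive.server2.enable.doAs true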

Before accessing Spark SQL through JDBC or ODBC, note the following caveats:

  • The Spark Thrift server works in YARN client mode only.

  • ODBC and JDBC client configurations must match the Spark Thrift server configuration parameters. For example, if the Thrift server is configured to listen in binary mode, clients must send binary requests; if it is configured to listen in HTTP mode, clients must send HTTP requests (see the example connection URLs after this list).

  • All client requests to the Spark Thrift server share a single SparkContext.
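To illustrate matching transport modes, the following Beeline connection strings use the standard Hive JDBC URL parameters; the host name and port are placeholders:

# binary (TCP) transport, the Thrift server default
beeline -u "jdbc:hive2://thriftserver-host:10015/default"

# HTTP transport; the httpPath value must match the server's hive.server2.thrift.http.path setting
beeline -u "jdbc:hive2://thriftserver-host:10015/default;transportMode=http;httpPath=cliservice"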

Additional Spark Thrift Server Commands

To list available Thrift server options, run ./sbin/start-thriftserver.sh --help.
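For example, to start the Thrift server manually on a specific port (the port value below is illustrative), pass the corresponding Hive property with --hiveconf:

su spark
./sbin/start-thriftserver.sh --hiveconf hive.server2.thrift.port=10015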

To manually stop the Spark Thrift server, run the following commands:

su spark                       # switch to the spark service user
cd $SPARK_HOME                 # the script is invoked relative to the Spark directory
./sbin/stop-thriftserver.sh    # stop the Thrift server