Developing Apache Spark Applications

Access Spark SQL through JDBC

Use the following steps to access Spark SQL through JDBC.

To access Spark SQL through JDBC, you need a JDBC connection string that supplies connection information to the JDBC data source. Connection strings for the Spark SQL JDBC driver have the following format:

jdbc:hive2://<host>:<port>/<dbName>;<sessionConfs>?<hiveConfs>#<hiveVars>

JDBC Parameter  Description

host            The node hosting the Thrift server.
port            The port number on which the Thrift server listens.
dbName          The name of the Hive database to run the query against.
sessionConfs    Optional configuration parameters for the JDBC or ODBC driver, in the following format: <key1>=<value1>;<key2>=<value2>;...
hiveConfs       Optional configuration parameters for Hive on the server, in the following format: <key1>=<value1>;<key2>=<value2>;...
                These settings persist for the duration of the user session.
hiveVars        Optional configuration parameters for Hive variables, in the following format: <key1>=<value1>;<key2>=<value2>;...
                These settings persist for the duration of the user session.
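To make the connection-string format above concrete, the components can be assembled with a small helper. This is an illustrative sketch only; the function name and its defaults are hypothetical and not part of any driver.

```python
# Illustrative sketch: assemble a Spark SQL JDBC URL from its parts,
# following the format documented above. The helper is hypothetical,
# not part of the Spark or Hive JDBC drivers.

def build_jdbc_url(host, port, db_name, session_confs=None,
                   hive_confs=None, hive_vars=None):
    """Build a jdbc:hive2 connection string."""
    def join(confs):
        return ";".join(f"{k}={v}" for k, v in confs.items())

    url = f"jdbc:hive2://{host}:{port}/{db_name}"
    if session_confs:
        url += ";" + join(session_confs)   # ;<key>=<value> pairs for the driver
    if hive_confs:
        url += "?" + join(hive_confs)      # ?<key>=<value> pairs for Hive
    if hive_vars:
        url += "#" + join(hive_vars)       # #<key>=<value> Hive variables
    return url

print(build_jdbc_url("localhost", 10015, "default",
                     session_confs={"transportMode": "http",
                                    "httpPath": "/"}))
# jdbc:hive2://localhost:10015/default;transportMode=http;httpPath=/
```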

Note

The Spark Thrift server is a variant of HiveServer2, so you can use many of the same settings. For more information about JDBC connection strings, including transport and security settings, see "Hive JDBC and ODBC Drivers" in the HDP Data Access guide.

The following connection string accesses Spark SQL through JDBC on a Kerberos-enabled cluster:

beeline> !connect jdbc:hive2://localhost:10002/default;httpPath=/;principal=hive/hdp-team.example.com@EXAMPLE.COM

The following connection string accesses Spark SQL through JDBC over HTTP transport on a Kerberos-enabled cluster:

beeline> !connect jdbc:hive2://localhost:10002/default;transportMode=http;httpPath=/;principal=hive/hdp-team.example.com@EXAMPLE.COM

To access Spark SQL, complete the following steps:

  1. Connect to the Thrift server over the Beeline JDBC client.
    1. From the SPARK_HOME directory, launch Beeline:
      su spark
      ./bin/beeline
    2. At the Beeline prompt, connect to the Spark SQL Thrift server with the JDBC connection string:
      beeline> !connect jdbc:hive2://localhost:10015

      The host and port must match the host and port on which the Spark Thrift server is running.

      You should see output similar to the following:

      beeline> !connect jdbc:hive2://localhost:10015
      Connecting to jdbc:hive2://localhost:10015
      Enter username for jdbc:hive2://localhost:10015:
      Enter password for jdbc:hive2://localhost:10015:
      ...
      Connected to: Spark SQL (version 2.0.0)
      Driver: Spark Project Core (version 2.0.0.2.4.0.0-169)
      Transaction isolation: TRANSACTION_REPEATABLE_READ
      0: jdbc:hive2://localhost:10015>
  2. When connected, issue a Spark SQL statement.

    The following example executes a SHOW TABLES query:

    0: jdbc:hive2://localhost:10015> show tables;
    +------------+--------------+--+
    | tableName  | isTemporary  |
    +------------+--------------+--+
    | sample_07  | false        |
    | sample_08  | false        |
    | testtable  | false        |
    +------------+--------------+--+
    3 rows selected (2.399 seconds)
    0: jdbc:hive2://localhost:10015>
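To illustrate how the pieces of a connection string separate into the components described earlier, the following sketch splits a URL on the `;`, `?`, and `#` delimiters. This is a hypothetical illustration of the documented format, not the driver's actual parsing logic.

```python
# Illustrative sketch: split a Spark SQL JDBC URL into the documented
# components (host, port, dbName, sessionConfs, hiveConfs, hiveVars).
# This mirrors the format described above; it is NOT the driver's parser.

def parse_jdbc_url(url):
    prefix = "jdbc:hive2://"
    if not url.startswith(prefix):
        raise ValueError("not a jdbc:hive2 URL")
    rest = url[len(prefix):]

    # Peel off #<hiveVars>, then ?<hiveConfs>, from the right.
    rest, _, hive_vars = rest.partition("#")
    rest, _, hive_confs = rest.partition("?")

    # What remains is <host>:<port>/<dbName>;<sessionConfs>.
    hostport, _, tail = rest.partition("/")
    host, _, port = hostport.partition(":")
    db_name, _, session_confs = tail.partition(";")

    def to_dict(s):
        return dict(p.split("=", 1) for p in s.split(";") if p)

    return {
        "host": host,
        "port": int(port),
        "dbName": db_name,
        "sessionConfs": to_dict(session_confs),
        "hiveConfs": to_dict(hive_confs),
        "hiveVars": to_dict(hive_vars),
    }

url = ("jdbc:hive2://localhost:10002/default;"
       "transportMode=http;httpPath=/;"
       "principal=hive/hdp-team.example.com@EXAMPLE.COM")
parsed = parse_jdbc_url(url)
print(parsed["sessionConfs"]["transportMode"])  # http
```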