Advanced Connection Management Features
By default, all connections for a user are forwarded to the same Spark AM to execute the query. In some cases, it is necessary to exercise finer-grained control.
Specifying Named Connections
When user impersonation is enabled, Spark supports user-named connections identified by a user-specified connectionId (a Hive conf parameter in the connection URL). This can be useful when overriding Spark configurations such as queue, memory configuration, or executor configuration.
Every Spark AM managed by the Spark Thrift server is associated with a user and
connectionId. Connection IDs are not globally unique; they are
specific to the user.
You can specify connectionId to control which Spark AM executes queries. If you do not specify connectionId, a default connectionId is associated with the Spark AM.
To explicitly name a connection, set the Hive conf parameter spark.sql.thriftServer.connectionId to the connection name in the connection URL.
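As a minimal sketch of naming a connection, the snippet below builds a HiveServer2-style JDBC URL that passes spark.sql.thriftServer.connectionId in the Hive conf section (the part after the `?`). The host, port, database, and connection name are placeholders chosen for illustration, not values from this guide. The Beeline command is printed rather than run.

```shell
# Placeholder values (assumptions for illustration only).
THRIFT_HOST="thrift-host.example.com"
THRIFT_PORT="10015"
DATABASE="default"
CONNECTION_ID="my-connection-name"

# Hive conf parameters follow the '?' in a HiveServer2 JDBC URL.
JDBC_URL="jdbc:hive2://${THRIFT_HOST}:${THRIFT_PORT}/${DATABASE}?spark.sql.thriftServer.connectionId=${CONNECTION_ID}"

# Print the Beeline command instead of executing it.
echo "beeline -u '${JDBC_URL}' -n <user>"
```

Changing only CONNECTION_ID is enough to target a different named connection for the same user.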
Note: Named connections allow users to specify their own Spark AM connections. They are scoped to individual users, and do not allow a user to access the Spark AM associated with another user.
If a Spark AM already exists for the user and connectionId, the connection is associated with that existing Spark AM.
Data Sharing and Named Connections
Each connectionId for a user identifies a different Spark AM.
For a user, cached data is shared and available only within a single Spark AM, not across Spark AMs.
Different user connections on the same Spark AM can leverage previously cached data. Each user connection has its own Hive session (which maintains the current database, Hive variables, and so on), but shares the underlying cached data, executors, and Spark application.
The following example shows a session for the first connection from user “foo” to named connection “conn1”:
After caching the ‘drivers’ table, the query runs an order of magnitude faster.
A second connection to the same
connectionId from user “foo”
leverages the cached table from the other active Beeline session, significantly
increasing query execution speed:
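The two-session pattern described above can be sketched roughly as follows. The user "foo", the connection "conn1", and the "drivers" table come from the text; the endpoint is a placeholder and the COUNT query is a hypothetical stand-in for whatever query the sessions run. The snippet only prints what each session would do and does not reproduce any timings.

```shell
# Placeholder endpoint (assumption); "conn1" is the named connection from the text.
JDBC_URL="jdbc:hive2://thrift-host.example.com:10015/default?spark.sql.thriftServer.connectionId=conn1"

# First session: cache the table, then query it (the query is hypothetical).
FIRST_SESSION_SQL="CACHE TABLE drivers; SELECT COUNT(*) FROM drivers;"

# Second session: same user and connectionId, so it reaches the same Spark AM
# and can read the already-cached table without caching it again.
SECOND_SESSION_SQL="SELECT COUNT(*) FROM drivers;"

echo "Connect (both sessions): beeline -u '${JDBC_URL}' -n foo"
echo "Session 1: ${FIRST_SESSION_SQL}"
echo "Session 2: ${SECOND_SESSION_SQL}"
```

A session that used a different connectionId would reach a different Spark AM and would not see the cached table.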
Overriding Spark Configuration Settings
If the Spark Thrift server is unable to find an existing Spark AM for a user
connection, by default the Thrift server launches a new Spark AM to service user
queries. This is applicable to named connections and unnamed connections. When a new
Spark AM is to be launched, you can override current Spark configuration settings by
specifying them in the connection URL. Specify Spark configuration settings as
hiveconf variables prepended by the
The following connection URL includes a
setting of 4 GB:
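A URL of this kind might be assembled as in the sketch below. Two things here are assumptions rather than facts from this page: the `sparkconf.` prefix and the choice of `spark.executor.memory` as the setting being overridden. The host, port, and connection name are placeholders as well.

```shell
# Assumptions: the 'sparkconf.' prefix and the spark.executor.memory key are
# illustrative guesses; host, port, and connection name are placeholders.
BASE_URL="jdbc:hive2://thrift-host.example.com:10015/default"
CONNECTION_ID="my-connection-name"
SPARK_CONF_PREFIX="sparkconf."
SETTING_KEY="spark.executor.memory"
SETTING_VALUE="4g"

# Multiple Hive conf entries after '?' are separated by ';'.
JDBC_URL="${BASE_URL}?spark.sql.thriftServer.connectionId=${CONNECTION_ID};${SPARK_CONF_PREFIX}${SETTING_KEY}=${SETTING_VALUE}"

echo "${JDBC_URL}"
```

Because the override applies when a new Spark AM is launched, pairing it with a connectionId that has no running Spark AM is the case where the setting would be expected to take effect.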
The environment tab of the Spark application shows the appropriate value: