Developing Apache Spark Applications

Spark Thrift Server as Proxy

The Spark Thrift server does not run user queries itself; it forwards them to the appropriate user-specific Spark ApplicationMaster (AM). This improves the scalability and fault tolerance of the Spark Thrift server.



When user impersonation is enabled for the Spark Thrift server, the Thrift server is responsible for the following tasks:

  • Authenticating incoming user connections (SASL authentication that validates the user's Beeline or socket connection).

  • Managing Spark applications launched on behalf of users:

    • Launching a Spark application if no appropriate application exists for the incoming request.

    • Terminating the Spark AM when all associated user connections are closed at the Spark Thrift server.

  • Acting as a proxy, forwarding requests to, and responses from, the appropriate user's Spark AM.

  • Ensuring that long-running Spark SQL sessions persist by keeping the Kerberos state valid.

    • The Spark Thrift server and Spark AM, when launched on behalf of a user, can be long-running applications in clusters with Kerberos enabled.

    • The submitter's principal and keytab are not required for long-running Spark AM processes, although the Spark Thrift server requires the Hive principal and keytab.
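The per-user application lifecycle described above (launch on first connection, reuse for later connections from the same user, forward queries, terminate when the last connection closes) can be illustrated with a minimal Python sketch. This is a conceptual model only, not the Spark Thrift server's actual implementation: the class names ThriftProxy and UserApp are hypothetical, SASL authentication and Kerberos handling are assumed to happen elsewhere, and "launching" or "terminating" an AM is simulated with an in-memory object.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class UserApp:
    """Hypothetical stand-in for a user-specific Spark application (AM)."""
    user: str
    connections: int = 0
    alive: bool = True
    executed: List[str] = field(default_factory=list)

    def run(self, query: str) -> str:
        # A real Spark AM would execute the query on the cluster;
        # this sketch only records the query and returns a label.
        self.executed.append(query)
        return f"{self.user}: ran {query!r}"


class ThriftProxy:
    """Hypothetical proxy that routes each user's queries to that user's app."""

    def __init__(self) -> None:
        self.apps: Dict[str, UserApp] = {}

    def open_connection(self, user: str) -> UserApp:
        # Assumption: the connection was already authenticated (e.g. via SASL).
        app = self.apps.get(user)
        if app is None:
            # No suitable application exists yet, so "launch" one for this user.
            app = UserApp(user)
            self.apps[user] = app
        app.connections += 1
        return app

    def execute(self, user: str, query: str) -> str:
        # Forward the request to the user's app and relay its response.
        return self.apps[user].run(query)

    def close_connection(self, user: str) -> None:
        app = self.apps[user]
        app.connections -= 1
        if app.connections == 0:
            # Last connection for this user closed: "terminate" the app.
            app.alive = False
            del self.apps[user]
```

For example, two connections opened by "alice" share one application, a connection from "bob" gets its own, and alice's application is terminated only after both of her connections are closed. Keeping the connection count on the per-user object mirrors the rule that an AM lives exactly as long as at least one of its user's connections is open.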