Spark Guide

Configuring the Spark Thrift Server on a Kerberos-Enabled Cluster

If you are installing the Spark Thrift Server on a Kerberos-secured cluster, note the following requirements:

  • The Spark Thrift Server must run on the same host as HiveServer2, so that it can access the hiveserver2 keytab.

  • Set permissions on /var/run/spark and /var/log/spark to grant read/write access to the Hive service account.

  • Use the Hive service account to start the thriftserver process.
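
The directory requirement above can be checked before starting the server. The following is a minimal sketch, assuming it is run as the Hive service account on the Thrift Server host. The helper name check_rw is hypothetical and is not part of Spark or Hive.

```python
import os

# Directories named in the requirements above.
SPARK_DIRS = ["/var/run/spark", "/var/log/spark"]


def check_rw(paths):
    """Return {path: bool}: True if the current user can read and write the path."""
    # os.access returns False for paths that do not exist.
    return {p: os.access(p, os.R_OK | os.W_OK) for p in paths}


if __name__ == "__main__":
    for path, ok in check_rw(SPARK_DIRS).items():
        status = "read/write OK" if ok else "missing or not read/write"
        print(f"{path}: {status}")
```

If either directory is reported as not read/write, adjust its ownership or mode for the Hive service account before starting the thriftserver process.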

Note

We recommend that you run the Spark Thrift Server as user hive instead of user spark (this supersedes recommendations in previous releases). This ensures that the Spark Thrift Server can access Hive keytabs, the Hive metastore, and data in HDFS that is stored under user hive.

Important

When the Spark Thrift Server runs queries as user hive, all data accessible to user hive is accessible to the user submitting the query. For a more secure configuration, use a different service account for the Spark Thrift Server, and grant that account appropriate access to the Hive keytabs and the Hive metastore.

For Spark jobs that are not submitted through the Thrift Server, the user submitting the job must have access to the Hive metastore in secure mode; that is, the user must first obtain a Kerberos ticket with kinit.
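
As an illustration, a submitting user could confirm that a Kerberos ticket is present before launching a job. This is a sketch only. It assumes the MIT Kerberos klist command is on the PATH, and the function name has_kerberos_ticket is hypothetical.

```python
import subprocess


def has_kerberos_ticket(klist_cmd="klist"):
    """Return True if `klist -s` reports a valid credentials cache, else False."""
    try:
        # -s runs klist silently; the exit status is 0 only when
        # a readable, unexpired credentials cache is found.
        result = subprocess.run([klist_cmd, "-s"], check=False)
    except OSError:
        # klist is not installed or cannot be executed.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    if has_kerberos_ticket():
        print("Valid Kerberos ticket found.")
    else:
        print("No valid Kerberos ticket; run kinit before submitting the job.")
```

A valid ticket is necessary but not sufficient: the user's principal must also be authorized to access the Hive metastore.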