Developing Apache Spark Applications

Permissions and ACL Enforcement

When user impersonation is enabled, permissions and ACL restrictions are applied on behalf of the submitting user.

In the following example, the “foo_db” database has a table “drivers”, which only user “foo” can access:



A Beeline session running as user “foo” can access the data, read the drivers table, and create a new table based on it:



Spark queries run in a YARN application as user “foo”:



All user permissions and access control lists (ACLs) are enforced while accessing tables, data, or other resources. In addition, all output is generated as, and owned by, user “foo”.
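The enforcement rule can be illustrated with a small toy model in Python. This is a sketch under stated assumptions, not Spark Thrift Server code: ACLs are modeled as a hypothetical dictionary from table name to permitted users, the server's service account is assumed to be named `spark`, and `foo_db.drivers_copy` is a hypothetical name for the table created from `drivers`. What it shows is that, with impersonation on, both the read check and the ownership of new output use the submitting user.

```python
"""Toy model of impersonation-based ACL enforcement (not Spark Thrift Server code).

Hypothetical assumptions: ACLs are a dict of table -> allowed users, and the
server's own service account is named "spark".
"""

from dataclasses import dataclass, field


class AccessDenied(Exception):
    """Raised when the effective user is not in a table's ACL."""


@dataclass
class Catalog:
    # table name -> users allowed to read it (hypothetical ACL model)
    acls: dict = field(default_factory=dict)
    # table name -> owning user
    owners: dict = field(default_factory=dict)


SERVICE_USER = "spark"  # hypothetical account the server itself runs as


def effective_user(session_user: str, impersonation: bool) -> str:
    """With impersonation on, work runs as the submitting user, else as the service account."""
    return session_user if impersonation else SERVICE_USER


def read_table(catalog: Catalog, table: str, user: str) -> str:
    # The ACL check is made against the effective user, not the server.
    if user not in catalog.acls.get(table, set()):
        raise AccessDenied(f"{user} cannot read {table}")
    return f"rows of {table}"


def create_table_as_select(catalog: Catalog, new_table: str, source: str, user: str) -> None:
    read_table(catalog, source, user)  # reading the source is checked as the same user
    catalog.owners[new_table] = user   # new output is owned by the effective user
    catalog.acls[new_table] = {user}   # and, in this toy model, readable only by that user


if __name__ == "__main__":
    cat = Catalog(acls={"foo_db.drivers": {"foo"}})
    foo = effective_user("foo", impersonation=True)
    print(read_table(cat, "foo_db.drivers", foo))
    create_table_as_select(cat, "foo_db.drivers_copy", "foo_db.drivers", foo)
    print(cat.owners["foo_db.drivers_copy"])
    try:
        read_table(cat, "foo_db.drivers", effective_user("bar", impersonation=True))
    except AccessDenied as err:
        print(err)
```

Running the script, user “foo” reads the table and owns the copy, while a session for a hypothetical user “bar” is rejected by the same check.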

For the table created in the preceding Beeline session, the owner is user “foo”:



The per-user Spark Application Master (AM) caches data in memory, and other users cannot access that data: cached data and state are restricted to the Spark AM running the query. Data and state information are not stored in the Spark Thrift Server (STS), so they are not visible to other users. The Spark master runs in yarn-cluster mode, but query execution behaves as though it were in yarn-client mode: the AM is essentially a yarn-cluster user program that accepts queries from STS indefinitely.
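The isolation property can be sketched as a toy Python model, assuming hypothetical `ThriftServer` and `UserAM` classes that stand in for STS and the per-user AM. It only mirrors the behavior described above (one long-running AM per user, cached results held inside that AM, no query data kept in the server) and is not how STS is actually implemented.

```python
"""Toy model of per-user Application Masters behind a Thrift server.

Hypothetical classes illustrating the isolation property, not STS internals.
"""


class UserAM:
    """Stands in for one user's long-running Spark AM (hypothetical)."""

    def __init__(self, user: str):
        self.user = user
        self.cache = {}  # cached results live only inside this user's AM

    def run(self, query: str) -> str:
        # Reuse a cached result if this AM has already run the query.
        if query not in self.cache:
            self.cache[query] = f"result of {query!r} as {self.user}"
        return self.cache[query]


class ThriftServer:
    """Routes each query to the submitting user's AM; keeps no cached results itself."""

    def __init__(self):
        self._ams = {}  # user -> UserAM; only this routing table lives in the server

    def submit(self, user: str, query: str) -> str:
        am = self._ams.get(user)
        if am is None:
            # Launched once per user and kept running to accept later queries.
            am = self._ams[user] = UserAM(user)
        return am.run(query)

    def am_for(self, user: str):
        return self._ams.get(user)


if __name__ == "__main__":
    sts = ThriftServer()
    print(sts.submit("foo", "SELECT * FROM foo_db.drivers"))
    sts.submit("bar", "SELECT 1")
    # bar's AM holds only bar's cached state; foo's cache is not reachable from it.
    print(sorted(sts.am_for("bar").cache))
```

Keying the AMs by user is what makes the cache private in this model: a second user's query is always routed to a different AM object, so it never sees another user's cached entries.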