Accessing Cloud Data
Also available as:
loading table of contents...

Securing the S3A Committers

Here are a few security considerations when using the S3A committers.

S3 Bucket Permissions

To use an S3A committer, the account/role must have the same AWS permissions as needed to write to the destination bucket.

The multipart upload list/abort operations may be a new addition to the permissions for the active role.

When using S3Guard, the account/role must also have full read/write/delete permissions for the DynamoDB table used.

Local Filesystem Security

All the committers use the directories listed in fs.s3a.buffer.dir to store staged/buffered output before uploading it to S3.

To ensure that other processes running on the same host do not have access to this data, unique paths should be used per-account.

This requirement holds for all applications working with S3 through the S3A connector.

HDFS Security

The directory and partitioned committers use HDFS to propagate commit information between workers and the job committer.

These intermediate files are saved into a job-specific directory under the path ${fs.s3a.committer.staging.tmp.path}/${user} where ${user} is the name of the user running the job.

The default value of fs.s3a.committer.staging.tmp.path is tmp/staging, Which will be converted at run time to a path under the current user's home directory, essentially /user/${user}/tmp/staging/${user}/.

Provided the user's home directory has access restricted to trusted accounts, this intermediate data will be secure.

If the property fs.s3a.committer.staging.tmp.path is changed to a different location, then this path will need to be secured to protect pending jobs from tampering. Note: this intermediate data does not contain the output, merely the listings of all pending files and the MD5 checksums of each block.