Accessing Cloud Data

S3Guard: Known Issues

The following known issues have been identified while testing S3Guard.

Credentials in URLs are unsupported

S3Guard cannot be used when the AWS login credentials are in the S3 URL (HADOOP-15422)

Putting AWS credentials in URLs, such as s3a://AWSID:SECRETKEY@bucket/path, is very insecure: paths are widely logged, so it is very hard to keep the secrets private. Losing the keys can be expensive and can expose all information to which the account has access. S3Guard does not support this authentication mechanism. Place secrets in Hadoop configuration files or, better, in JCEKS credential files.
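Before a path is handed to S3Guard, a script can check whether it still carries inline credentials. The following is a minimal sketch, not part of Hadoop: the function name has_inline_secret is hypothetical, and it assumes that the secret contains no unencoded / character.

```shell
# Hypothetical helper (not a Hadoop command): succeeds (exit status 0)
# when an s3a:// URL has a user-info part such as AWSID:SECRETKEY@
# in front of the bucket name.
has_inline_secret() {
  case "$1" in
    s3a://*) ;;              # only s3a URLs are of interest here
    *) return 1 ;;
  esac
  authority="${1#s3a://}"        # drop the scheme
  authority="${authority%%/*}"   # keep the text before the first /
  case "$authority" in
    *@*) return 0 ;;         # user-info present: credentials in the URL
    *)   return 1 ;;
  esac
}

if has_inline_secret "s3a://AWSID:SECRETKEY@bucket/path"; then
  echo "credentials found in URL; move them to configuration or a JCEKS file"
fi
```

A check like this only flags the problem; the secrets themselves still need to be moved into Hadoop configuration files or a JCEKS credential file.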

Error when using trailing / in some hadoop fs commands

Some hadoop fs operations fail when there is a trailing / in the path, including the fs -mkdir command:

$ hadoop fs -mkdir -p s3a://guarded-table/dir/child/

  mkdir: get on s3a://guarded-table/dir/child/:
  com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException:
  One or more parameter values were invalid:
  An AttributeValue may not contain an empty string (Service: AmazonDynamoDBv2;
  Status Code: 400; Error Code: ValidationException

There is a straightforward workaround: if a command fails, remove the trailing / and run it again:

$ hadoop fs -mkdir -p s3a://guarded-table/dir/child
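In scripts, where paths are often built from variables, the trailing / can be stripped before the path reaches hadoop fs. The following is a minimal sketch; the function name strip_trailing_slash is hypothetical and not part of Hadoop.

```shell
# Hypothetical helper: print the path with any trailing slashes removed.
# A bare "/" or a bare scheme prefix such as "s3a://" is left untouched.
strip_trailing_slash() {
  path="$1"
  while :; do
    case "$path" in
      /|*://) break ;;              # nothing safe left to strip
      */)     path="${path%/}" ;;   # drop one trailing slash and re-check
      *)      break ;;              # no trailing slash
    esac
  done
  printf '%s\n' "$path"
}

# Possible use with the command from this section:
#   hadoop fs -mkdir -p "$(strip_trailing_slash "s3a://guarded-table/dir/child/")"
```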

The hadoop s3guard command output contains the error message “hadoop-aws.sh was not found”

This is a warning message about a file which is not found in HDP and which is not actually needed by the s3guard command. It is safe to ignore.

Failure handling of rename() operations

If a rename() operation fails partway through, including when it fails because of permissions, the S3Guard database is not reliably updated.

If the rename failed because of a network problem, the point is moot: if an application cannot connect to S3, then DynamoDB will inevitably be unreachable too, so updates will be impossible. The problem can also surface if the bucket has been set up with complex permissions under which not all callers have full write (including delete) access to the bucket. S3Guard (and the S3A connector) prefers unrestricted write access to an entire R/W bucket.