Accessing Cloud Data

Pruning Old Data from S3Guard Tables

S3Guard keeps tombstone markers for deleted files. Clean these out regularly to keep costs down. The hadoop s3guard prune command deletes entries older than a given age, which you specify as a number of days, hours, or minutes:

hadoop s3guard prune -days 3 -hours 6 -minutes 15 s3a://guarded-table/

2018-05-31 15:39:27,981 [main] INFO s3guard.S3GuardTool (S3GuardTool.java:initMetadataStore(270)) -
    Metadata store DynamoDBMetadataStore{region=eu-west-1, tableName=guarded-table} is initialized.
2018-05-31 15:39:33,770 [main] INFO s3guard.DynamoDBMetadataStore (DynamoDBMetadataStore.java:prune(851)) -
    Finished pruning 366 items in batches of 25
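
If you prune on a schedule, it can help to build the command in one place and review it before it runs. The following is a minimal sketch, not part of S3Guard: the function name build_prune_cmd is hypothetical, and it uses only the -days, -hours, and -minutes options and the table URI shown in the command above. It prints the command rather than executing it.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of Hadoop): assemble a prune command line
# from an age (days, hours, minutes) and an S3A URI.
build_prune_cmd() {
  local days="$1" hours="$2" minutes="$3" uri="$4"
  printf 'hadoop s3guard prune -days %s -hours %s -minutes %s %s\n' \
    "$days" "$hours" "$minutes" "$uri"
}

# Dry run: print the command for review instead of running it.
build_prune_cmd 3 6 15 s3a://guarded-table/
```

To actually prune, run the printed command from a host where the hadoop client is configured with access to the table.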