Accessing Cloud Data

Errors Related to Visible S3 Inconsistency

Amazon S3 is an eventually consistent object store.

It may take time for a newly created object to appear in directory listings, while deleted objects may still be visible.

After an object is overwritten, attempting to read the new data may still return the old data.

After an object is deleted, it may still be visible in directory listings, and attempting to open the file may return the deleted data.
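The behaviors above can be reproduced with a small toy model. This is a hypothetical in-memory store, not the S3 API. It assumes a fixed propagation delay (`lag`) measured in logical ticks, after which each write or delete becomes visible. Real S3 has no fixed delay. The point is only to show how listings and reads can lag behind writes.

```python
from typing import Dict, List, Optional, Tuple


class ToyEventuallyConsistentStore:
    """Hypothetical in-memory model of an eventually consistent store (not the S3 API).

    Every put or delete becomes visible only `lag` ticks after it is issued.
    """

    def __init__(self, lag: int = 2) -> None:
        self.lag = lag
        self.clock = 0
        # Each entry: (tick at which it becomes visible, key, value or None for a delete).
        # Because lag is constant, entries are appended in visibility order.
        self._log: List[Tuple[int, str, Optional[str]]] = []

    def tick(self, n: int = 1) -> None:
        """Advance logical time by n ticks."""
        self.clock += n

    def put(self, key: str, value: str) -> None:
        self._log.append((self.clock + self.lag, key, value))

    def delete(self, key: str) -> None:
        self._log.append((self.clock + self.lag, key, None))

    def _visible(self) -> Dict[str, str]:
        # Replay only the operations that have "propagated" by the current tick.
        state: Dict[str, str] = {}
        for visible_at, key, value in self._log:
            if visible_at <= self.clock:
                if value is None:
                    state.pop(key, None)
                else:
                    state[key] = value
        return state

    def list(self, prefix: str = "") -> List[str]:
        return sorted(k for k in self._visible() if k.startswith(prefix))

    def get(self, key: str) -> str:
        state = self._visible()
        if key not in state:
            raise FileNotFoundError(key)
        return state[key]


store = ToyEventuallyConsistentStore(lag=2)
store.put("out/a", "v1")
print(store.list("out/"))   # [] : the new object is not listed yet
store.tick(2)
print(store.list("out/"))   # ['out/a']
store.put("out/a", "v2")
print(store.get("out/a"))   # v1 : the overwrite is not visible yet
store.tick(2)
store.delete("out/a")
print(store.get("out/a"))   # v2 : the deleted object is still readable
store.tick(2)
print(store.list("out/"))   # []
```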

This directory listing inconsistency is precisely the problem that S3Guard aims to correct.

Inconsistent directory listings can surface as a `FileNotFoundException` during a file or directory rename, including when the output of a Hive, MapReduce or Spark job is renamed into its final destination.
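S3 has no native rename, so a directory rename is emulated by listing the source, copying each object, and then deleting the originals. The sketch below shows why a stale listing breaks this. It uses a hypothetical toy store whose object data is current but whose listing is an outdated snapshot. `StaleListingStore` and `rename_dir` are illustrative names, not S3A classes. Note that the failed rename also leaves the destination partially populated.

```python
from typing import Dict, List, Set


class StaleListingStore:
    """Hypothetical toy store: object data is current, but listings come from a
    snapshot that may lag behind it (this toy never refreshes the snapshot)."""

    def __init__(self, objects: Dict[str, bytes], listing: Set[str]) -> None:
        self.objects = dict(objects)
        self.listing = set(listing)

    def list(self, prefix: str) -> List[str]:
        return sorted(k for k in self.listing if k.startswith(prefix))

    def copy(self, src: str, dst: str) -> None:
        if src not in self.objects:
            raise FileNotFoundError(src)
        self.objects[dst] = self.objects[src]

    def delete(self, key: str) -> None:
        self.objects.pop(key, None)


def rename_dir(store: StaleListingStore, src: str, dst: str) -> None:
    """Emulate a directory rename: list, copy every object, then delete the originals."""
    keys = store.list(src)
    for key in keys:
        store.copy(key, dst + key[len(src):])
    for key in keys:
        store.delete(key)


# part-0001 was deleted, but the stale listing still reports it.
store = StaleListingStore(
    objects={"tmp/part-0000": b"a"},
    listing={"tmp/part-0000", "tmp/part-0001"},
)
try:
    rename_dir(store, "tmp/", "final/")
except FileNotFoundError as err:
    print("rename failed:", err)   # rename failed: tmp/part-0001
print(sorted(store.objects))       # ['final/part-0000', 'tmp/part-0000']
```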

  • Enable S3Guard on any bucket used for the output of Hive, Spark or MR jobs.

  • Use an S3A committer to safely and efficiently commit the output of MR and Spark jobs.

  • When writing new data into an existing location, give the new files different names.
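The recommendations above can be sketched as a helper that builds the relevant Hadoop properties for one output bucket, plus a helper that generates collision-resistant file names. This assumes a Hadoop 3.x release that still ships S3Guard with the DynamoDB metadata store and the S3A committers. The helper names `s3a_output_properties` and `unique_output_name` are hypothetical. Check the property names and class names against the documentation for your release.

```python
import uuid
from typing import Dict

# Class names from Hadoop 3.x S3A; verify them against your Hadoop release.
DYNAMODB_METADATA_STORE = "org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore"
S3A_COMMITTER_FACTORY = "org.apache.hadoop.fs.s3a.commit.S3ACommitterFactory"
S3A_COMMITTERS = {"directory", "partitioned", "magic"}


def s3a_output_properties(bucket: str, committer: str = "directory") -> Dict[str, str]:
    """Hypothetical helper: Hadoop properties for a job writing to s3a://<bucket>/."""
    if committer not in S3A_COMMITTERS:
        raise ValueError(f"unknown S3A committer: {committer!r}")
    props = {
        # Enable S3Guard for this bucket only, using per-bucket configuration.
        f"fs.s3a.bucket.{bucket}.metadatastore.impl": DYNAMODB_METADATA_STORE,
        # Route MapReduce output on s3a:// through the S3A committer factory.
        "mapreduce.outputcommitter.factory.scheme.s3a": S3A_COMMITTER_FACTORY,
        # Select which S3A committer the factory creates.
        "fs.s3a.committer.name": committer,
    }
    if committer == "magic":
        # The magic committer must be explicitly enabled.
        props["fs.s3a.committer.magic.enabled"] = "true"
    return props


def unique_output_name(prefix: str = "part", suffix: str = ".parquet") -> str:
    """Hypothetical helper: a file name vanishingly unlikely to collide with an existing object."""
    return f"{prefix}-{uuid.uuid4()}{suffix}"


for key, value in sorted(s3a_output_properties("example-bucket").items()):
    print(f"{key}={value}")
print(unique_output_name())
```

Spark jobs typically pass the same Hadoop keys with the `spark.hadoop.` prefix. They also need the committer bindings from Spark's cloud integration module, so consult your distribution's documentation for those settings.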