Scaling Namespaces and Optimizing Data Storage

HDFS storage policies

You can store data on DISK or ARCHIVE storage types using preconfigured storage policies.

The following preconfigured storage policies are available:

  • HOT: Used for both storage and compute. Data that is actively being processed stays under this policy. When a block is HOT, all replicas are stored on DISK. There is no fallback storage for creation; ARCHIVE is the fallback storage for replication.

  • WARM: Partially HOT and partially COLD. When a block is WARM, the first replica is stored on DISK, and the remaining replicas are stored on ARCHIVE. The fallback storage for both creation and replication is DISK, or ARCHIVE if DISK is unavailable.

  • COLD: Used only for storage, with limited compute. Data that is no longer being used, or data that needs to be archived, is moved from HOT storage to COLD storage. When a block is COLD, all replicas are stored on ARCHIVE, and there is no fallback storage for creation or replication.
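
The placement rules above map directly to a small function. The following Python sketch is a simplified, hypothetical model, not HDFS code: the `StorageType` enum and the `replica_placement` function are names introduced here for illustration, and the sketch ignores fallback storage entirely.

```python
from __future__ import annotations

from enum import Enum


class StorageType(Enum):
    DISK = "DISK"
    ARCHIVE = "ARCHIVE"


def replica_placement(policy: str, n: int) -> list[StorageType]:
    """Return the target storage type for each of n replicas (fallback ignored)."""
    if n < 1:
        raise ValueError("replication factor must be at least 1")
    name = policy.upper()
    if name == "HOT":
        # HOT: every replica goes to DISK.
        return [StorageType.DISK] * n
    if name == "WARM":
        # WARM: first replica on DISK, remaining n-1 replicas on ARCHIVE.
        return [StorageType.DISK] + [StorageType.ARCHIVE] * (n - 1)
    if name == "COLD":
        # COLD: every replica goes to ARCHIVE.
        return [StorageType.ARCHIVE] * n
    raise ValueError(f"unknown storage policy: {policy}")


if __name__ == "__main__":
    for p in ("HOT", "WARM", "COLD"):
        print(p, [t.value for t in replica_placement(p, 3)])
```

With a replication factor of 3, for example, WARM yields one DISK replica followed by two ARCHIVE replicas.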

The following table summarizes these storage policies:

Policy ID  Policy Name    Replica Block Placement (for n replicas)  Fallback storage for creation  Fallback storage for replication
12         HOT (default)  DISK: n                                   <none>                         ARCHIVE
8          WARM           DISK: 1, ARCHIVE: n-1                     DISK, ARCHIVE                  DISK, ARCHIVE
4          COLD           ARCHIVE: n                                <none>                         <none>
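
The fallback columns only come into play when a preferred storage type is unavailable. The Python sketch below is a hypothetical, simplified model of how the table could be applied; `StoragePolicy`, `POLICIES`, and `choose_storage_types` are illustrative names, not the HDFS API, and the real HDFS placement logic is more involved. The model assumes that creation fallbacks apply when a block is first written, that replication fallbacks apply when a replica is re-created, and that each policy's list of preferred storage types repeats its last entry for any remaining replicas.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional

DISK, ARCHIVE = "DISK", "ARCHIVE"


@dataclass(frozen=True)
class StoragePolicy:
    policy_id: int
    name: str
    # Preferred storage types; the last entry repeats for any remaining replicas.
    storage_types: tuple[str, ...]
    creation_fallbacks: tuple[str, ...]
    replication_fallbacks: tuple[str, ...]

    def placement(self, n: int) -> list[str]:
        """Preferred storage type for each of n replicas."""
        if n < 1:
            raise ValueError("replication factor must be at least 1")
        last = len(self.storage_types) - 1
        return [self.storage_types[min(i, last)] for i in range(n)]


# Hypothetical encoding of the table above.
POLICIES = {
    "HOT": StoragePolicy(12, "HOT", (DISK,), (), (ARCHIVE,)),
    "WARM": StoragePolicy(8, "WARM", (DISK, ARCHIVE), (DISK, ARCHIVE), (DISK, ARCHIVE)),
    "COLD": StoragePolicy(4, "COLD", (ARCHIVE,), (), ()),
}


def choose_storage_types(
    policy: StoragePolicy, n: int, available: set[str], creating: bool = True
) -> list[Optional[str]]:
    """Pick a storage type per replica; None means the replica cannot be placed."""
    fallbacks = policy.creation_fallbacks if creating else policy.replication_fallbacks
    chosen: list[Optional[str]] = []
    for wanted in policy.placement(n):
        if wanted in available:
            chosen.append(wanted)
        else:
            # Use the first fallback type that is available, if any.
            chosen.append(next((t for t in fallbacks if t in available), None))
    return chosen


if __name__ == "__main__":
    hot = POLICIES["HOT"]
    print(choose_storage_types(hot, 3, {ARCHIVE}, creating=True))   # [None, None, None]
    print(choose_storage_types(hot, 3, {ARCHIVE}, creating=False))  # ['ARCHIVE', 'ARCHIVE', 'ARCHIVE']
```

Storing each policy as a short list whose last entry repeats keeps the WARM rule (DISK: 1, ARCHIVE: n-1) compact. The two printed lines show the asymmetry in the HOT row: with DISK unavailable, creation has no fallback, while replication falls back to ARCHIVE.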