Scaling Namespaces and Optimizing Data Storage
Erasure coding policies

To accommodate heterogeneous workloads, files and directories in an HDFS cluster can have different replication and erasure coding (EC) policies.

Each policy is defined by the following two pieces of information:

  • The EC schema: Includes the number of data blocks and parity blocks in an EC group (for example, 6+3), as well as the codec algorithm (for example, Reed-Solomon).
  • The size of a striping cell: Determines the granularity of striped reads and writes, including buffer sizes and encoding work.
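
To make the role of the striping cell concrete, the following Python sketch maps a logical byte offset in a striped file to a stripe, a data block within the EC group, and an offset inside the cell. It is a simplified model, not HDFS code: it assumes that cells are assigned to data blocks in round-robin order and it ignores block group boundaries. The names `CellLocation` and `locate` are hypothetical and exist only for this illustration.

```python
from typing import NamedTuple


class CellLocation(NamedTuple):
    stripe: int          # index of the stripe (one cell per data block)
    block_index: int     # which data block in the EC group holds the cell
    offset_in_cell: int  # byte offset inside that cell


def locate(offset: int, data_blocks: int = 6,
           cell_size: int = 1024 * 1024) -> CellLocation:
    """Map a logical file offset to its striping cell.

    Assumption: cells are laid out round-robin across the data blocks,
    so cell k lands on data block k % data_blocks in stripe
    k // data_blocks. Defaults model a 6+3 schema with 1024 KB cells.
    """
    if offset < 0:
        raise ValueError("offset must be non-negative")
    if data_blocks <= 0 or cell_size <= 0:
        raise ValueError("data_blocks and cell_size must be positive")
    cell = offset // cell_size
    return CellLocation(cell // data_blocks,
                        cell % data_blocks,
                        offset % cell_size)
```

Under this model, a smaller cell spreads even short reads and writes across more data blocks, while a larger cell keeps each one on fewer blocks at the cost of larger buffers.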

HDP supports the Reed-Solomon erasure coding algorithm. The system default policy is Reed-Solomon with 6 data blocks, 3 parity blocks, and a 1024 KB cell size (RS-6-3-1024k).

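In addition, the following policies are supported:

  • RS-3-2-1024k: Reed-Solomon with 3 data blocks, 2 parity blocks, and a 1024 KB cell size.
  • RS-LEGACY-6-3-1024k: The legacy Reed-Solomon codec with 6 data blocks, 3 parity blocks, and a 1024 KB cell size.
  • XOR-2-1-1024k: XOR with 2 data blocks, 1 parity block, and a 1024 KB cell size.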
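
The policy names above follow the pattern codec, data blocks, parity blocks, cell size. As a rough illustration, the Python sketch below parses such a name and derives the extra storage each policy needs relative to the raw data. The `EcPolicy` class and `parse_policy` function are hypothetical helpers written for this example and are not part of any Hadoop API. The sketch also assumes that each policy tolerates the loss of as many blocks per group as it has parity blocks.

```python
import re
from dataclasses import dataclass

# Codec may contain hyphens (for example RS-LEGACY); the last three
# fields are data blocks, parity blocks, and cell size in KB.
_POLICY_RE = re.compile(
    r"^(?P<codec>[A-Z]+(?:-[A-Z]+)*)-(?P<data>\d+)-(?P<parity>\d+)-(?P<cell>\d+)k$"
)


@dataclass(frozen=True)
class EcPolicy:
    codec: str
    data_blocks: int
    parity_blocks: int
    cell_size_kb: int

    @property
    def storage_overhead(self) -> float:
        """Extra storage as a fraction of the raw data size."""
        return self.parity_blocks / self.data_blocks

    @property
    def tolerated_failures(self) -> int:
        """Assumed number of lost blocks per group that can be rebuilt."""
        return self.parity_blocks


def parse_policy(name: str) -> EcPolicy:
    match = _POLICY_RE.match(name)
    if match is None:
        raise ValueError(f"unrecognized policy name: {name!r}")
    return EcPolicy(match["codec"], int(match["data"]),
                    int(match["parity"]), int(match["cell"]))


if __name__ == "__main__":
    for name in ("RS-6-3-1024k", "RS-3-2-1024k",
                 "RS-LEGACY-6-3-1024k", "XOR-2-1-1024k"):
        policy = parse_policy(name)
        print(f"{name}: overhead {policy.storage_overhead:.0%}, "
              f"tolerates {policy.tolerated_failures} lost block(s)")
```

For comparison, RS-6-3-1024k stores 3 parity blocks for every 6 data blocks, which is a 50% storage overhead, whereas three-way replication stores two extra copies of every block, which is a 200% overhead.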