Benefits of erasure coding
HDFS supports Erasure Coding (EC) with data striping at the directory level.
In the context of EC, striping has two main advantages.
Striping enables online EC (writing data immediately in EC format). A client can write erasure-coded data directly, because computing parity requires buffering only a small amount of data. Online EC also improves sequential I/O performance by using multiple disk spindles in parallel; this is especially desirable in clusters with high-end networking.
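The small-buffer property of online EC can be illustrated with a minimal Python sketch. It is not the HDFS client implementation: the function names are hypothetical, the parity is a single XOR cell rather than the Reed-Solomon parity that HDFS policies typically use, and a trailing partial stripe is zero-padded purely to keep the example short.

```python
from typing import Iterable, Iterator, List, Tuple

Stripe = Tuple[List[bytes], bytes]  # (data cells, parity cell)


def _encode_stripe(stripe: bytes, data_units: int, cell_size: int) -> Stripe:
    """Split one stripe into cells and compute a single XOR parity cell."""
    # Zero-padding short or missing cells is a simplification for this sketch.
    cells = [
        stripe[i * cell_size:(i + 1) * cell_size].ljust(cell_size, b"\0")
        for i in range(data_units)
    ]
    parity = bytearray(cell_size)
    for cell in cells:
        for j, byte in enumerate(cell):
            parity[j] ^= byte
    return cells, bytes(parity)


def stripe_with_xor_parity(
    chunks: Iterable[bytes], data_units: int, cell_size: int
) -> Iterator[Stripe]:
    """Encode a byte stream stripe by stripe as it arrives.

    The buffer never holds more than one stripe plus the most recent chunk,
    so parity is produced while the data is still being written.
    """
    stripe_bytes = data_units * cell_size
    buffer = bytearray()
    for chunk in chunks:
        buffer.extend(chunk)
        while len(buffer) >= stripe_bytes:
            yield _encode_stripe(bytes(buffer[:stripe_bytes]), data_units, cell_size)
            del buffer[:stripe_bytes]
    if buffer:
        # Flush the final, partially filled stripe.
        yield _encode_stripe(bytes(buffer), data_units, cell_size)


if __name__ == "__main__":
    for cells, parity in stripe_with_xor_parity([b"abcdefgh", b"ij"], 2, 4):
        print(cells, parity)
```

With XOR parity, any one lost cell in a stripe can be rebuilt by XOR-ing the parity cell with the surviving data cells. Reed-Solomon generalizes this to tolerate several lost cells per stripe.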
In addition, EC with striping naturally distributes a small file to multiple DataNodes and eliminates the need to bundle multiple files into a single coding group.
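To see how striping spreads even a small file over several DataNodes, the following Python sketch computes how many bytes land on each data unit when cells are assigned round-robin. The function name `cell_layout` is hypothetical, and the example values (6 data units, 1 MiB cells) are assumed to mirror a policy such as RS-6-3-1024k; parity cells are left out for brevity.

```python
from typing import List

MIB = 1024 * 1024


def cell_layout(file_size: int, data_units: int, cell_size: int) -> List[int]:
    """Return the number of data bytes stored on each data unit.

    Cells are assigned round-robin: cell 0 to unit 0, cell 1 to unit 1,
    and so on, wrapping around after the last unit.
    """
    per_unit = [0] * data_units
    full_cells, tail = divmod(file_size, cell_size)
    for cell_index in range(full_cells):
        per_unit[cell_index % data_units] += cell_size
    if tail:
        # The last, partially filled cell goes to the next unit in sequence.
        per_unit[full_cells % data_units] += tail
    return per_unit


if __name__ == "__main__":
    # A 3.5 MiB file occupies four of the six data units.
    layout = cell_layout(3 * MIB + MIB // 2, data_units=6, cell_size=MIB)
    print([size / MIB for size in layout])
```

Because each file is cut into cells on its own, a small file forms its own coding group and does not have to be bundled with other files to fill a block.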
In typical HDFS clusters, small files can account for over three quarters of total storage consumption. HDFS therefore implements EC with striping, which serves such files well.