Scaling Namespaces and Optimizing Data Storage
Parameters to configure the Disk Balancer

To plan and execute Disk Balancer operations effectively, configure the following parameters in hdfs-site.xml.

Parameter
    Description

dfs.disk.balancer.enabled
    Controls whether the Disk Balancer is enabled for the cluster. The Disk Balancer runs only if this parameter is set to true. The default value is false.

dfs.disk.balancer.max.disk.throughputInMBperSec
    The maximum disk bandwidth that the Disk Balancer consumes while transferring data between disks. The default value is 10 MB/s.

dfs.disk.balancer.max.disk.errors
    The maximum number of errors to ignore for a move operation between two disks before the move is abandoned. The default value is 5.

    For example, if a plan specifies move operations between three pairs of disks and the move between the first pair encounters more than five errors, that move is abandoned and the move between the second pair of disks starts.

dfs.disk.balancer.block.tolerance.percent
    Specifies, as a percentage of the planned data movement, how far short of its target a move operation can fall and still be considered successful. Once the operation is within this tolerance, no further data is moved.

    For example, setting this value to 20% for an operation that requires 10 GB of data movement means that the movement is considered successful once 8 GB of data has been moved.

dfs.disk.balancer.plan.threshold.percent
    Specifies how far a disk's usage must deviate from the ideal storage value before the disk participates in data redistribution or balancing operations. The ideal storage value is the amount of data each disk in a DataNode should hold for the data to be perfectly distributed across those disks. Minor imbalances are ignored because normal operations automatically correct some of them.

    The default threshold percentage is 10%, which means that a disk is used in balancing operations only if it contains 10% more or less data than the ideal storage value.
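
As a rough illustration, the parameters described above could be set together in hdfs-site.xml along the following lines. This is a sketch, not a recommended configuration: the values for throughput, errors, and plan threshold simply repeat the defaults stated above, the tolerance value of 20 mirrors the 10 GB example, and writing the percentages as plain integers without a percent sign is an assumption about the expected value format.

```xml
<!-- hdfs-site.xml: illustrative Disk Balancer settings only -->
<configuration>
  <!-- The Disk Balancer runs only when this is true -->
  <property>
    <name>dfs.disk.balancer.enabled</name>
    <value>true</value>
  </property>

  <!-- Upper bound on disk bandwidth used for moves, in MB/s (stated default: 10) -->
  <property>
    <name>dfs.disk.balancer.max.disk.throughputInMBperSec</name>
    <value>10</value>
  </property>

  <!-- Errors tolerated per disk-pair move before that move is abandoned (stated default: 5) -->
  <property>
    <name>dfs.disk.balancer.max.disk.errors</name>
    <value>5</value>
  </property>

  <!-- Assumed integer percentage; with 20, a 10 GB move counts as successful at 8 GB -->
  <property>
    <name>dfs.disk.balancer.block.tolerance.percent</name>
    <value>20</value>
  </property>

  <!-- Assumed integer percentage; disks within 10% of the ideal storage value are left alone -->
  <property>
    <name>dfs.disk.balancer.plan.threshold.percent</name>
    <value>10</value>
  </property>
</configuration>
```

Lowering the throughput value reduces the impact of balancing on regular I/O at the cost of longer-running plans, so it is worth choosing with the DataNode's normal workload in mind.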