HDFS Administration

Using the Balancer CLI Commands

Balancing Policy, Threshold, and Blockpools

[-policy <policy>]

Specifies which policy to use to determine if a cluster is balanced.

The two supported policies are blockpool and datanode. With blockpool, the cluster is balanced if each block pool on each datanode is balanced. With datanode, the cluster is balanced if each datanode is balanced. blockpool is stricter than datanode because the blockpool requirement implies the datanode requirement.

The default policy is datanode.

[-threshold <threshold>]

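Specifies a number in [1.0, 100.0] that sets the acceptable threshold as a percentage of storage capacity. A datanode whose storage utilization is above the average utilization plus the threshold is considered over-utilized, and one whose utilization is below the average minus the threshold is considered under-utilized.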

The default threshold is 10.0.
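The band implied by the threshold can be sketched as follows. This is a minimal illustration, not the Balancer's implementation: the function name `utilization_band` is hypothetical, and the simple mean used here equals the cluster average only under the assumption that all datanodes have the same capacity.

```python
def utilization_band(utilizations, threshold=10.0):
    """Return (low, high), in percent: average utilization -/+ threshold.

    Assumption: every datanode has the same capacity, so the plain mean
    of the per-node percentages equals the cluster-wide utilization.
    """
    if not 1.0 <= threshold <= 100.0:
        raise ValueError("threshold must be in [1.0, 100.0]")
    average = sum(utilizations) / len(utilizations)
    return average - threshold, average + threshold


# Three hypothetical datanodes at 40%, 55% and 70% utilization.
low, high = utilization_band([40.0, 55.0, 70.0], threshold=10.0)
print(low, high)  # 45.0 65.0
```

In this sample, the node at 40% is below the band and the node at 70% is above it, so both would be candidates for balancing.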

[-blockpools <comma-separated list of blockpool ids>]

Specifies a list of block pools on which the HDFS Balancer runs. If the list is empty, the HDFS Balancer runs on all existing block pools.

The default value is an empty list.
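The three options above can be combined on one command line. The sketch below only assembles and prints the arguments; it does not run anything. It assumes the Balancer is launched as `hdfs balancer`, and the helper name `balancer_command` and the block pool IDs `BP-1` and `BP-2` are placeholders, not real identifiers.

```python
import shlex


def balancer_command(policy="datanode", threshold=10.0, blockpools=()):
    """Build an argument list for the HDFS Balancer (hypothetical helper)."""
    if policy not in ("datanode", "blockpool"):
        raise ValueError("policy must be 'datanode' or 'blockpool'")
    argv = ["hdfs", "balancer", "-policy", policy, "-threshold", str(threshold)]
    if blockpools:
        # An empty list is omitted, which means all block pools.
        argv += ["-blockpools", ",".join(blockpools)]
    return argv


# BP-1 and BP-2 are placeholder block pool IDs.
print(shlex.join(balancer_command("blockpool", 5.0, ["BP-1", "BP-2"])))
```

Leaving every argument at its default yields the `datanode` policy, a threshold of 10.0, and no `-blockpools` option, which matches the defaults described above.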

Include and Exclude Lists

[-include [-f <hosts-file> | <comma-separated list of hosts>]]

When the include list is non-empty, only the datanodes specified in the list are balanced by the HDFS Balancer. An empty include list means including all the datanodes in the cluster. The default value is an empty list.

[-exclude [-f <hosts-file> | <comma-separated list of hosts>]]

The datanodes specified in the exclude list are not balanced by the HDFS Balancer. An empty exclude list means that no datanodes are excluded. When a datanode is specified in both the include list and the exclude list, the datanode is excluded. The default value is an empty list.
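How the two lists interact can be modeled in a few lines. This is an illustrative sketch of the rules stated above, not the Balancer's code; the function name `nodes_to_balance` and the host names are hypothetical.

```python
def nodes_to_balance(all_nodes, include=(), exclude=()):
    """Return the datanodes that would be balanced, preserving input order.

    An empty include list means every datanode is a candidate;
    a datanode named in both lists is excluded.
    """
    include, exclude = set(include), set(exclude)
    return [
        node
        for node in all_nodes
        if (not include or node in include) and node not in exclude
    ]


cluster = ["dn1", "dn2", "dn3"]  # hypothetical host names
print(nodes_to_balance(cluster, include=["dn1", "dn2"], exclude=["dn2"]))
# ['dn1']
```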

Idle-Iterations and Run During Upgrade

[-idleiterations <idleiterations>]

Specifies the number of consecutive iterations in which no blocks have been moved before the HDFS Balancer terminates with the NO_MOVE_PROGRESS exit status.

Specify -1 for infinite iterations. The default is 5.
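The idle-iteration counter can be illustrated with a small simulation. This sketch models only the rule described above (consecutive iterations with no blocks moved); the function name `iteration_of_exit` is hypothetical, and the Balancer's other reasons for stopping are not modeled.

```python
def iteration_of_exit(blocks_moved_per_iteration, idle_iterations=5):
    """Return the 1-based iteration at which the idle limit is reached.

    Returns None if the limit is never reached, which is always the
    case when idle_iterations is -1 (infinite iterations).
    """
    idle = 0
    for number, moved in enumerate(blocks_moved_per_iteration, start=1):
        # Any progress resets the count of consecutive idle iterations.
        idle = 0 if moved > 0 else idle + 1
        if idle_iterations != -1 and idle >= idle_iterations:
            return number  # would exit with NO_MOVE_PROGRESS here
    return None


moves = [3, 0, 0, 2, 0, 0, 0]  # hypothetical blocks moved per iteration
print(iteration_of_exit(moves, idle_iterations=3))   # 7
print(iteration_of_exit(moves, idle_iterations=-1))  # None
```

The move at iteration 4 resets the counter, so the limit of 3 is reached only at iteration 7.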

[-runDuringUpgrade]

If specified, the HDFS Balancer runs even if there is an ongoing HDFS upgrade. If not specified, the HDFS Balancer terminates with the UNFINALIZED_UPGRADE exit status.

When there is no ongoing upgrade, this option has no effect. Running the HDFS Balancer during an upgrade is usually not desirable. To support rollback, blocks being deleted from HDFS are moved to an internal trash directory on the datanodes instead of actually being deleted. As a result, running the HDFS Balancer during an upgrade cannot reduce the storage usage of any datanode.

Source Datanodes

[-source [-f <hosts-file> | <comma-separated list of hosts>]]

Specifies the source datanode list. The HDFS Balancer selects blocks to move only from the specified datanodes. When the list is empty, all datanodes can be chosen as sources. This option can be used to free up space on particular datanodes in the cluster. Without the -source option, the HDFS Balancer can be inefficient in some cases.

The default value is an empty list.

The following table shows an example in which the average utilization is 25%, so D2 (30%) is within the 10% threshold and no blocks need to be moved from or to D2. Without specified source nodes, the HDFS Balancer first moves blocks from D2 to D3, D4, and D5, because they are on the same rack, and then moves blocks from D1 to D2, D3, D4, and D5. With D1 specified as the source node, the HDFS Balancer moves blocks directly from D1 to D3, D4, and D5.

Table 5.1. Example of Utilization Movement

Datanodes (with the same capacity)    Utilization    Rack
D1                                    95%            A
D2                                    30%            B
D3, D4, and D5                        0%             B
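The figures in the table can be checked with a short calculation. This is a sketch under the table's assumption of equal capacities; it prints an argument list rather than running anything, it assumes the Balancer is launched as `hdfs balancer`, and the labels D1 to D5 stand in for real host names.

```python
# Utilization in percent, taken from Table 5.1.
utilization = {"D1": 95.0, "D2": 30.0, "D3": 0.0, "D4": 0.0, "D5": 0.0}
threshold = 10.0

# Equal capacities, so the plain mean is the cluster average: 125 / 5 = 25.
average = sum(utilization.values()) / len(utilization)


def classify(percent):
    """Label a datanode relative to the band average -/+ threshold."""
    if percent > average + threshold:
        return "over-utilized"
    if percent < average - threshold:
        return "under-utilized"
    return "within threshold"


status = {node: classify(pct) for node, pct in utilization.items()}
sources = [node for node, label in status.items() if label == "over-utilized"]

# Restrict block selection to the over-utilized node(s) with -source.
argv = ["hdfs", "balancer", "-threshold", str(threshold),
        "-source", ",".join(sources)]
print(average, status["D2"], argv)
```

Only D1 falls above the 15% to 35% band, so it is the only node passed to `-source`.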