HDFS Administration

Overview of the HDFS Balancer

The HDFS Balancer is a tool for balancing data across the storage devices of an HDFS cluster. It was originally designed to run slowly so that balancing would not interfere with normal cluster activity or with running jobs. The HDFS Balancer was redesigned as of HDP 2.3.4.

With the redesign, the HDFS Balancer runs faster, though it can still be configured to run slowly. You can also specify the source datanodes in order to free up space on particular datanodes. In addition, a block distribution application can pin its block replicas to particular datanodes so that the pinned replicas are not moved during cluster balancing.
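
As a rough illustration of specifying source datanodes, the following Python sketch assembles an `hdfs balancer` command line. It assumes that your release's Balancer accepts the `-threshold` and `-source` options; check the command's usage output before relying on them. The helper name `balancer_command` and the host names are hypothetical and are not part of HDFS.

```python
import shlex


def balancer_command(source_datanodes, threshold_percent=10.0):
    """Build an `hdfs balancer` argument list restricted to the given source datanodes.

    Assumption: the installed Balancer supports `-threshold` and `-source`.
    """
    if not source_datanodes:
        raise ValueError("at least one source datanode is required")
    return [
        "hdfs", "balancer",
        # Target deviation of each datanode's utilization from the cluster average.
        "-threshold", str(threshold_percent),
        # Only these datanodes are considered as sources to move blocks away from.
        "-source", ",".join(source_datanodes),
    ]


# Hypothetical host names, for illustration only.
cmd = balancer_command(["dn1.example.com", "dn2.example.com"], threshold_percent=5)
print(shlex.join(cmd))
```

The sketch only builds and prints the command; it does not run the Balancer. Returning an argument list rather than a single string lets the result be passed to `subprocess.run` without shell quoting concerns.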