Command Line Installation
Also available as:
PDF

Configuring NameNode Heap Size

NameNode heap size depends on many factors, such as the number of files, the number of blocks, and the load on the system. The following table provides recommendations for NameNode heap size configuration. These settings should work for typical Hadoop clusters in which the number of blocks is very close to the number of files (generally, the average ratio of number of blocks per file in a system is 1.1 to 1.2).

Some clusters might require further tweaking of the following settings. Also, it is generally better to set the total Java heap to a higher value.

Table 1.11. Recommended NameNode Heap Size Settings

Number of Files , in Millions

Total Java Heap (Xmx and Xms)

Young Generation Size (-XX:NewSize -XX:MaxNewSize)

< 1 million files

1126m

128m

1-5 million files

3379m

512m

5-10

5913m

768m

10-20

10982m

1280m

20-30     

16332m

2048m

30-40

21401m

2560m

40-50

26752m

3072m

50-70

36889m

4352m

70-100

52659m

6144m

100-125

65612m

7680m

125-150

78566m

8960m

150-200

104473m

8960m


[Note]Note

Hortonworks recommends a maximum of 300 million files on the NameNode.

You should also set -XX:PermSize to 128m and -XX:MaxPermSize to 256m.

Following are the recommended settings for HADOOP_NAMENODE_OPTS in the hadoop-env.sh file (replacing the ##### placeholder for -XX:NewSize, -XX:MaxNewSize, -Xms, and -Xmx with the recommended values from the table):

-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/$USER/hs_err_pid%p.log -XX:NewSize=##### -XX:MaxNewSize=##### -Xms##### -Xmx##### -XX:PermSize=128m -XX:MaxPermSize=256m -Xloggc:/var/log/hadoop/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT ${HADOOP_NAMENODE_OPTS}

If the cluster uses a secondary NameNode, you should also set HADOOP_SECONDARYNAMENODE_OPTS to HADOOP_NAMENODE_OPTS in the hadoop-env.sh file:

HADOOP_SECONDARYNAMENODE_OPTS=$HADOOP_NAMENODE_OPTS

Another useful HADOOP_NAMENODE_OPTS setting is -XX:+HeapDumpOnOutOfMemoryError. This option specifies that a heap dump should be executed when an out-of-memory error occurs. You should also use -XX:HeapDumpPath to specify the location for the heap dump file:

-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./etc/heapdump.hprof