Configuring NameNode Heap Size - Hortonworks Data Platform

Command Line Installation

Configuring NameNode Heap Size

NameNode heap size depends on many factors, such as the number of files, the number of blocks, and the load on the system. The following table provides recommendations for NameNode heap size configuration. These settings should work for typical Hadoop clusters, in which the number of blocks is very close to the number of files (the average number of blocks per file is generally 1.1 to 1.2).

Some clusters might require further tuning of these settings. When in doubt, it is generally better to set the total Java heap higher rather than lower.
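To see whether a cluster matches the typical 1.1 to 1.2 blocks per file that the table assumes, you can compute the ratio from a file count and a block count you already have. The following is a minimal bash sketch; the function name blocks_per_file is hypothetical, and it only does arithmetic on the numbers you pass in. It does not query HDFS.

```shell
# Requires bash. Hypothetical helper, not part of HDP: print the average number
# of blocks per file, to two decimal places, from counts supplied by the caller.
blocks_per_file() {
  local files=$1 blocks=$2
  if ! [[ $files =~ ^[0-9]+$ && $blocks =~ ^[0-9]+$ ]] || (( files == 0 )); then
    echo "usage: blocks_per_file FILES BLOCKS (FILES must be > 0)" >&2
    return 2
  fi
  # Bash arithmetic is integer-only, so let awk do the division.
  awk -v f="$files" -v b="$blocks" 'BEGIN { printf "%.2f\n", b / f }'
}
```

For example, blocks_per_file 10000000 12000000 prints 1.20, which is within the range the table was written for. A ratio well above 1.2 suggests the cluster may need more heap than its file count alone implies.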

Table 1.11. Recommended NameNode Heap Size Settings

Number of Files (Millions)	Total Java Heap (-Xms and -Xmx)	Young Generation Size (-XX:NewSize and -XX:MaxNewSize)
< 1	1126m	128m
1-5	3379m	512m
5-10	5913m	768m
10-20	10982m	1280m
20-30	16332m	2048m
30-40	21401m	2560m
40-50	26752m	3072m
50-70	36889m	4352m
70-100	52659m	6144m
100-125	65612m	7680m
125-150	78566m	8960m
150-200	104473m	8960m
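The table lookup can be scripted. This is a minimal bash sketch, assuming a hypothetical helper named namenode_heap_settings that takes the total number of files and prints the total heap and young generation size from Table 1.11. Because adjacent ranges in the table share their endpoints, the sketch assigns a count that falls exactly on a boundary to the higher band, in keeping with the advice to prefer a larger heap; counts of 200 million or more fall outside the table and return an error.

```shell
# Requires bash. Hypothetical helper, not part of HDP: print the recommended
# "<total Java heap> <young generation size>" for a NameNode, given the total
# number of files. Values are copied from Table 1.11.
namenode_heap_settings() {
  local files=$1
  if ! [[ $files =~ ^[0-9]+$ ]]; then
    echo "usage: namenode_heap_settings FILE_COUNT" >&2
    return 2
  fi
  # Each row: <exclusive upper bound, in millions of files> <total heap> <young gen>
  local rows=(
    "1 1126m 128m"
    "5 3379m 512m"
    "10 5913m 768m"
    "20 10982m 1280m"
    "30 16332m 2048m"
    "40 21401m 2560m"
    "50 26752m 3072m"
    "70 36889m 4352m"
    "100 52659m 6144m"
    "125 65612m 7680m"
    "150 78566m 8960m"
    "200 104473m 8960m"
  )
  local row limit heap young
  for row in "${rows[@]}"; do
    read -r limit heap young <<< "$row"
    # Strict "<" sends a count on a shared boundary (e.g. exactly 5 million)
    # to the higher band, erring toward a larger heap.
    if (( files < limit * 1000000 )); then
      printf '%s %s\n' "$heap" "$young"
      return 0
    fi
  done
  echo "no table entry for $files files (the table covers fewer than 200 million)" >&2
  return 1
}
```

For example, namenode_heap_settings 12000000 prints 10982m 1280m, the values from the 10-20 million row.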

	Note
	Hortonworks recommends a maximum of 300 million files on the NameNode.

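You should also set -XX:PermSize to 128m and -XX:MaxPermSize to 256m. (These options apply to Java 7 and earlier; Java 8 removed the permanent generation, and its JVM ignores these options with a warning.)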

The recommended setting for HADOOP_NAMENODE_OPTS in the hadoop-env.sh file follows. Replace each ##### placeholder (for -XX:NewSize, -XX:MaxNewSize, -Xms, and -Xmx) with the recommended value from the table:

export HADOOP_NAMENODE_OPTS="-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC -XX:ErrorFile=/var/log/hadoop/$USER/hs_err_pid%p.log -XX:NewSize=##### -XX:MaxNewSize=##### -Xms##### -Xmx##### -XX:PermSize=128m -XX:MaxPermSize=256m -Xloggc:/var/log/hadoop/$USER/gc.log-`date +'%Y%m%d%H%M'` -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT ${HADOOP_NAMENODE_OPTS}"
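As a worked example, a cluster with 10 to 20 million files falls in the 10982m / 1280m row of Table 1.11. The sketch below fills in the placeholders with those values. NN_HEAP, NN_YOUNG, and nn_opts are illustrative names introduced here, not variables that Hadoop reads, and the string is built piece by piece only for readability. It assumes hadoop-env.sh is sourced by bash, which the += operator requires.

```shell
# Example hadoop-env.sh fragment for a cluster with 10-20 million files.
# Values come from Table 1.11; the variable names are illustrative only.
NN_HEAP=10982m    # total Java heap (-Xms and -Xmx)
NN_YOUNG=1280m    # young generation size (-XX:NewSize and -XX:MaxNewSize)

nn_opts="-server -XX:ParallelGCThreads=8 -XX:+UseConcMarkSweepGC"
nn_opts+=" -XX:ErrorFile=/var/log/hadoop/$USER/hs_err_pid%p.log"
nn_opts+=" -XX:NewSize=${NN_YOUNG} -XX:MaxNewSize=${NN_YOUNG}"
nn_opts+=" -Xms${NN_HEAP} -Xmx${NN_HEAP}"
nn_opts+=" -XX:PermSize=128m -XX:MaxPermSize=256m"
nn_opts+=" -Xloggc:/var/log/hadoop/$USER/gc.log-$(date +'%Y%m%d%H%M')"
nn_opts+=" -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps"
nn_opts+=" -Dhadoop.security.logger=INFO,DRFAS -Dhdfs.audit.logger=INFO,DRFAAUDIT"

# Keep any value set earlier at the end, as the recommended setting does.
export HADOOP_NAMENODE_OPTS="${nn_opts} ${HADOOP_NAMENODE_OPTS}"
unset nn_opts
```

Setting -Xms equal to -Xmx, and -XX:NewSize equal to -XX:MaxNewSize, follows the table, which gives a single value for each pair.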

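If the cluster runs a Secondary NameNode, also set HADOOP_SECONDARYNAMENODE_OPTS to the value of HADOOP_NAMENODE_OPTS in the hadoop-env.sh file. Place this line after the HADOOP_NAMENODE_OPTS setting so that it picks up the final value:

export HADOOP_SECONDARYNAMENODE_OPTS="$HADOOP_NAMENODE_OPTS"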

Another useful HADOOP_NAMENODE_OPTS setting is -XX:+HeapDumpOnOutOfMemoryError, which makes the JVM write a heap dump when an OutOfMemoryError occurs. Use -XX:HeapDumpPath to specify the location of the heap dump file. A relative path, such as the one in the following example, is resolved against the NameNode process's working directory:

-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./etc/heapdump.hprof
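To apply these options, append them to HADOOP_NAMENODE_OPTS in hadoop-env.sh. The following bash sketch keeps the dump path shown above. The case guard is an addition of this sketch, not part of the original instructions: it avoids appending the flags a second time if the file happens to be sourced more than once in your environment.

```shell
# hadoop-env.sh fragment (bash). Appends the heap-dump options to whatever
# HADOOP_NAMENODE_OPTS already holds, using the dump path given above.
case " ${HADOOP_NAMENODE_OPTS} " in
  *" -XX:+HeapDumpOnOutOfMemoryError "*)
    ;;  # already present: do not add the flags a second time
  *)
    export HADOOP_NAMENODE_OPTS="${HADOOP_NAMENODE_OPTS} -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./etc/heapdump.hprof"
    ;;
esac
```

Make sure the dump location has enough free space: a heap dump can approach the size of the heap in use, which for the larger rows of the table is tens of gigabytes.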