Hardware for Master Nodes
The master nodes, being unique, have significantly different storage and memory requirements than the slave nodes.
The following paragraphs discuss some of the memory/storage trade-offs in some detail.
We recommend using dual NameNode servers - one primary and one secondary. Both NameNode servers should have highly reliable storage for their namespace storage and edit-log journalling. Typically, hardware RAID and/or reliable network storage are justifiable options.
The master servers should have at least four redundant storage volumes, some local and some networked, but each can be relatively small (typically 1TB).
The RAID disks on the master nodes are a good place to consider support contracts. We recommend including an on-site disk replacement option in your support contract so that a failed RAID disk can be replaced quickly.
Multiple vendors sell NAS software. It is important to check their specifications before you invest in any NAS software.
Storage options for ResourceManager servers
In actuality ResourceManager servers do not need RAID storage because they save their persistent state to HDFS. The ResourceManager server can actually be run on a slave node with a bit of extra RAM. However, using the same hardware specifications for the ResourceManager servers as for the NameNode server provides the possibility of migrating the NameNode to the same server as the ResourceManager in the case of NameNode failure and a copy of the NameNode’s state can be saved to the network storage.
The amount of memory required for the master nodes depends on the number of file system objects (files and block replicas) to be created and tracked by the NameNode. 64 GB of RAM supports approximately 100 million files. Some sites are now experimenting with 128GB of RAM, for even larger namespaces.
NameNodes and their clients are very “chatty”. We therefore recommend providing 16 or even 24 CPU cores to handle messaging traffic for the master nodes.
Providing multiple network ports and 10 GB bandwidth to the switch is also acceptable (if the switch can handle it).