1. Machine roles in a typical Hadoop cluster

In a Hadoop and HBase cluster, machines fall into two roles:

  • Masters (HDFS NameNode, Secondary NameNode, YARN ResourceManagers, and the HBase Master)

    Because master nodes have modest storage demands, we recommend equipping them with only a limited number of disks.

  • Slaves (HDFS DataNodes, YARN NodeManagers, and HBase RegionServers)
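
To make the master/slave split concrete, here is a minimal sketch that classifies hosts in a cluster plan by the daemons they run. It uses the process names that the daemons listed above typically show in `jps` output (for example, `HMaster` for the HBase Master and `HRegionServer` for a RegionServer). The host names, the `cluster_plan` dictionary, and the `classify_host` helper are hypothetical illustrations, not part of any Hadoop or HBase tool.

```python
# Daemon process names for the two machine roles described above.
MASTER_DAEMONS = {
    "NameNode",
    "SecondaryNameNode",
    "ResourceManager",
    "HMaster",  # HBase Master
}
SLAVE_DAEMONS = {
    "DataNode",
    "NodeManager",
    "HRegionServer",  # HBase RegionServer
}


def classify_host(daemons: set[str]) -> str:
    """Return 'master', 'slave', 'client', or 'mixed' for a host's daemon set."""
    has_master = bool(daemons & MASTER_DAEMONS)
    has_slave = bool(daemons & SLAVE_DAEMONS)
    if has_master and has_slave:
        return "mixed"  # runs both roles; not the layout described here
    if has_master:
        return "master"
    if has_slave:
        return "slave"
    return "client"  # no cluster daemons, e.g. a dedicated client machine


# Hypothetical cluster plan: host name -> daemons it runs.
cluster_plan = {
    "master1": {"NameNode", "ResourceManager", "HMaster"},
    "master2": {"SecondaryNameNode", "ResourceManager"},
    "worker1": {"DataNode", "NodeManager", "HRegionServer"},
    "worker2": {"DataNode", "NodeManager", "HRegionServer"},
    "client1": set(),
}

if __name__ == "__main__":
    for host, daemons in sorted(cluster_plan.items()):
        print(f"{host}: {classify_host(daemons)}")
```

A host reported as "mixed" would be worth revisiting, because it combines master services with storage-heavy slave services.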

Additionally, we strongly recommend that you use separate client machines to perform the following tasks:

  • Load data into the HDFS cluster

  • Submit YARN applications (which describe how to process the data)

  • Retrieve or view the results of the job after its completion

  • Submit Pig or Hive queries
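
The client-side tasks above map onto standard command-line tools: `hdfs dfs -put` to load data, `yarn jar` to submit an application, `hdfs dfs -cat` to view results, and `hive -f` or `pig -f` to run query scripts. The following sketch only assembles the argument lists and does not execute them, so it runs without a cluster. All file names, HDFS paths, the JAR, and the main class are hypothetical placeholders; on a real client machine you could pass each list to `subprocess.run(argv, check=True)`.

```python
import shlex


def client_commands(local_file: str, hdfs_input: str, app_jar: str,
                    main_class: str, hdfs_output: str,
                    hive_script: str = "query.sql",
                    pig_script: str = "script.pig") -> dict[str, list[str]]:
    """Build (but do not run) the commands a client machine would issue.

    All arguments are hypothetical placeholders supplied by the caller.
    """
    return {
        # Load a local file into HDFS.
        "load": ["hdfs", "dfs", "-put", local_file, hdfs_input],
        # Submit a YARN application packaged as a JAR.
        "submit": ["yarn", "jar", app_jar, main_class, hdfs_input, hdfs_output],
        # View the job's output files after it completes (HDFS expands the glob).
        "results": ["hdfs", "dfs", "-cat", f"{hdfs_output}/part-*"],
        # Submit Hive and Pig queries from script files.
        "hive": ["hive", "-f", hive_script],
        "pig": ["pig", "-f", pig_script],
    }


if __name__ == "__main__":
    cmds = client_commands("events.csv", "/user/alice/input", "wordcount.jar",
                           "WordCount", "/user/alice/output")
    for task, argv in cmds.items():
        # shlex.join renders each argument list as a shell-safe command line.
        print(f"{task}: {shlex.join(argv)}")
```

Keeping these commands on a client machine means user workloads and ad hoc tools never compete with the master or slave daemons for resources.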

The following illustration shows a typical Hadoop cluster that uses dedicated client machines as recommended above: