1. Typical Hadoop Cluster

Hadoop and HBase clusters have two types of machines:

  • Masters (HDFS NameNode, MapReduce JobTracker (HDP-1) or YARN ResourceManager (HDP-2), and HBase Master)

  • Slaves ( HDFS DataNodes, MapReduce TaskTrackers (HDP-1) or YARN NodeManagers (HDP-2), and the HBase RegionServers). The DataNodes, TaskTrackers/NodeManagers, and HBase RegionServers are co-located or co-deployed for optimal data locality.

    In addition, HBase requires the use of a separate component (ZooKeeper) to manage the HBase cluster.

Hortonworks recommends separating master and slave nodes because:

  • Task/application workloads on the slave nodes should be isolated from the masters.

  • Slaves nodes are frequently decommissioned for maintainance.

For evaluation purpose, it is possible to deploy Hadoop using a single-node installation (all the masters and the slave processes reside on the same machine).

For a small cluster (of two nodes) - one node is used as the master: both NameNode and JobTracker/ResourceManager and the other node is used as the slave: DataNode and TaskTracker/NodeMangers.

Clusters of three or more machines typically use a single NameNode and JobTracker/ResourceManager with all the other nodes as slave nodes. Typically, a medium to large Hadoop cluster consists of a two or three-level architecture built with rack-mounted servers. Each rack of servers is interconnected using a 1 Gigabit Ethernet (GbE) switch. Each rack-level switch is connected to a cluster-level switch (which is typically a larger port-density 10GbE switch). These cluster-level switches may also interconnect with other cluster-level switches or even uplink to another level of switching infrastructure.