7. Conclusion

Achieving optimal results from a Hadoop implementation begins with choosing the correct hardware and software stacks. The effort invested during the planning stages can pay off dramatically in the performance and total cost of ownership (TCO) of the environment. The following composite system stack recommendations can help organizations in the planning stages:

Table 1.1. For small clusters (5-50 nodes):

| Machine Type | Workload Pattern / Cluster Type | Storage                | Processor (# of Cores) | Memory (GB) | Network                   |
|--------------|---------------------------------|------------------------|------------------------|-------------|---------------------------|
| Slaves       | Balanced workload               | Four to six 2 TB disks | One Quad               | 24          | 1 GB Ethernet, all-to-all |
| Slaves       | HBase cluster                   | Six 2 TB disks         | Dual Quad              | 48          | 1 GB Ethernet, all-to-all |
| Masters      | Balanced and/or HBase cluster   | Four to six 2 TB disks | Dual Quad              | 24          | 1 GB Ethernet, all-to-all |
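To translate the raw-disk figures above into usable HDFS capacity, the block replication factor must be accounted for. The sketch below is illustrative only and makes two assumptions not stated in the tables: the HDFS default replication factor of 3, and roughly 25% of raw disk reserved for temporary and intermediate MapReduce output.

```python
# Sketch: usable HDFS capacity per slave node.
# Assumptions (not from the tables above): replication factor 3 (HDFS
# default) and ~25% of raw disk reserved for temporary/intermediate data.
def usable_hdfs_tb(disks, tb_per_disk, replication=3, temp_reserve=0.25):
    """Estimate TB of unique HDFS data one slave can hold."""
    raw = disks * tb_per_disk
    return raw * (1 - temp_reserve) / replication

# A balanced-workload slave with six 2 TB disks:
print(usable_hdfs_tb(6, 2))  # 3.0 TB of unique HDFS data per node
```

Multiplying the per-node figure by the node count gives a first-order estimate of cluster capacity for planning purposes.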
Table 1.2. For medium to large clusters (100s to 1000s of nodes):

| Machine Type | Workload Pattern / Cluster Type        | Storage                        | Processor (# of Cores) | Memory (GB)                                                        | Network       |
|--------------|----------------------------------------|--------------------------------|------------------------|--------------------------------------------------------------------|---------------|
| Slaves       | Balanced workload                      | Four to six 1 TB disks         | Dual Quad              | 24                                                                 | Dual 1 GB links for all nodes in a 20-node rack, plus 2 x 10 GB interconnect links per rack going to a pair of central switches |
| Slaves       | Compute-intensive workload             | Four to six 1 TB or 2 TB disks | Dual Hexa/Quad         | 24-48                                                              | Same as above |
| Slaves       | I/O-intensive workload                 | Twelve 1 TB disks              | Dual Quad              | 24-48                                                              | Same as above |
| Slaves       | HBase clusters                         | Twelve 1 TB disks              | Dual Hexa/Quad         | 48-96                                                              | Same as above |
| Masters      | All workload patterns / HBase clusters | Four to six 2 TB disks         | Dual Quad              | Depends on the number of file system objects to be managed by the NameNode | Same as above |
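The Masters row above leaves NameNode memory open-ended because it scales with the number of file system objects. A widely cited rule of thumb is that each HDFS object (file, directory, or block) consumes on the order of 150 bytes of NameNode heap; the sketch below applies that rule, with a safety multiplier for GC headroom that is an assumption of this example, not a figure from the tables.

```python
# Sketch: NameNode heap sizing from the object count.
# Assumption: ~150 bytes of heap per file system object (a common HDFS
# rule of thumb), doubled here as GC/growth headroom (our own choice).
BYTES_PER_OBJECT = 150

def namenode_heap_gb(num_objects, headroom=2.0):
    """Estimate NameNode heap in GB for a given number of HDFS objects."""
    return num_objects * BYTES_PER_OBJECT * headroom / (1024 ** 3)

# 100 million objects works out to roughly 28 GB of heap:
print(round(namenode_heap_gb(100_000_000), 1))
```

Estimates like this are a starting point for capacity planning; actual heap requirements should be validated against the live NameNode's metrics as the cluster grows.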

For Further Reading