6. Conclusion

Achieving optimal results from a Hadoop implementation begins with choosing the correct hardware and software stacks. The effort invested in the planning stages can pay off dramatically in terms of performance and the total cost of ownership (TCO) of the environment. The following composite system stack recommendations can benefit organizations in the planning stages:

Table 1.1. For small clusters (5-50 nodes):

| Machine Type | Workload Pattern  | Storage                | Processor (# of Cores) | Memory (GB) | Network                  |
|--------------|-------------------|------------------------|------------------------|-------------|--------------------------|
| Slaves       | Balanced workload | Four to six 2 TB disks | One Quad               | 24          | 1 GB Ethernet all-to-all |
| Masters      | Balanced workload | Four to six 2 TB disks | Dual Quad              | 24          | 1 GB Ethernet all-to-all |
Table 1.2. For medium to large clusters (100s to 1,000s of nodes):

| Machine Type | Workload Pattern           | Storage                        | Processor (# of Cores) | Memory (GB)                                                     | Network                                                                                                                              |
|--------------|----------------------------|--------------------------------|------------------------|-----------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------|
| Slaves       | Balanced workload          | Four to six 1 TB disks         | Dual Quad              | 24                                                              | Dual 1 GB links for all nodes in a 20-node rack and 2 x 10 GB interconnect links per rack going to a pair of central switches |
| Slaves       | Compute-intensive workload | Four to six 1 TB or 2 TB disks | Dual Hexa              | 24-48                                                           | Same as above                                                                                                                        |
| Slaves       | I/O-intensive workload     | Twelve 1 TB disks              | Dual Quad              | 24-48                                                           | Same as above                                                                                                                        |
| Masters      | All workload patterns      | Four to six 2 TB disks         | Dual Quad              | Depends on number of file system objects to be created by the NameNode | Same as above                                                                                                                        |
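Two quantities in the tables above lend themselves to back-of-the-envelope arithmetic: usable storage (raw disk capacity shrinks by the HDFS replication factor) and NameNode memory (which scales with the number of file system objects). The sketch below is a rough sizing aid, not part of the recommendations; the replication factor of 3 is the HDFS default, and the ~150 bytes of heap per file system object is a commonly cited rule of thumb, so treat both as assumptions to validate for your workload.

```python
# Rough HDFS sizing sketch. Assumptions (not from the tables above):
#   - replication factor 3 (HDFS default)
#   - ~150 bytes of NameNode heap per file system object (rule of thumb)

def usable_capacity_tb(nodes, disks_per_node, disk_tb, replication=3):
    """Raw disk capacity divided by the HDFS replication factor."""
    return nodes * disks_per_node * disk_tb / replication

def namenode_heap_gb(fs_objects, bytes_per_object=150):
    """Approximate heap needed to hold the namespace in NameNode memory."""
    return fs_objects * bytes_per_object / 1e9

# Example: a 20-node rack of balanced-workload slaves with six 2 TB disks.
print(usable_capacity_tb(20, 6, 2))   # 80.0 TB usable
print(namenode_heap_gb(50_000_000))   # 7.5 GB heap for 50M objects
```

Estimates like these ignore overheads such as temporary MapReduce output and OS reserve space, so actual provisioning should add headroom on top of them.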

For Further Reading