Cluster Planning

Conclusion

Achieving optimal results from a Hadoop implementation begins with choosing the correct hardware and software stacks. The effort involved in the planning stages can pay off dramatically in terms of the performance and the total cost of ownership (TCO) associated with the environment.

The following composite system stack recommendations can help organizations in the planning stages:

Table 1.1. Sizing Recommendations

| Machine Type | Workload Pattern/Cluster Type | Storage [1] | Processor (# of Cores) | Memory (GB) | Network |
|---|---|---|---|---|---|
| Slaves | Balanced workload | Twelve 2-3 TB disks | 8 | 128-256 | 1 GbE onboard, 2x10 GbE mezzanine/external |
| Slaves | Compute-intensive workload | Twelve 1-2 TB disks | 10 | 128-256 | 1 GbE onboard, 2x10 GbE mezzanine/external |
| Slaves | Storage-heavy workload | Twelve 4+ TB disks | 8 | 128-256 | 1 GbE onboard, 2x10 GbE mezzanine/external |
| NameNode | Balanced workload | Four or more 2-3 TB disks, RAID 10 with spares | 8 | 128-256 | 1 GbE onboard, 2x10 GbE mezzanine/external |
| ResourceManager | Balanced workload | Four or more 2-3 TB disks, RAID 10 with spares | 8 | 128-256 | 1 GbE onboard, 2x10 GbE mezzanine/external |

[1] Reserve at least 2.5 GB of hard drive space for each version of HDP to be installed.
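
The slave-node storage figures in Table 1.1 and the reserve in footnote [1] can be turned into a quick raw-capacity estimate. The following Python sketch is illustrative only: the names `SLAVE_STORAGE`, `raw_capacity_tb`, and `hdp_reserve_gb` are hypothetical, and the open-ended "4+ TB" entry is treated as a 4 TB lower bound with no upper bound.

```python
# Slave-node storage rows from Table 1.1:
# (disk count, min TB per disk, max TB per disk).
# max is None where the table gives an open-ended size ("4+ TB").
SLAVE_STORAGE = {
    "balanced": (12, 2, 3),
    "compute-intensive": (12, 1, 2),
    "storage-heavy": (12, 4, None),
}

# Footnote [1]: minimum disk reserve per installed HDP version, in GB.
HDP_RESERVE_GB_PER_VERSION = 2.5


def raw_capacity_tb(workload, nodes=1):
    """Return (min_tb, max_tb) of raw disk capacity for `nodes` slave nodes."""
    disks, min_tb, max_tb = SLAVE_STORAGE[workload]
    low = disks * min_tb * nodes
    high = None if max_tb is None else disks * max_tb * nodes
    return low, high


def hdp_reserve_gb(versions):
    """Minimum disk space (GB) to reserve for `versions` installed HDP versions."""
    return HDP_RESERVE_GB_PER_VERSION * versions


if __name__ == "__main__":
    print(raw_capacity_tb("balanced", nodes=10))  # (240, 360)
    print(raw_capacity_tb("storage-heavy"))       # (48, None)
    print(hdp_reserve_gb(2))                      # 5.0
```

These figures are raw disk capacity. Usable HDFS capacity will be lower once block replication and non-HDFS overhead are taken into account.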