Other Issues

In addition to the hardware considerations, you must consider factors such as using erasure coding for storage capacity, weight of the server racks, scalability of the cluster, and others for your Hadoop cluster.

Erasure Coding for Increased Storage Capacity

When using Erasure Coding, you can improve the performance of data encoding and decoding can by using Intel’s ISA-L instruction set to speed up EC calculations on x86 hardware.

Weight

The storage density of the latest generation of servers means that the weight of the racks needs to be taken into account. You should verify that the weight of a rack is not more than the capacity of the data center’s floor.

Scalability

It is easy to scale a Hadoop cluster by adding new servers or whole server racks to the cluster and increasing the memory in the master nodes to deal with the increased load. This will generate a lot of “rebalancing traffic” at first, but will deliver extra storage and computation. Because the master nodes do matter, we recommend that you pay the premiums for those machines.

Use the following guidelines to scale your existing Hadoop cluster:

Ensure there is potential free space in the data center near the Hadoop cluster. This space should be able to accommodate the power budget for more racks.
Plan the network to cope with more servers
It might be possible to add more disks and RAM to the existing servers - and extra CPUs if the servers have spare sockets. This can expand an existing cluster without adding more racks or network changes.
To perform a hardware upgrade in a live cluster can take considerable time and effort. We recommend that you plan the expansion one server at a time.
CPU parts do not remain on the vendors price list forever. If you do plan to add a second CPU, consult with your reseller on when they will cut the price of CPUs that your existing parts and buy these parts when available. This typically takes at least 18 months time period.
You are likely to need more memory in the master servers.

Support contracts

The concept to consider here is “care for the master nodes, keep an eye on the slave nodes”. You do not need traditional enterprise-class hardware support contracts for the majority of the nodes in the cluster, as their failures are more of a statistics issue than a crisis. The money saved in hardware support can go into more slave nodes.

Commissioning

Hortonworks plans to cover the best practices commissioning a Hadoop cluster in a future document. For now, note that the “smoke tests” that come with the Hadoop cluster are a good initial test, followed by Terasort. Some of the major server vendors offer in factory commissioning of Hadoop clusters for an extra fee. This can have a direct benefit in ensuring that the cluster is working before you receive and pay for it. There is an indirect benefit in that if the Terasort performance is lower on-site than in-factory, the network is the likely culprit which makes it is possible to track down the problem faster.