Chapter 2. File System Partitioning Recommendations

Use the following information to help you set up the file system partitions on master and slave nodes in an HDP cluster.

 1. Partitioning Recommendations for All Nodes

Use the following as a base configuration for all nodes in your cluster:

  • Root partition: OS and core program files

  • Swap: Sized at twice (2x) the system memory
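
The swap sizing rule above is simple arithmetic. As a minimal sketch (the function name swap_size_gb and its parameters are hypothetical, chosen only for illustration), it could be expressed as:

```python
def swap_size_gb(memory_gb: int, multiplier: int = 2) -> int:
    """Return a swap size in GB using the "2x system memory" rule.

    The function name and the `multiplier` parameter are illustrative
    assumptions, not part of any HDP tool.
    """
    if memory_gb <= 0:
        raise ValueError("memory_gb must be positive")
    return memory_gb * multiplier


if __name__ == "__main__":
    # A node with 48 GB of memory gets 96 GB of swap under this rule.
    print(swap_size_gb(48))
```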

 2. Partitioning Recommendations for Slave Nodes

  • Hadoop slave node partitions: Give Hadoop its own partitions for Hadoop files and logs. Format the drives with XFS, ext4, or ext3, in that order of preference. Do not use LVM; it adds latency and creates a bottleneck.

  • On slave nodes only, mount each Hadoop partition individually, one per drive, as "/grid/[0-n]".
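
To illustrate the "/grid/[0-n]" convention, the following sketch generates one /etc/fstab line per drive. The device names, the "defaults,noatime" mount options, and the function name fstab_entries are assumptions made for this example; substitute the values that apply to your hardware and operating system.

```python
ALLOWED_FILESYSTEMS = ("xfs", "ext4", "ext3")  # in order of preference


def fstab_entries(devices, fstype="xfs", options="defaults,noatime"):
    """Build fstab lines that mount each device at /grid/<index>.

    `devices` is a list of block-device paths (hypothetical here).
    The mount options are an assumption, not an HDP requirement.
    """
    if fstype not in ALLOWED_FILESYSTEMS:
        raise ValueError(f"unsupported file system: {fstype}")
    # fstab fields: device, mount point, type, options, dump, fsck pass
    return [
        f"{device} /grid/{index} {fstype} {options} 0 0"
        for index, device in enumerate(devices)
    ]


if __name__ == "__main__":
    for line in fstab_entries(["/dev/sdb1", "/dev/sdc1", "/dev/sdd1"]):
        print(line)
```

Each drive gets its own mount point, so no volume manager sits between Hadoop and the disks.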

 3. Hadoop Slave Node Partitioning Configuration Example

  • /swap - 96 GB (for a system with 48 GB of memory)

  • /root - 20 GB (ample room for existing files, future log file growth, and OS upgrades)

  • /grid/0/ - [full disk GB] first partition for Hadoop to use for local storage

  • /grid/1/ - second partition for Hadoop to use

  • /grid/2/ - ...
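
Hadoop is usually pointed at these mount points through a comma-separated directory list, for example the dfs.datanode.data.dir property in hdfs-site.xml. The sketch below builds such a value; the "hadoop/hdfs/data" subdirectory and the function name data_dir_value are assumptions for illustration only.

```python
def data_dir_value(partition_count, subdir="hadoop/hdfs/data"):
    """Return a comma-separated list of data directories, one per /grid/N.

    `subdir` is a hypothetical directory layout; use whatever path your
    cluster configuration actually defines.
    """
    if partition_count < 1:
        raise ValueError("partition_count must be at least 1")
    return ",".join(f"/grid/{n}/{subdir}" for n in range(partition_count))


if __name__ == "__main__":
    # Value for a node with three Hadoop partitions (/grid/0 to /grid/2).
    print(data_dir_value(3))
```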

 4. Redundancy (RAID) Recommendations

  • Master nodes -- Configured for reliability (RAID 10, dual Ethernet cards, dual power supplies, etc.)

  • Slave nodes -- RAID is not necessary, because the cluster handles failures on these nodes automatically. By default, HDFS stores each block on three different hosts, so redundancy is built in. Build slave nodes for speed and low cost.
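
Because redundancy comes from replication rather than RAID, usable storage is roughly the raw disk capacity divided by the replication factor. The following rough estimate assumes a replication factor of 3 and ignores space reserved for non-HDFS use; the function name and the sample cluster size are hypothetical.

```python
def usable_capacity_tb(nodes, disks_per_node, disk_tb, replication=3):
    """Estimate usable HDFS capacity in TB for a set of slave nodes.

    This is a simplification: it ignores space reserved for the OS,
    logs, and intermediate job data.
    """
    if replication < 1:
        raise ValueError("replication must be at least 1")
    raw_tb = nodes * disks_per_node * disk_tb
    return raw_tb / replication


if __name__ == "__main__":
    # Hypothetical cluster: 10 slave nodes, 12 disks each, 4 TB per disk.
    print(usable_capacity_tb(10, 12, 4))
```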

 5. For Further Reading

The following additional documentation may be useful: