Use the following information to help you set up the file system partitions on master and slave nodes in an HDP cluster.
Use the following as a base configuration for all nodes in your cluster:
Root partition: OS and core program files
Swap: Size 2X system memory
Hadoop slave node partitions: Hadoop should have its own partitions for Hadoop files and logs. Drives should be formatted with XFS, ext4, or ext3, in that order of preference. Do not use LVM; it adds latency and can become a bottleneck.
On slave nodes only, all Hadoop partitions should be mounted individually from drives as "/grid/[0-n]".
/swap - 96 GB (for a system with 48 GB of memory)
/ (root) - 20 GB (ample room for existing files, future log file growth, and OS upgrades)
/grid/0/ - [full disk GB] first partition for Hadoop to use for local storage
/grid/1/ - second partition for Hadoop to use
/grid/2/ - ...
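The layout rules above can be sketched in a few lines of Python. This is only an illustration, not an HDP tool: the function names, the device names, and the "defaults 0 0" mount fields are assumptions made for the example, and the output is meant to be reviewed by hand before anything is written to /etc/fstab.

```python
from typing import Dict, List, Optional

# Preference order stated in the text above.
FS_PREFERENCE = ["xfs", "ext4", "ext3"]


def pick_filesystem(available: List[str]) -> Optional[str]:
    """Return the most preferred file system that is available, or None."""
    for fs in FS_PREFERENCE:
        if fs in available:
            return fs
    return None


def swap_size_gb(memory_gb: int) -> int:
    """Swap is sized at twice the system memory."""
    return 2 * memory_gb


def grid_mounts(devices: List[str]) -> Dict[str, str]:
    """Map each data drive to its own /grid/<n> mount point."""
    return {dev: f"/grid/{i}" for i, dev in enumerate(devices)}


def fstab_lines(devices: List[str], fs: str, options: str = "defaults") -> List[str]:
    """Build one fstab-style line per data drive.

    The options and the trailing "0 0" (dump and fsck pass) are assumed
    values for illustration; adjust them to your site's conventions.
    """
    return [
        f"{dev} {mount} {fs} {options} 0 0"
        for dev, mount in grid_mounts(devices).items()
    ]


if __name__ == "__main__":
    # Hypothetical device names; substitute the drives on your own host.
    devices = ["/dev/sdb1", "/dev/sdc1", "/dev/sdd1"]
    fs = pick_filesystem(["ext3", "ext4", "xfs"])
    print(f"swap: {swap_size_gb(48)} GB")
    for line in fstab_lines(devices, fs):
        print(line)
```

Each drive gets its own mount point rather than being pooled, which matches the guidance to avoid LVM and mount Hadoop partitions individually.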
Master nodes -- Configured for reliability (RAID 10, dual Ethernet cards, dual power supplies, etc.)
Slave nodes -- RAID is not necessary, because the cluster handles failures on these nodes automatically. By default, HDFS stores each block on three different hosts, so redundancy is built in. Slave nodes should be built for speed and low cost.
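Because redundancy comes from replication rather than RAID, the raw disk space on slave nodes is divided by the number of copies kept. The following rough sketch estimates usable capacity under that assumption. The function name and the cluster figures are hypothetical, and the estimate ignores space reserved for the OS, logs, and temporary data.

```python
def usable_capacity_tb(nodes: int, disks_per_node: int, disk_tb: float,
                       replication: int = 3) -> float:
    """Estimate usable storage as raw capacity divided by the replica count."""
    if replication < 1:
        raise ValueError("replication must be at least 1")
    raw_tb = nodes * disks_per_node * disk_tb
    return raw_tb / replication


if __name__ == "__main__":
    # Hypothetical cluster: 10 slave nodes, 12 data drives each, 4 TB per drive.
    print(usable_capacity_tb(nodes=10, disks_per_node=12, disk_tb=4.0))
```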
The following additional documentation may be useful:
Hortonworks Knowledge-Base article on options for selecting your underlying Linux file system: Best Practices: Linux File Systems for HDFS
CentOS partitioning documentation: Partitioning Your System
Reference architectures from other Hadoop clusters: Hadoop Reference Architectures