10.2. Define Directories

The following tables describe the directories for installation, configuration, data, process IDs, and logs, based on the Hadoop services you plan to install. Use these tables to define the values you will use when setting up your environment.

Note

The scripts directory you downloaded in Download Companion Files includes a script, directories.sh, for setting directory environment parameters.

We strongly suggest that you edit this script and source it to set up these environment variables in your environment. Alternatively, you can copy its contents to your ~/.bash_profile.
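As a rough illustration of the edit-and-source step, the sketch below writes a stand-in file to a scratch directory, sources it, and then checks that the expected variables are set. The scratch location, the two assignments, and the check loop are assumptions made for illustration; the real directories.sh from the companion files may be laid out differently and defines more variables.

```shell
# Stand-in for the edited companion script (assumption: plain VAR="value" lines).
scripts_dir=$(mktemp -d)
cat > "$scripts_dir/directories.sh" <<'EOF'
HADOOP_CONF_DIR="/etc/hadoop/conf"
HDFS_LOG_DIR="/var/log/hadoop/hdfs"
EOF

# Source the file (do not execute it) so the variables land in the current shell.
. "$scripts_dir/directories.sh"

# Report any variable that is still unset or empty after sourcing.
missing=0
for name in HADOOP_CONF_DIR HDFS_LOG_DIR; do
  eval "value=\${$name:-}"
  if [ -z "$value" ]; then
    echo "$name is not set" >&2
    missing=1
  else
    echo "$name=$value"
  fi
done
```

Sourcing matters here: running the script as a child process would set the variables only in that process, and they would be gone when it exits.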

 

Table 1.3. Define Directories for Core Hadoop

Hadoop Service

Parameter

Definition

HDFS

DFS_NAME_DIR

Space-separated list of directories where the NameNode should store the file system image.

For example,

/grid/hadoop/hdfs/nn

/grid1/hadoop/hdfs/nn

HDFS

DFS_DATA_DIR

Space-separated list of directories where DataNodes should store the blocks.

For example,

/grid/hadoop/hdfs/dn

/grid1/hadoop/hdfs/dn

/grid2/hadoop/hdfs/dn

HDFS

FS_CHECKPOINT_DIR

Space-separated list of directories where the SecondaryNameNode should store the checkpoint image.

For example,

/grid/hadoop/hdfs/snn

/grid1/hadoop/hdfs/snn

/grid2/hadoop/hdfs/snn

HDFS

HDFS_LOG_DIR

Directory for storing the HDFS logs. This directory name is a combination of a directory and the $HDFS_USER.

For example,

/var/log/hadoop/hdfs

where hdfs is the $HDFS_USER.

HDFS

HDFS_PID_DIR

Directory for storing the HDFS process ID. This directory name is a combination of a directory and the $HDFS_USER.

For example,

/var/run/hadoop/hdfs

where hdfs is the $HDFS_USER.

HDFS

HADOOP_CONF_DIR

Directory for storing the Hadoop configuration files.

For example,

/etc/hadoop/conf

YARN

YARN_LOCAL_DIR

Space-separated list of directories where YARN should store temporary data.

For example,

/grid/hadoop/yarn

/grid1/hadoop/yarn

/grid2/hadoop/yarn

YARN

YARN_LOG_DIR

Directory for storing the YARN logs.

For example,

/var/log/hadoop/yarn

This directory name is a combination of a directory and the $YARN_USER. In the example, yarn is the $YARN_USER.

YARN

YARN_PID_DIR

Directory for storing the YARN process ID.

For example,

/var/run/hadoop/yarn

This directory name is a combination of a directory and the $YARN_USER. In the example, yarn is the $YARN_USER.

MapReduce

MAPRED_LOG_DIR

Directory for storing the JobHistory Server logs.

For example,

/var/log/hadoop/mapred

This directory name is a combination of a directory and the $MAPRED_USER. In the example, mapred is the $MAPRED_USER.
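The Core Hadoop parameters in Table 1.3 could be expressed in shell along the following lines. This is a minimal sketch that reuses the example values from the table, not the contents of the companion directories.sh. The user names hdfs, yarn, and mapred are the example values from the table, and the count variable and the loop are added only to show how a space-separated list breaks into individual directories.

```shell
# Service users (example values from the table; adjust for your cluster).
HDFS_USER="hdfs"
YARN_USER="yarn"
MAPRED_USER="mapred"

# Space-separated lists, typically one entry per disk or mount point.
DFS_NAME_DIR="/grid/hadoop/hdfs/nn /grid1/hadoop/hdfs/nn"
DFS_DATA_DIR="/grid/hadoop/hdfs/dn /grid1/hadoop/hdfs/dn /grid2/hadoop/hdfs/dn"
FS_CHECKPOINT_DIR="/grid/hadoop/hdfs/snn /grid1/hadoop/hdfs/snn /grid2/hadoop/hdfs/snn"
YARN_LOCAL_DIR="/grid/hadoop/yarn /grid1/hadoop/yarn /grid2/hadoop/yarn"

# Log and PID directories combine a base directory with the service user.
HDFS_LOG_DIR="/var/log/hadoop/$HDFS_USER"
HDFS_PID_DIR="/var/run/hadoop/$HDFS_USER"
YARN_LOG_DIR="/var/log/hadoop/$YARN_USER"
YARN_PID_DIR="/var/run/hadoop/$YARN_USER"
MAPRED_LOG_DIR="/var/log/hadoop/$MAPRED_USER"
HADOOP_CONF_DIR="/etc/hadoop/conf"

export DFS_NAME_DIR DFS_DATA_DIR FS_CHECKPOINT_DIR YARN_LOCAL_DIR
export HDFS_LOG_DIR HDFS_PID_DIR YARN_LOG_DIR YARN_PID_DIR
export MAPRED_LOG_DIR HADOOP_CONF_DIR

# Leaving the variable unquoted lets the shell split the list on spaces,
# so the loop body runs once per directory.
count=0
for dir in $DFS_DATA_DIR; do
  count=$((count + 1))
  echo "DataNode directory $count: $dir"
done
```

Because the lists are split on spaces, none of the directory paths may contain a space.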


 

Table 1.4. Define Directories for Ecosystem Components

Hadoop Service

Parameter

Definition

Pig

PIG_CONF_DIR

Directory to store the Pig configuration files. For example, /etc/pig/conf.

Pig

PIG_LOG_DIR

Directory to store the Pig logs. For example, /var/log/pig.

Pig

PIG_PID_DIR

Directory to store the Pig process ID. For example, /var/run/pig.

Hive

HIVE_CONF_DIR

Directory to store the Hive configuration files. For example, /etc/hive/conf.

Hive

HIVE_LOG_DIR

Directory to store the Hive logs. For example, /var/log/hive.

Hive

HIVE_PID_DIR

Directory to store the Hive process ID. For example, /var/run/hive.

WebHCat

WEBHCAT_CONF_DIR

Directory to store the WebHCat configuration files. For example, /etc/hcatalog/conf/webhcat.

WebHCat

WEBHCAT_LOG_DIR

Directory to store the WebHCat logs. For example, /var/log/webhcat.

WebHCat

WEBHCAT_PID_DIR

Directory to store the WebHCat process ID. For example, /var/run/webhcat.

HBase

HBASE_CONF_DIR

Directory to store the HBase configuration files. For example, /etc/hbase/conf.

HBase

HBASE_LOG_DIR

Directory to store the HBase logs. For example, /var/log/hbase.

HBase

HBASE_PID_DIR

Directory to store the HBase process ID. For example, /var/run/hbase.

ZooKeeper

ZOOKEEPER_DATA_DIR

Directory where ZooKeeper will store data. For example, /grid/hadoop/zookeeper/data.

ZooKeeper

ZOOKEEPER_CONF_DIR

Directory to store the ZooKeeper configuration files. For example, /etc/zookeeper/conf.

ZooKeeper

ZOOKEEPER_LOG_DIR

Directory to store the ZooKeeper logs. For example, /var/log/zookeeper.

ZooKeeper

ZOOKEEPER_PID_DIR

Directory to store the ZooKeeper process ID. For example, /var/run/zookeeper.

ZooKeeper

myid

Every machine that is part of the ZooKeeper ensemble should know about every other machine in the ensemble. Create a file named myid, one for each server, in that server's data directory, $ZOOKEEPER_DATA_DIR. The myid file consists of a single line containing only that machine's ID. For example, the myid file of server 1 contains the string "1" and nothing else. The ID must be unique within the ensemble and should have a value between 1 and 255.
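Creating the myid file described above could look like the following sketch. The write_myid function is a hypothetical helper, not part of ZooKeeper, and the demonstration writes to a scratch directory so that it can run without root privileges. On a real server you would pass "$ZOOKEEPER_DATA_DIR" and that server's own ID instead.

```shell
# write_myid DATA_DIR ID
# Hypothetical helper: validates the ID and writes DATA_DIR/myid.
write_myid() {
  data_dir=$1
  id=$2
  # Reject empty values, leading zeros, non-digits, and anything over 3 digits.
  case $id in
    ''|0*|*[!0-9]*|????*)
      echo "myid must be an integer from 1 to 255: '$id'" >&2
      return 1
      ;;
  esac
  if [ "$id" -gt 255 ]; then
    echo "myid must be an integer from 1 to 255: '$id'" >&2
    return 1
  fi
  mkdir -p "$data_dir" || return 1
  # The file holds a single line containing only the ID.
  echo "$id" > "$data_dir/myid"
}

# Demonstration against a scratch directory for server 1.
demo_dir=$(mktemp -d)
write_myid "$demo_dir" 1
cat "$demo_dir/myid"
```

The helper cannot check that an ID is unique within the ensemble; that still has to be coordinated across servers, for example by assigning IDs in the same order in which the servers are listed in the ZooKeeper configuration.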

