Chapter 4. Validating the Core Hadoop Installation

This section describes how to start core Hadoop and run simple smoke tests against it. Use the following instructions to validate the core Hadoop installation:

  1. Format and start HDFS.

    1. Execute these commands on the NameNode:

      su $HDFS_USER
      /usr/lib/hadoop/bin/hadoop namenode -format
      /usr/lib/hadoop/bin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start namenode
    2. Execute these commands on the Secondary NameNode:

      su $HDFS_USER
      /usr/lib/hadoop/bin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start secondarynamenode
    3. Execute these commands on all DataNodes:

      su $HDFS_USER
      /usr/lib/hadoop/bin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start datanode

    where:

    • $HDFS_USER is the user owning the HDFS services. For example, hdfs.

    • $HADOOP_CONF_DIR is the directory for storing the Hadoop configuration files. For example, /etc/hadoop/conf.
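
    Optionally, before moving on, you can confirm that each daemon actually started. A minimal check is sketched below using jps, which ships with the JDK; the exact process names may vary slightly between Hadoop versions.

      # Run on each host as $HDFS_USER; depending on the host's role, look for
      # NameNode, SecondaryNameNode, or DataNode in the output.
      su $HDFS_USER
      jps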

  2. Smoke Test HDFS.

    1. See if you can reach the NameNode server with your browser (a command-line alternative is sketched after this step):

      http://$namenode.full.hostname:50070
    2. Try copying a file into HDFS and listing that file:

      su $HDFS_USER
      /usr/lib/hadoop/bin/hadoop fs -copyFromLocal /etc/passwd passwd-test
      /usr/lib/hadoop/bin/hadoop fs -ls
    3. Test browsing HDFS:

      http://$datanode.full.hostname:50075/browseDirectory.jsp?dir=/
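
    If a browser is not available, the same checks can be scripted. The sketch below probes the web UIs with curl and summarizes cluster health with hadoop dfsadmin -report; substitute real hostnames for the $... placeholders.

      # curl -sf exits non-zero on an HTTP error status.
      curl -sf http://$namenode.full.hostname:50070/ > /dev/null && echo "NameNode UI is up"
      curl -sf http://$datanode.full.hostname:50075/ > /dev/null && echo "DataNode UI is up"
      # Report configured capacity, live DataNodes, and replication status.
      su $HDFS_USER
      /usr/lib/hadoop/bin/hadoop dfsadmin -report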

  3. Start MapReduce.

    1. Execute these commands from the JobTracker server:

      su $HDFS_USER
      /usr/lib/hadoop/bin/hadoop fs -mkdir /mapred
      /usr/lib/hadoop/bin/hadoop fs -chown -R mapred /mapred
      exit    # return to the original shell before switching to $MAPRED_USER
      su $MAPRED_USER
      /usr/lib/hadoop/bin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start jobtracker
    2. Execute these commands from the JobHistory server:

      su $MAPRED_USER
      /usr/lib/hadoop/bin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start historyserver
    3. Execute these commands from all TaskTracker nodes:

      su $MAPRED_USER
      /usr/lib/hadoop/bin/hadoop-daemon.sh --config $HADOOP_CONF_DIR start tasktracker
      

    where:

    • $HDFS_USER is the user owning the HDFS services. For example, hdfs.

    • $MAPRED_USER is the user owning the MapReduce services. For example, mapred.

    • $HADOOP_CONF_DIR is the directory for storing the Hadoop configuration files. For example, /etc/hadoop/conf.
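
    As with HDFS, you can optionally confirm that the MapReduce daemons are up before smoke testing; a minimal jps check is sketched below, and process names may vary slightly between versions.

      # Run on each host as $MAPRED_USER; depending on the host's role, look for
      # JobTracker, TaskTracker, or the job history server process.
      su $MAPRED_USER
      jps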

  4. Smoke Test MapReduce.

    1. Try browsing to the JobTracker:

      http://$jobtracker.full.hostname:50030/
    2. Smoke test by using TeraGen to generate 10 GB of data and then using TeraSort to sort it (an optional TeraValidate check is sketched after these commands).

      su $HDFS_USER
      /usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop/hadoop-examples.jar teragen 100000000 /test/10gsort/input
      /usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop/hadoop-examples.jar terasort /test/10gsort/input /test/10gsort/output
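
    Optionally, finish with TeraValidate (also in the examples jar), which checks that the TeraSort output is globally sorted. The report directory /test/10gsort/validate below is an arbitrary choice for this sketch; the report lists any out-of-order keys, so an empty report means the sort is valid.

      /usr/lib/hadoop/bin/hadoop jar /usr/lib/hadoop/hadoop-examples.jar teravalidate /test/10gsort/output /test/10gsort/validate
      /usr/lib/hadoop/bin/hadoop fs -cat /test/10gsort/validate/part-*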
      

