1. Getting Ready to Upgrade

HDP Stack upgrade involves removing HDP 1.x MapReduce and replacing it with HDP 2.x YARN and MapReduce2. Before you begin, review the upgrade process and complete the backup steps.

  1. Back up the following HDP 1.x directories (a sample backup command follows this list):

    • /etc/hadoop/conf

    • /etc/hbase/conf

    • /etc/hcatalog/conf

      Note

      With HDP 2.1, /etc/hcatalog/conf is divided into /etc/hive-hcatalog/conf and /etc/hive-webhcat/conf. You cannot use /etc/hcatalog/conf in HDP 2.1.

    • /etc/hive/conf

    • /etc/pig/conf

    • /etc/sqoop/conf

    • /etc/flume/conf

    • /etc/mahout/conf

    • /etc/oozie/conf

    • /etc/hue/conf

    • /etc/zookeeper/conf

    • Optional - Back up your userlogs directories, ${mapred.local.dir}/userlogs.
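
    For example, a minimal sketch for backing up the configuration directories listed above. The target directory /tmp/hdp1-backup is arbitrary; include only the components you actually have installed:

    mkdir -p /tmp/hdp1-backup/conf
    for d in hadoop hbase hcatalog hive pig sqoop flume mahout oozie hue zookeeper; do
      cp -r /etc/$d/conf /tmp/hdp1-backup/conf/$d
    done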

  2. Run the fsck command as the HDFS Service user and fix any errors. (The resulting file contains a complete block map of the file system.) For example:

    su $HDFS_USER
    hadoop fsck / -files -blocks -locations > dfs-old-fsck-1.log  

    where $HDFS_USER is the HDFS Service user. For example, hdfs.

    Note

    This example is for an unsecured cluster. In secure mode, your cluster requires Kerberos credentials for the HDFS Service user.
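
    For example, on a Kerberos-enabled cluster you would typically obtain a ticket before running fsck; the keytab path and principal shown here are illustrative only and depend on your environment:

    su $HDFS_USER
    kinit -kt /etc/security/keytabs/hdfs.headless.keytab hdfs@EXAMPLE.COM
    hadoop fsck / -files -blocks -locations > dfs-old-fsck-1.log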

  3. Use the following instructions to compare status before and after the upgrade:

    Note

    The following commands must be executed by the user running the HDFS service (by default, the user is hdfs).

    1. Capture the complete namespace of the file system. (The following command does a recursive listing of the root file system.)

      su $HDFS_USER
      hdfs dfs -lsr / > dfs-old-lsr-1.log 

      where $HDFS_USER is the HDFS Service user. For example, hdfs.

    2. Run the report command to create a list of DataNodes in the cluster.

      su $HDFS_USER
      hdfs dfsadmin -report > dfs-old-report-1.log

      where $HDFS_USER is the HDFS Service user. For example, hdfs.

    3. Optional. You can copy all of the data stored in HDFS, or only the data that you could not otherwise recover, to a local file system or to a backup instance of HDFS.

    4. Optional. You can also repeat steps 3(a) through 3(c) and compare the results with the previous run to verify that the state of the file system has not changed; diffing the output files works well for this, as shown below.
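
      For example, if the second run writes to dfs-old-lsr-2.log and dfs-old-report-2.log (file names are illustrative), you can compare the two captures directly:

      diff dfs-old-lsr-1.log dfs-old-lsr-2.log
      diff dfs-old-report-1.log dfs-old-report-2.log

      Note that the dfsadmin report includes values such as remaining capacity that can change between runs, so focus on the list of DataNodes and block counts rather than expecting an identical file.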

  4. Check for HFiles in V1 format. HBase 0.96.0 discontinues support for HFileV1. Before the actual upgrade, install the HBase 0.96 binaries on a separate host using the hbase-site.xml configuration file from the running HBase 0.94 binaries. Then, run the following command against the HBase 0.96 binaries to check if there are HFiles in V1 format:

    hbase upgrade -check

    HFileV1 was a common format prior to HBase 0.94. You may see output similar to:

    Tables Processed:
                        
    hdfs://localhost:41020/myHBase/.META.
    hdfs://localhost:41020/myHBase/usertable
    hdfs://localhost:41020/myHBase/TestTable
    hdfs://localhost:41020/myHBase/t
                        
    Count of HFileV1: 2
    HFileV1:
    hdfs://localhost:41020/myHBase/usertable/fa02dac1f38d03577bd0f7e666f12812/family/249450144068442524
    hdfs://localhost:41020/myHBase/usertable/ecdd3eaee2d2fcf8184ac025555bb2af/family/249450144068442512
                        
    Count of corrupted files: 1
    Corrupted Files:
    hdfs://localhost:41020/myHBase/usertable/fa02dac1f38d03577bd0f7e666f12812/family/1
    Count of Regions with HFileV1: 2
    Regions to Major Compact:
    hdfs://localhost:41020/myHBase/usertable/fa02dac1f38d03577bd0f7e666f12812
    hdfs://localhost:41020/myHBase/usertable/ecdd3eaee2d2fcf8184ac025555bb2af

    When you run the upgrade check, if “Count of HFileV1” returns any files, start the HBase shell and run a major compaction on the regions that contain HFileV1 files. In the sample output above, you must compact the fa02dac1f38d03577bd0f7e666f12812 and ecdd3eaee2d2fcf8184ac025555bb2af regions, as shown in the example below.
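
    A minimal sketch based on the sample output above; major_compact accepts either a table name or a full region name, so compacting the usertable table rewrites the affected regions:

    su -l $HBASE_USER -c "hbase shell"
    hbase> major_compact 'usertable'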

  5. Optional. If you are upgrading HBase on a secure cluster, flush the ACL table by running the following HBase shell command as $HBASE_USER (the HBase Service user; for example, hbase).

    flush '_acl_'
  6. Stop all HDP 1.3 services (including MapReduce) except HDFS:

    1. Stop Nagios. On the Nagios host machine, execute the following command:

      service nagios stop 

    2. Stop Ganglia.

      1. Execute this command on the Ganglia server host machine:

        /etc/init.d/hdp-gmetad stop
      2. Execute this command on all the nodes in your Hadoop cluster:

        /etc/init.d/hdp-gmond stop
    3. Stop Oozie. On the Oozie server host machine, execute the following command:

       sudo su -l $OOZIE_USER -c "cd $OOZIE_LOG_DIR; /usr/lib/oozie/bin/oozie-stop.sh" 

      where:

      • $OOZIE_USER is the Oozie Service user. For example, oozie

      • $OOZIE_LOG_DIR is the directory where Oozie log files are stored (for example: /var/log/oozie).

    4. Stop WebHCat. On the WebHCat host machine, execute the following command:

      su -l $WEBHCAT_USER -c "/usr/lib/hcatalog/sbin/webhcat_server.sh  stop"

      where:

      • $WEBHCAT_USER is the WebHCat Service user. For example, hcat.

    5. Stop Hive. On the Hive Metastore host machine and Hive Server2 host machine, execute the following command:

      ps aux | awk '{print $1,$2}' | grep hive | awk '{print $2}' | xargs kill >/dev/null 2>&1  

      This will stop Hive Metastore and HCatalog services.

    6. Stop ZooKeeper. On the ZooKeeper host machine, execute the following command:

      su - $ZOOKEEPER_USER -c "export ZOOCFGDIR=/etc/zookeeper/conf ; export ZOOCFG=zoo.cfg ; source /etc/zookeeper/conf/zookeeper-env.sh ; /usr/lib/zookeeper/bin/zkServer.sh stop"

      where $ZOOKEEPER_USER is the ZooKeeper Service user. For example, zookeeper.

    7. Stop HBase.

      1. Execute these commands on all RegionServers:

        su -l $HBASE_USER -c "/usr/lib/hbase/bin/hbase-daemon.sh --config /etc/hbase/conf stop regionserver"
      2. Execute these commands on the HBase Master host machine:

        su -l $HBASE_USER -c "/usr/lib/hbase/bin/hbase-daemon.sh --config /etc/hbase/conf stop master"

      where $HBASE_USER is the HBase Service user. For example, hbase.

    8. Stop MapReduce.

      1. Execute these commands on all TaskTracker nodes:

        su -l $MAPRED_USER -c "/usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf stop tasktracker"
      2. Execute these commands on the HistoryServer host machine:

        su -l $MAPRED_USER -c "/usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf stop historyserver"
      3. Execute these commands on the JobTracker host machine:

        su -l $MAPRED_USER -c "/usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf stop jobtracker"

      where $MAPRED_USER is the MapReduce Service user. For example, mapred.

  7. As the HDFS user, save the namespace by executing the following command:

    su $HDFS_USER
    hdfs dfsadmin -safemode enter
    hdfs dfsadmin -saveNamespace
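
    Optionally, you can confirm that the NameNode has entered safe mode before saving the namespace:

    hdfs dfsadmin -safemode get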

  8. Back up your NameNode metadata.

    1. Copy the following checkpoint files into a backup directory:

      • dfs.name.dir/edits

      • dfs.name.dir/image/fsimage

      • dfs.name.dir/current/fsimage

    2. Store the layoutVersion of the NameNode, found in the following file:

      ${dfs.name.dir}/current/VERSION
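
      For example, a minimal sketch, assuming dfs.name.dir is /hadoop/hdfs/namenode and using an arbitrary backup location (substitute your configured paths):

      mkdir -p /backup/namenode-metadata
      cp -r /hadoop/hdfs/namenode/current /backup/namenode-metadata/
      grep layoutVersion /hadoop/hdfs/namenode/current/VERSION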

  9. If you have a prior HDFS upgrade in progress, finalize it if you have not already done so.

    su $HDFS_USER
    hdfs dfsadmin -finalizeUpgrade
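
    On HDP 1.x, you can first check whether an upgrade is still in progress; output such as "There are no upgrades in progress." indicates there is nothing to finalize:

    su $HDFS_USER
    hadoop dfsadmin -upgradeProgress status
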
  10. Optional - Back up the Hive Metastore database.

    Note

    These instructions are provided for your convenience. Please check your database documentation for the latest backup instructions.

     

    Table 24.1. Hive Metastore Database Backup and Restore

    MySQL
      Backup:  mysqldump $dbname > $outputfilename.sql
               For example: mysqldump hive > /tmp/mydir/backup_hive.sql
      Restore: mysql $dbname < $inputfilename.sql
               For example: mysql hive < /tmp/mydir/backup_hive.sql

    Postgres
      Backup:  sudo -u $username pg_dump $databasename > $outputfilename.sql
               For example: sudo -u postgres pg_dump hive > /tmp/mydir/backup_hive.sql
      Restore: sudo -u $username psql $databasename < $inputfilename.sql
               For example: sudo -u postgres psql hive < /tmp/mydir/backup_hive.sql

    Oracle
      Backup:  Connect to the Oracle database using sqlplus, then export the database:
               exp username/password@database full=yes file=output_file.dmp
      Restore: Import the database:
               imp username/password@database file=input_file.dmp

  11. Optional - Back up the Oozie Metastore database.

    Note

    These instructions are provided for your convenience. Please check your database documentation for the latest backup instructions.

     

    Table 24.2. Oozie Metastore Database Backup and Restore

    MySQL
      Backup:  mysqldump $dbname > $outputfilename.sql
               For example: mysqldump oozie > /tmp/mydir/backup_oozie.sql
      Restore: mysql $dbname < $inputfilename.sql
               For example: mysql oozie < /tmp/mydir/backup_oozie.sql

    Postgres
      Backup:  sudo -u $username pg_dump $databasename > $outputfilename.sql
               For example: sudo -u postgres pg_dump oozie > /tmp/mydir/backup_oozie.sql
      Restore: sudo -u $username psql $databasename < $inputfilename.sql
               For example: sudo -u postgres psql oozie < /tmp/mydir/backup_oozie.sql

  12. Stop HDFS.

    1. Execute these commands on all DataNodes:

      su -l $HDFS_USER -c "/usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf stop datanode"

      If you are running a secure cluster, stop the DataNode as root:

      su -c "/usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf stop datanode"
    2. Execute these commands on the Secondary NameNode host machine:

      su -l $HDFS_USER -c "/usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf stop secondarynamenode"
    3. Execute these commands on the NameNode host machine:

      su -l $HDFS_USER -c "/usr/lib/hadoop/bin/hadoop-daemon.sh --config /etc/hadoop/conf stop namenode"

      where $HDFS_USER is the HDFS Service user. For example, hdfs.

  13. Verify that the edit logs in ${dfs.name.dir}/name/current/edits* are empty. These log files should contain only 4 bytes of data, which hold the edit log version. If the edit logs are not empty, start the existing-version NameNode and then shut it down after a new fsimage has been written to disk, so that the edit log becomes empty.
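
    For example, you can check the edit log file sizes with the following command, replacing ${dfs.name.dir} with the value configured in hdfs-site.xml:

    ls -l ${dfs.name.dir}/name/current/edits*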

