1. Getting Ready to Upgrade

  1. Stop all services (including MapReduce) and client applications deployed on HDFS using the instructions provided here.

  2. Run the fsck command as instructed below and fix any errors. (The resulting file will contain a complete block map of the file system.)

    su $HDFS_USER
    hadoop fsck / -files -blocks -locations > dfs-old-fsck-1.log 

    where $HDFS_USER is the HDFS Service user. For example, hdfs.

  3. Use the following instructions to compare the status before and after the upgrade:

    Note

    The following commands must be executed by the user running the HDFS service (by default, the user is hdfs).

    1. Capture the complete namespace of the file system. Run a recursive listing of the root file system:

      su $HDFS_USER
      hadoop dfs -lsr / > dfs-old-lsr-1.log 

      where $HDFS_USER is the HDFS Service user. For example, hdfs.

    2. Run the report command to create a list of DataNodes in the cluster:

      su $HDFS_USER
      hadoop dfsadmin -report > dfs-old-report-1.log

      where $HDFS_USER is the HDFS Service user. For example, hdfs.

    3. Copy all data, or at least any unrecoverable data, stored in HDFS to a local file system or to a backup instance of HDFS.

    4. Optionally, repeat substeps 1 through 3 above and compare the results with the previous run to verify that the state of the file system remains unchanged.
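The comparison between two runs can be sketched in shell. This is a minimal sketch, not part of the HDP tooling: compare_capture is a hypothetical helper, and the second file name (dfs-old-lsr-2.log) is an assumed name for the output of a repeated run. Some captures may contain timestamps or similar run-specific lines, so a reported change is a prompt to inspect the files rather than proof of a problem.

```shell
# Hypothetical helper (not part of Hadoop): report whether two capture
# files are byte-for-byte identical.
compare_capture() {
  old_file="$1"
  new_file="$2"
  # cmp -s prints nothing; it only sets the exit status.
  if cmp -s "$old_file" "$new_file"; then
    echo "unchanged: $new_file"
  else
    echo "changed: $new_file"
    return 1
  fi
}

# Example call, assuming a second run wrote dfs-old-lsr-2.log (an assumed name):
# compare_capture dfs-old-lsr-1.log dfs-old-lsr-2.log
```

When a file is reported as changed, running diff on the two files shows which lines differ.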

  4. As the HDFS user, run the following commands to enter safe mode and save the namespace:

    su $HDFS_USER
    hadoop dfsadmin -safemode enter
    hadoop dfsadmin -saveNamespace

    where $HDFS_USER is the HDFS Service user. For example, hdfs.
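Because the namespace can be saved only while the NameNode is in safe mode, it can help to confirm the mode before saving. The following is a minimal sketch: safemode_is_on is a hypothetical helper, and it assumes that the output of hadoop dfsadmin -safemode get contains the phrase "Safe mode is ON" when safe mode is active.

```shell
# Hypothetical helper: reads the output of `hadoop dfsadmin -safemode get`
# on standard input and succeeds only if it reports safe mode as ON.
# Assumption: the output contains the phrase "Safe mode is ON".
safemode_is_on() {
  grep -q 'Safe mode is ON'
}

# Example use as the HDFS user (commented out so the sketch has no side effects):
# hadoop dfsadmin -safemode get | safemode_is_on && hadoop dfsadmin -saveNamespace
```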

  5. Copy the following checkpoint files into a backup directory:

    • dfs.name.dir/edits

    • dfs.name.dir/image/fsimage
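The copy can be sketched as a small shell function. This is an illustration under assumptions: backup_checkpoint is a hypothetical helper, its first argument stands for the directory configured as dfs.name.dir, and the backup location in the example call is an assumed path. It copies only the two files listed above.

```shell
# Hypothetical helper: copy the two checkpoint files into a backup directory.
#   $1 = directory configured as dfs.name.dir
#   $2 = backup directory (created if it does not exist)
backup_checkpoint() {
  name_dir="$1"
  backup_dir="$2"
  mkdir -p "$backup_dir/image" || return 1
  # cp -p preserves timestamps and permissions on the copies.
  cp -p "$name_dir/edits" "$backup_dir/edits" || return 1
  cp -p "$name_dir/image/fsimage" "$backup_dir/image/fsimage" || return 1
  echo "checkpoint files copied to $backup_dir"
}

# Example call; both paths are assumptions, substitute your own values:
# backup_checkpoint /hadoop/hdfs/namenode /backup/namenode-checkpoint
```

The function stops at the first failed copy, so a non-zero exit status means the backup is incomplete.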

  6. Stop the HDFS service using the instructions provided here. Verify that all the HDP services in the cluster are stopped.

  7. If you are upgrading Hive, back up the Hive database.

  8. Configure the local repositories.

    The standard HDP install fetches the software from a remote yum repository over the Internet. To use this option, you must set up access to the remote repository and have an available Internet connection for each of your hosts.

    Note

    If your cluster does not have access to the Internet, or you are creating a large cluster and you want to conserve bandwidth, you can instead provide a local copy of the HDP repository that your hosts can access. For more information, see Deployment Strategies for Data Centers with Firewalls, a separate document in this set.

    The file you download is named hdp.repo. To function properly in the system, it must be named HDP.repo. After you move the new repo file into the repos.d folder, make sure that no file named hdp.repo remains anywhere in that folder.

    1. Upgrade the HDP repository on all hosts and replace the old repo file with the new file.

      From a terminal window, type:

      • For RHEL and CentOS 5

        wget http://docs.hortonworks.com/HDP/centos5/1.x/GA/hdp.repo -O /etc/yum.repos.d/hdp.repo

      • For RHEL and CentOS 6

        wget http://docs.hortonworks.com/HDP/centos6/1.x/GA/hdp.repo -O /etc/yum.repos.d/hdp.repo

      • For SLES 11

        wget http://docs.hortonworks.com/HDP/suse11/1.x/GA/hdp.repo -O /etc/zypp/repos.d/hdp.repo

    2. Confirm that the HDP repository is configured by checking the repo list.

      • For RHEL/CentOS:

        yum repolist

      • For SLES:

        zypper repos
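The repo list check can also be scripted. The following is a minimal sketch: has_hdp_repo is a hypothetical helper, and it assumes that the HDP repository entry contains the string "hdp" in some letter case, which may not hold if the repository was given a different ID or name.

```shell
# Hypothetical helper: reads a repository listing on standard input and
# succeeds if any line mentions "hdp" (case-insensitive match).
has_hdp_repo() {
  grep -qi 'hdp'
}

# Example use (commented out; run the line that matches your platform):
# yum repolist | has_hdp_repo && echo "HDP repository found"
# zypper repos | has_hdp_repo && echo "HDP repository found"
```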

