2. Back Up Critical HDFS Metadata

Back up the following critical data before attempting an upgrade.

  1. On the node that hosts the NameNode, open the Hadoop Command Line shortcut (or open a command window in the Hadoop directory). As the hadoop user, change to the HDFS data directory:

    runas /user:hadoop "cmd /K cd %HDFS_DATA_DIR%"
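
    You can confirm that the HDFS_DATA_DIR environment variable is set in the new command window before continuing:

    REM Prints the HDFS data directory; if the literal text %HDFS_DATA_DIR%
    REM is printed instead, the variable is not set in this session.
    echo %HDFS_DATA_DIR%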

  2. Run the fsck command to check the file system for errors, and fix any reported errors before proceeding with the upgrade:

    hdfs fsck / -files -blocks -locations > dfs-old-fsck-1.log 

    The fsck output is written to the dfs-old-fsck-1.log file.
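
    To scan the log for problems quickly, you can filter for the fsck summary lines; the exact strings below are typical of Hadoop 2.x fsck output and may vary by version:

    REM Show the overall status and the corrupt/missing counters from the summary.
    findstr /C:"Status:" /C:"Corrupt blocks:" /C:"Missing replicas:" dfs-old-fsck-1.log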

  3. Capture the complete namespace directory tree of the file system:

    hdfs dfs -ls -R / > dfs-old-lsr-1.log
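
    As a sanity check, you can record the number of entries in the listing and compare it against a fresh listing after the upgrade:

    REM Count the lines in the namespace listing (one per file or directory).
    type dfs-old-lsr-1.log | find /c /v ""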

  4. Create a list of DataNodes in the cluster:

    hdfs dfsadmin -report > dfs-old-report-1.log 
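
    To record the number of DataNodes for a post-upgrade comparison, you can count the per-node entries; this assumes each DataNode entry in the report begins with a "Name:" line, as in Hadoop 2.x:

    REM Count the DataNode entries in the report.
    findstr /B /C:"Name:" dfs-old-report-1.log | find /c /v ""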

  5. Capture output from the fsck command:

    hdfs fsck / -blocks -locations -files > fsck-old-report-1.log 

    Verify that the fsck output reports no missing or corrupt files or replicas.
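
    One way to script this check in the same command window (assuming the fsck summary contains a "Status: HEALTHY" line, as in Hadoop 2.x):

    REM Succeeds only if fsck reported a healthy file system.
    findstr /C:"Status: HEALTHY" fsck-old-report-1.log >nul && (echo File system is healthy) || (echo WARNING: review fsck-old-report-1.log before upgrading)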

  6. Save the HDFS namespace:

    1. Place the NameNode in safe mode to keep HDFS from accepting any new writes:

      hdfs dfsadmin -safemode enter 
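
      You can verify that safe mode is active before continuing:

      REM Expected output: "Safe mode is ON"
      hdfs dfsadmin -safemode get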

    2. Save the namespace:

      hdfs dfsadmin -saveNamespace 

      Warning

      From this point on, HDFS should not accept any new writes. Stay in safe mode!

    3. Finalize the namespace:

      hdfs namenode -finalize

    4. On the machine that hosts the NameNode, copy the following checkpoint directories into a backup directory:

      %HDFS_DATA_DIR%\hdfs\nn\edits\current
      %HDFS_DATA_DIR%\hdfs\nn\edits\image
      %HDFS_DATA_DIR%\hdfs\nn\edits\previous.checkpoint
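
      For example, the copies can be scripted with robocopy; the D:\hdfs-nn-backup destination below is only a placeholder for your actual backup location:

      REM Back up the NameNode checkpoint directories.
      REM D:\hdfs-nn-backup is a placeholder backup location.
      set BACKUP_DIR=D:\hdfs-nn-backup
      robocopy "%HDFS_DATA_DIR%\hdfs\nn\edits\current" "%BACKUP_DIR%\edits\current" /E
      robocopy "%HDFS_DATA_DIR%\hdfs\nn\edits\image" "%BACKUP_DIR%\edits\image" /E
      robocopy "%HDFS_DATA_DIR%\hdfs\nn\edits\previous.checkpoint" "%BACKUP_DIR%\edits\previous.checkpoint" /E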