Apache Ambari Upgrade
Prepare Hive for upgrade

Before upgrading the cluster to HDP 3.0.0, you must prepare Hive for the upgrade. The Hive pre-upgrade tool is designed to help you upgrade Hive 2 in HDP 2.6.5 and later to Hive 3 in HDP 3.0.0. Upgrading Hive in releases earlier than HDP 2.6.5 is not supported.

The on-disk layout for transactional tables changed in Hive 3.0, so you must run a major compaction before the upgrade to ensure that Hive 3.0 can read the tables. The pre-upgrade tool identifies which tables or partitions need to be compacted and generates a script that you can execute in Beeline to run those compactions. Because compaction can be a time-consuming and resource-intensive process, examine this script to gauge how long the actual upgrade might take. After a major compaction runs in preparation for an upgrade, you must prevent update, delete, or merge statements from running against transactional tables until the upgrade is complete. If an update, delete, or merge occurs within a partition, that partition must undergo another major compaction before the upgrade.
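
One rough way to gauge the workload is to count the compaction statements in the generated output. The following is a minimal sketch, not part of the tool itself. It assumes that the generated scripts are .sql files in a single directory and that each compaction appears as an ALTER TABLE ... COMPACT 'major' statement on its own line. The function name count_major_compactions is hypothetical.

```shell
# Hypothetical helper: count lines that request a major compaction
# in every *.sql file of a directory (defaults to the current directory).
count_major_compactions() {
  local dir="${1:-.}"
  # Concatenate all scripts so grep reports a single total.
  # -i: SQL keywords are case-insensitive; -c: count matching lines.
  # "|| true" keeps the exit status at 0 when nothing matches (grep still prints 0).
  cat "$dir"/*.sql 2>/dev/null | grep -ci "compact[[:space:]]*'major'" || true
}
```

Calling count_major_compactions /tmp, for example, would print the number of matching lines found under /tmp. The count says nothing about table sizes, so treat it only as a first indication of how much compaction work is pending.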

You can use the following key options with the pre-upgrade tool command:

  • -execute

    This option automatically executes the equivalent of the generated commands. Use it only when you want to run the pre-upgrade tool command in Ambari instead of on the Beeline command line. Using Beeline is recommended.

  • -location

    Use this option to specify the location to write the scripts generated by the pre-upgrade tool.

Running the help for the pre-upgrade command, as described below, shows you all the command options.
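
As an illustration of the -location option, the sketch below assembles a pre-upgrade command line that writes the generated scripts to a directory of your choice. It only prints the command and does not run it. The helper name preupgrade_cmd and the PREUPGRADE_CP variable are hypothetical: set PREUPGRADE_CP to the classpath you use in the run steps later in this section, adjusted for your environment.

```shell
# Hypothetical helper: print (do not execute) a pre-upgrade tool command
# that writes its generated scripts to the directory given as $1.
preupgrade_cmd() {
  local out_dir="$1"
  # Fail early with a message if either variable is unset or empty.
  : "${JAVA_HOME:?set JAVA_HOME first}" "${PREUPGRADE_CP:?set PREUPGRADE_CP first}"
  printf '%s\n' "$JAVA_HOME/bin/java -Djavax.security.auth.useSubjectCredsOnly=false -cp $PREUPGRADE_CP org.apache.hadoop.hive.upgrade.acid.PreUpgradeTool -location $out_dir"
}
```

Printing the command first lets you review the classpath and output directory before you run anything against the cluster.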

Before you begin

  • Ensure that the Hive Metastore is running. Connectivity between the tool and the Hive Metastore is mandatory.

  • Optionally, shut down HiveServer2. Shutting down HiveServer2 is recommended, but not required, to prevent operations on ACID tables while the tool executes.

  • The pre-upgrade tool might submit compaction jobs, so ensure that the cluster has sufficient capacity to execute those jobs. Set the hive.compactor.worker.threads property to accommodate your data.

  • In a Kerberized cluster, run kinit before executing the pre-upgrade tool command.

    If you still see errors of the following type after running kinit, set -Djavax.security.auth.useSubjectCredsOnly=false when you run the pre-upgrade tool command:

    org.ietf.jgss.GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)
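
To confirm the current hive.compactor.worker.threads setting before you start, you could read it from the Hive configuration file on the Metastore host. The following is a minimal sketch under stated assumptions: the helper name get_hive_property is hypothetical, and the awk logic expects each property's name and value elements to sit on the same line or on consecutive lines, one property per line. It is not a general XML parser.

```shell
# Hypothetical helper: print the <value> of a named property
# from a Hadoop-style XML configuration file.
#   $1 = property name, $2 = path to the configuration file
get_hive_property() {
  local name="$1" file="$2"
  awk -v n="$name" '
    # Remember that we have seen the requested <name> element.
    index($0, "<name>" n "</name>") { found = 1 }
    # The first <value> at or after that point belongs to the property.
    found && /<value>/ {
      sub(/.*<value>/, "")
      sub(/<\/value>.*/, "")
      print
      exit
    }
  ' "$file"
}
```

For example, get_hive_property hive.compactor.worker.threads /etc/hive/conf/hive-site.xml would print the configured value, assuming your cluster keeps hive-site.xml at that path. In an Ambari-managed cluster, change the value through Ambari rather than by editing the file directly.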

To run the pre-upgrade tool:

  1. SSH into the host running the Hive Metastore. To locate this host, go to the Ambari Web UI and click Hosts. Click the Filter icon and type “Hive Metastore: All” to find each host that has a Hive Metastore instance installed on it.

  2. Change to the /tmp directory.

    cd /tmp

  3. Execute the following command to download the pre-upgrade tool JAR:

    wget http://repo.hortonworks.com/content/repositories/releases/org/apache/hive/hive-pre-upgrade/

  4. Export the JAVA_HOME environment variable if necessary.

    export JAVA_HOME=[ path to your installed JDK ]

  5. Review the help for the pre-upgrade command by using the --help option. For example:

    cd <location of downloaded pre-upgrade tool>

    $JAVA_HOME/bin/java -Djavax.security.auth.useSubjectCredsOnly=false -cp /usr/hdp/current/hive2/lib/derby-*:/usr/hdp/current/hadoop/etc/hadoop/:/tmp/hive-pre-upgrade- org.apache.hadoop.hive.upgrade.acid.PreUpgradeTool --help

    Output is:

    usage: upgrade-acid

    -execute Executes commands equivalent to generated scrips

    -help Generates a script to execute on 2.x cluster. This requires 2.x binaries on the classpath and hive-site.xml.

    -location <arg> Location to write scripts to. Default is CWD.

  6. Adjust the classpath in the following example pre-upgrade command to suit your environment, and then perform a dry run of the pre-upgrade by running the command:

    $JAVA_HOME/bin/java -Djavax.security.auth.useSubjectCredsOnly=false -cp /usr/hdp/current/hive2/lib/derby-*:/usr/hdp/current/hadoop/etc/hadoop/:/tmp/hive-pre-upgrade- org.apache.hadoop.hive.upgrade.acid.PreUpgradeTool

  7. Examine the scripts in the output to understand what running the scripts will do.

  8. Log in to Beeline as the Hive service user, and run each generated script to prepare the cluster for upgrading.

    The Hive service user is hive by default. If you don’t know which user is the Hive service user in your cluster, go to the Ambari Web UI and click Cluster Admin > Service Accounts, and then look for Hive User.
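
If the tool generated several scripts, running them one by one through Beeline can be scripted. The sketch below only prints one beeline invocation per script so that you can review the commands first. The helper name beeline_cmds is hypothetical, the JDBC URL is a placeholder for your HiveServer2 connection string, and the sketch assumes the generated scripts are .sql files in a single directory. A Kerberized cluster typically needs additional parameters in the URL.

```shell
# Hypothetical helper: print one Beeline command per generated script.
#   $1 = directory holding the generated scripts
#   $2 = JDBC URL of HiveServer2 (placeholder, adjust for your cluster)
#   $3 = Hive service user (defaults to hive)
beeline_cmds() {
  local dir="$1" url="$2" user="${3:-hive}"
  local f
  for f in "$dir"/*.sql; do
    # Skip the literal pattern when the directory contains no scripts.
    [ -e "$f" ] || continue
    # -u: JDBC URL, -n: user name, -f: script file to execute.
    printf 'beeline -u "%s" -n %s -f "%s"\n' "$url" "$user" "$f"
  done
}
```

Printing rather than executing is a deliberate choice here: it gives you a chance to check the script order and connection details before any compaction is submitted.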
