
Apache Hive 3 upgrade process

This supplemental information about preparing for an upgrade, performing the upgrade, and using Hive tables after upgrading to Hive 3 helps you complete a successful major upgrade of HDP and Apache Ambari.

Some transactional tables require a major compaction before you upgrade to Hive 3.0. The Hive pre-upgrade tool identifies the tables that need such a compaction and generates scripts that you run to perform it. Depending on the number of tables and partitions and the amount of data involved, compactions can take a significant amount of time and resources. The script output of the pre-upgrade tool includes heuristics that can help you estimate the time required. If the tool produces no script, no compaction is needed.
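To make the idea of a compaction script concrete, the following is a minimal sketch of what such a script contains: one HiveQL `ALTER TABLE ... COMPACT 'major'` statement per affected table or partition. The helper name `compaction_statements` and its input format are hypothetical; the script the pre-upgrade tool actually writes may be formatted differently, so run the tool's own output rather than this sketch.

```python
# Hypothetical sketch: build one major-compaction statement per table or
# partition that still contains update/delete events. The function name and
# input tuples are illustrative, not part of the pre-upgrade tool.
from typing import Iterable, List, Optional, Tuple

# (database, table, partition spec in HiveQL form or None for unpartitioned)
Target = Tuple[str, str, Optional[str]]


def compaction_statements(targets: Iterable[Target]) -> List[str]:
    statements = []
    for db, table, partition in targets:
        stmt = f"ALTER TABLE {db}.{table}"
        if partition:
            # Partition spec such as "ds='2018-01-01'"
            stmt += f" PARTITION ({partition})"
        stmt += " COMPACT 'major';"
        statements.append(stmt)
    return statements


if __name__ == "__main__":
    for line in compaction_statements([
        ("sales", "orders", "ds='2018-01-01'"),  # placeholder partitioned table
        ("sales", "customers", None),            # placeholder unpartitioned table
    ]):
        print(line)
```

Partitioned tables are compacted partition by partition, which is one reason the number of partitions drives how long compaction takes.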

Compaction cannot occur if the pre-upgrade tool cannot connect to the Hive Metastore. Shutting down HiveServer2 during compaction is recommended: it prevents users from executing update, delete, or merge statements on tables while compaction runs and for the rest of the upgrade process.
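Because compaction depends on reaching the Hive Metastore, a quick reachability check before running the tool can save a failed attempt. This is a minimal sketch that assumes the metastore listens on its default Thrift port, 9083; substitute the host and port from your `hive.metastore.uris` setting. The function name `metastore_reachable` is hypothetical, and a successful TCP connection only shows that the port is open, not that the metastore is healthy.

```python
# Minimal TCP reachability pre-check, assuming the metastore uses the default
# Thrift port 9083. An open port does not prove the metastore service works.
import socket


def metastore_reachable(host: str, port: int = 9083, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False
```

For example, `metastore_reachable("metastore-host")` with a placeholder host name returns `False` when the host cannot be resolved or the port is closed.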

Run the pre-upgrade tool from the command line after upgrading Ambari from 2.6.2.2 to 2.7.x. You do not use Ambari to run this command.

The following properties can affect compaction:
  • hive.compactor.worker.threads

    Specifies the number of compactor worker threads, which limits how many compactions can run concurrently.

  • hive.compactor.job.queue

    Specifies the YARN queue to which compaction jobs are submitted. Each compaction runs as a MapReduce job.
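To see how hive.compactor.worker.threads bounds the total time, here is a back-of-the-envelope sketch: with W worker threads, at most W compactions run at once, so N pending compactions need at least ceil(N / W) rounds. It assumes a single metastore instance and that every compaction takes about the same time, which real workloads rarely do; the capacity of the queue set in hive.compactor.job.queue can throttle jobs further. The function names are hypothetical.

```python
# Rough lower-bound estimate of compaction time from the worker-thread limit.
# Assumes one metastore instance and uniform compaction durations.
import math


def compaction_rounds(pending: int, worker_threads: int) -> int:
    """Minimum number of sequential rounds needed to run all compactions."""
    if worker_threads < 1:
        raise ValueError("worker_threads must be at least 1 for compactions to run")
    return math.ceil(pending / worker_threads)


def estimated_minutes(pending: int, worker_threads: int,
                      minutes_per_compaction: float) -> float:
    """Rounds multiplied by an assumed average duration per compaction."""
    return compaction_rounds(pending, worker_threads) * minutes_per_compaction
```

For instance, 10 pending compactions with 4 worker threads need at least 3 rounds; at an assumed 5 minutes each, that is roughly 15 minutes.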

The pre-upgrade tool looks for files in ACID tables that contain update or delete events and generates scripts to compact those tables. You obtain and run these scripts as part of preparing Hive for the upgrade. After you have upgraded Ambari, you can upgrade HDP components, including Hive. After the upgrade, check the generated logs for errors and verify that the upgrade process correctly converted your tables.
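Checking the generated logs for errors can be partly automated. This is a minimal sketch that assumes log4j-style lines in which the log level appears as a separate whitespace-delimited token (for example `... ERROR [main] ...`); the actual log layout and file locations on your cluster may differ. The helper name `error_lines` is hypothetical.

```python
# Hypothetical helper: collect log lines whose level token is ERROR or FATAL.
# Assumes the level is a standalone token, as in common log4j layouts.
from typing import Iterable, List

LEVELS = {"ERROR", "FATAL"}


def error_lines(lines: Iterable[str]) -> List[str]:
    hits = []
    for line in lines:
        # Split on whitespace so "ERROR" inside other words does not match.
        if LEVELS.intersection(line.split()):
            hits.append(line.rstrip("\n"))
    return hits
```

Passing an open file object works because iterating over it yields lines, for example `error_lines(open(path))` with `path` pointing at one of the generated logs.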