Upgrade Troubleshooting

If you encounter a problem during the upgrade, contacting Hortonworks Support is highly recommended. Alternatively, you can try the following troubleshooting procedures.

Restoring the Hive Metastore

On the node where the Hive Metastore database resides, create the databases you want to restore. For example:
$ mysql -u hiveuser -p -e "create database <hive_db_schema_name>;"
Restore each Metastore database from the dump you created. For example:
$ mysql -u hiveuser -p <hive_db_schema_name> < </path/to/dump_file>
Reconfigure Hive Metastore if necessary. Reconfiguration might be necessary if the upgrade fails. Contacting Hortonworks Support for help with reconfiguration is recommended. Alternatively, in HDP 3.x, use set key=value commands on the command line to configure Hive Metastore.
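
When several Metastore schemas need restoring, the create and restore steps above can be generated per schema. The following is a minimal sketch that only prints the commands for review. The `DUMP_DIR` variable, the `<schema>.sql` file naming, and the `restore_commands` helper are assumptions for illustration, not part of the documented procedure.

```shell
# Sketch: print (do not run) the create/restore commands for each schema.
# DUMP_DIR and the <schema>.sql naming are hypothetical; adjust to your backups.
DUMP_DIR="${DUMP_DIR:-/path/to/dumps}"

restore_commands() {
  for schema in "$@"; do
    # Step 1: recreate the empty database.
    printf 'mysql -u hiveuser -p -e "create database %s;"\n' "$schema"
    # Step 2: load the dump taken before the upgrade.
    printf 'mysql -u hiveuser -p %s < %s/%s.sql\n' "$schema" "$DUMP_DIR" "$schema"
  done
}

# Example (prints two lines per schema):
#   restore_commands hive
```

Review the printed commands before running them. Each `mysql -p` invocation prompts for the password separately.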

Solving Problems Using Spark and Hive

Use the Hive Warehouse Connector (HWC) and low-latency analytical processing (LLAP) to access Spark data after upgrading. HWC is a Spark library/plugin that is launched with the Spark app. HWC and LLAP are required for certain tasks, as shown in the following table:

Table 4.2. Spark Compatibility

Tasks	HWC Required	LLAP Required	Other Requirement/Comments
Read Hive managed tables from Spark	Yes	Yes	Ranger ACLs enforced.
Write Hive managed tables from Spark	Yes	No	Ranger ACLs enforced.
Read Hive external tables from Spark	No	Only if HWC is used	Table must be defined in Spark catalog. Ranger ACLs not enforced.
Write Hive external tables from Spark	No	No	Ranger ACLs enforced.

spark-submit and pyspark are supported. The Spark Thrift Server is not supported.

Accessing Hive tables using SparkSQL

To use SparkSQL to access tables that were converted to ACID tables during the upgrade, create a new external table using Hive 3 and migrate the data from the managed table to the new table.

Rename the managed table to *_old.
Migrate the data from *_old to a new external table that uses the original name, in either the historical location or the default location (/warehouse/tablespace/external/hive/<?>.db/<tablename>). For example:
CREATE EXTERNAL TABLE new_t AS SELECT * FROM old_t;
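
The rename and copy steps above can be scripted per table. The following sketch only emits the HiveQL so that it can be reviewed first. The `migration_sql` helper, the database and table arguments, and the JDBC URL placeholder are assumptions for illustration. Whether your Hive version accepts `CREATE EXTERNAL TABLE ... AS SELECT` should be verified before relying on it.

```shell
# Sketch: emit the HiveQL for the rename-and-copy steps for one table.
# migration_sql is a hypothetical helper; $1 is the database, $2 the table.
migration_sql() {
  db="$1"
  tbl="$2"
  # Step 1: move the managed (ACID) table out of the way.
  printf 'ALTER TABLE %s.%s RENAME TO %s.%s_old;\n' "$db" "$tbl" "$db" "$tbl"
  # Step 2: copy the data into an external table under the original name.
  printf 'CREATE EXTERNAL TABLE %s.%s AS SELECT * FROM %s.%s_old;\n' "$db" "$tbl" "$db" "$tbl"
}

# Example (not run here):
#   migration_sql default students > migrate.sql
#   beeline -u "<jdbc-url>" -f migrate.sql
```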

Recovering missing Hive tables

Log in as the HDFS superuser. For example:
$ sudo su - hdfs
Read the snapshots on HDFS that back up your table data. For example:
$ hdfs dfs -cat /apps/hive/warehouse/.snapshot/s20181204-164645.898/students/000000_0
Example output for a trivial table with two rows and three columns might look like this. Hive's default field delimiter is a non-printing character, so the column values appear to run together:
fred flintstone351.28
barney rubble322.32
In Hive, insert the data into the table if the schema exists in the Hive warehouse; otherwise, restore the Hive Metastore, which includes the schemas, from the database dump you created in the pre-upgrade process.
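
To make the column boundaries in a snapshot file visible before re-inserting the data, the delimiter can be translated to tabs. This is a minimal sketch that assumes the table uses Hive's default text format, where fields are separated by Ctrl-A (octal 001). The `show_fields` helper name is hypothetical.

```shell
# Sketch: turn the default Hive field delimiter (Ctrl-A, octal 001) into tabs.
# Assumes a plain-text table with the default delimiter; other formats
# (ORC, Parquet, custom delimiters) need different handling.
show_fields() {
  tr '\001' '\t'
}

# Example (not run here):
#   hdfs dfs -ls /apps/hive/warehouse/.snapshot
#   hdfs dfs -cat /apps/hive/warehouse/.snapshot/s20181204-164645.898/students/000000_0 | show_fields
```

If the three columns in the sample are a name, an age, and a GPA (an assumption about the sample data), the first row would then read as `fred flintstone`, `35`, and `1.28` separated by tabs.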

YARN Registry DNS instance fails to start

The YARN Registry DNS instance fails to start if another process on the host is bound to port 53. Ensure that no other service binds to port 53 on the host where the YARN Registry DNS instance is deployed.
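
One way to check for a conflicting listener is to filter the output of `ss`. The following sketch reads `ss -lntu` output on standard input and keeps only sockets whose local port is 53. The `port53_listeners` helper is hypothetical, and the column positions assume the default `ss` layout when both `-t` and `-u` are given.

```shell
# Sketch: keep only listeners on port 53 from `ss -lntu` output.
# Column 5 is the local address:port in the default layout with -t and -u.
port53_listeners() {
  awk '$5 ~ /:53$/ { print $1, $5 }'
}

# Example (not run here; add -p as root to see the owning process):
#   ss -lntu | port53_listeners
```

Any line printed identifies a socket that would prevent the YARN Registry DNS instance from binding to the port.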

Class Loading Issue When Starting Solr

If you do not follow the upgrade steps in sequence, the Infra Solr instance may fail to start with the following exception:

null:org.apache.solr.common.SolrException: Error loading class
'org.apache.solr.security.InfraRuleBasedAuthorizationPlugin'

If you see this exception, follow the steps in this HCC article to work around the issue:

https://community.hortonworks.com/content/supportkb/210579/error-nullorgapachesolrcommonsolrexception-error-l.html

Ambari Metrics System (AMS) does not start

If the Ambari Metrics System (AMS) does not start after the upgrade, look for the following snippet in the HBase Master log:

master.HMaster: hbase:meta,,1.1588230740 is NOT online; state={1588230740 state=OPEN,
ts=1543610616273, server=regionserver1.domain.com,41213,1543389145213}; ServerCrashProcedures=true.
Master startup cannot progress, in holding-pattern until region onlined
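
To confirm that a collector host shows this symptom, the log can be searched for the message. This is a minimal sketch. The `meta_stuck` helper is hypothetical, and the log path in the usage comment is an assumption that you should verify on your host.

```shell
# Sketch: succeed if the given HBase Master log reports hbase:meta as not online.
# $1 is the path to the log file.
meta_stuck() {
  grep -q 'hbase:meta.*is NOT online' "$1"
}

# Example (not run here; the log location is an assumption):
#   meta_stuck /var/log/ambari-metrics-collector/hbase-ams-master-$(hostname).log \
#     && echo "hbase:meta is reported as not online"
```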

The workaround is to manually clean up the znode from ZooKeeper.

If AMS mode = embedded, remove the znode data from the local filesystem path. For example:
```
 rm -f /var/lib/ambari-metrics-collector/hbase-tmp/zookeeper/zookeeper_0/version-2/*
```

If AMS mode = distributed, connect to the cluster ZooKeeper instance and delete the following node before restarting AMS:

 /usr/hdp/current/zookeeper-client/bin/zkCli.sh -server localhost:2181
[zk: localhost:2181(CONNECTED) 0] rmr /ams-hbase-unsecure/meta-region-server

