Apache Ambari Major Upgrade

Upgrade Troubleshooting

In the event of a problem, contacting Hortonworks Support is highly recommended. Alternatively, you can perform these troubleshooting procedures.

Restoring the Hive Metastore

  1. On the node where the Hive Metastore database resides, create the databases you want to restore. For example:

    $ mysql -u hiveuser -p -e "create database <hive_db_schema_name>;"

  2. Restore each Metastore database from the dump you created. For example:

    $ mysql -u hiveuser -p <hive_db_schema_name> < </path/to/dump_file>

  3. Reconfigure Hive Metastore if necessary. Reconfiguration might be necessary if the upgrade fails. Contacting Hortonworks Support for help with reconfiguration is recommended. Alternatively, in HDP 3.x, you can configure Hive Metastore by setting key=value properties on the command line.
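Steps 1 and 2 can be wrapped in a small per-database helper. This is a minimal sketch, not a supported tool: it assumes each dump is stored as <database>.sql in a directory given by DUMP_DIR, and the names restore_metastore_db, DUMP_DIR, DB_USER, and DRY_RUN are hypothetical. With DRY_RUN=1 it only prints the commands so you can review them first.

```shell
#!/usr/bin/env bash
# Minimal sketch: recreate one Hive Metastore database and load its dump.
# Assumptions (hypothetical): dumps are named <database>.sql in DUMP_DIR,
# and the MySQL account is DB_USER (default "hiveuser", as in the steps above).
set -euo pipefail

DB_USER="${DB_USER:-hiveuser}"

restore_metastore_db() {
  local db="$1"
  local dump="${DUMP_DIR:?set DUMP_DIR to the directory holding the dumps}/$db.sql"

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Print the commands instead of running them.
    echo "mysql -u $DB_USER -p -e \"create database $db;\""
    echo "mysql -u $DB_USER -p $db < $dump"
    return 0
  fi

  # Step 1: create the empty database (mysql prompts for the password).
  mysql -u "$DB_USER" -p -e "create database $db;"
  # Step 2: load the dump into the new database.
  mysql -u "$DB_USER" -p "$db" < "$dump"
}
```

For example, DRY_RUN=1 DUMP_DIR=/backups restore_metastore_db hive prints the two commands for a database named hive. Without DRY_RUN, mysql prompts for the password once per command because of -p.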

Recovering missing Hive tables

  1. Log in as the HDFS superuser. For example:

    $ sudo su - hdfs

  2. Read the HDFS snapshots that back up your table data. For example:

    $ hdfs dfs -cat /apps/hive/warehouse/.snapshot/s20181204-164645.898/students/000000_0

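    Example output for a simple table with two rows and three columns might look like the following. If the table uses Hive's default field delimiter, Ctrl-A (\001), the columns appear run together because that character is not printed: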

    fred flintstone351.28

    barney rubble322.32

  3. If the table schema still exists in the Hive warehouse, insert the data into the table in Hive. Otherwise, restore the Hive Metastore, which contains the schemas, from the database dump you created before the upgrade.
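To check which value belongs to which column before reinserting the data, you can translate the delimiter when reading a snapshot file. This minimal sketch assumes the table is stored as text with Hive's default Ctrl-A (\001) field delimiter; the function name hive_fields_to_tabs is hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch: make the columns of a Hive text data file on stdin readable.
# Assumes Hive's default field delimiter, Ctrl-A (\001), which terminals do not display.
set -euo pipefail

# Replace each \001 field delimiter with a tab; all other bytes pass through unchanged.
hive_fields_to_tabs() {
  tr '\001' '\t'
}
```

For example: hdfs dfs -cat /apps/hive/warehouse/.snapshot/s20181204-164645.898/students/000000_0 | hive_fields_to_tabs. Piping the same output to cat -v instead shows each delimiter as ^A.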

YARN Registry DNS instance fails to start

The YARN Registry DNS instance fails to start if another process on the host is bound to port 53. Ensure that no other service is bound to port 53 on the host where the YARN Registry DNS instance is deployed.
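To find what is holding port 53, you can filter socket-statistics output on that host. This is a minimal sketch assuming the iproute2 ss tool and its column layout when both TCP and UDP sockets are listed (field 5 is the local address:port); port53_listeners is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Minimal sketch: report sockets bound to port 53 on this host.
# Assumes input from `ss -tuln` (or `ss -tulnp`), where field 5 is the local
# "address:port"; the header line's field 5 is "Local", so it never matches.
set -euo pipefail

port53_listeners() {
  # Match a local address ending in ":53" exactly (":5353" does not match).
  awk '$5 ~ /:53$/'
}
```

For example, sudo ss -tulnp | port53_listeners lists any TCP or UDP socket bound to port 53 together with its owning process (root is needed to see processes owned by other users). Empty output suggests that nothing on the host holds the port.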

Class Loading Issue When Starting Solr

If you do not perform the upgrade steps in order, the Infra Solr instance may fail to start with the following exception:

null:org.apache.solr.common.SolrException: Error loading class 'org.apache.solr.security.InfraRuleBasedAuthorizationPlugin'

If you see this exception, follow the steps in this HCC article to work around the issue:

https://community.hortonworks.com/content/supportkb/210579/error-nullorgapachesolrcommonsolrexception-error-l.html
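Before applying the workaround, you can confirm that an Infra Solr startup failure is this class-loading error by searching the instance's log. This is a minimal sketch: the function name is hypothetical, and you supply the log path for your installation (for example, a file under /var/log/ambari-infra-solr/, which is an assumption about the default layout).

```shell
#!/usr/bin/env bash
# Minimal sketch: detect the Infra Solr class-loading error in a log file.
# The log path is passed as $1; its location is installation-specific.
set -euo pipefail

has_infra_solr_class_error() {
  # Both strings come from the exception shown above. Checking them separately
  # still works if the message is wrapped across lines, at the cost of being loose.
  grep -qF 'Error loading class' "$1" &&
    grep -qF 'InfraRuleBasedAuthorizationPlugin' "$1"
}
```

The function returns success only when both strings appear, so it can be used in a condition such as if has_infra_solr_class_error "$LOG"; then ... fi.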

Ambari Metrics System (AMS) does not start

If the Ambari Metrics System (AMS) does not start after the upgrade, the HBase Master log might contain the following snippet:

master.HMaster: hbase:meta,,1.1588230740 is NOT online; state={1588230740 state=OPEN,
ts=1543610616273, server=regionserver1.domain.com,41213,1543389145213}; ServerCrashProcedures=true.
Master startup cannot progress, in holding-pattern until region onlined

The workaround is to manually clean up the znode from ZooKeeper.

  • If AMS runs in embedded mode, remove the znode data from the local file system path. For example:

     rm -f /var/lib/ambari-metrics-collector/hbase-tmp/zookeeper/zookeeper_0/version-2/*
    
  • If AMS runs in distributed mode, connect to the cluster ZooKeeper instance and delete the following znode before restarting AMS:

     /usr/hdp/current/zookeeper-client/bin/zkCli.sh -server localhost:2181
    [zk: localhost:2181(CONNECTED) 0] rmr /ams-hbase-unsecure/meta-region-server
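The choice between the two cleanups depends only on the AMS operation mode, so it can be captured in a helper that prints the matching command for review. This is a minimal sketch: the function name and the ZK_SERVER and ZNODE_PARENT variables are hypothetical, and it assumes the mode is the value of timeline.metrics.service.operation.mode in ams-site and the znode parent is zookeeper.znode.parent in ams-hbase-site.

```shell
#!/usr/bin/env bash
# Minimal sketch: print (not run) the AMS ZooKeeper cleanup command for a mode.
# $1 is the AMS operation mode, "embedded" or "distributed"
# (assumed to match timeline.metrics.service.operation.mode in ams-site).
set -euo pipefail

ZK_SERVER="${ZK_SERVER:-localhost:2181}"             # cluster ZooKeeper (distributed mode)
ZNODE_PARENT="${ZNODE_PARENT:-/ams-hbase-unsecure}"  # zookeeper.znode.parent in ams-hbase-site

ams_meta_cleanup_cmd() {
  case "$1" in
    embedded)
      # Local data directory of the embedded ZooKeeper, as given above.
      echo 'rm -f /var/lib/ambari-metrics-collector/hbase-tmp/zookeeper/zookeeper_0/version-2/*'
      ;;
    distributed)
      # Passing the command after -server runs it without an interactive session.
      echo "/usr/hdp/current/zookeeper-client/bin/zkCli.sh -server $ZK_SERVER rmr $ZNODE_PARENT/meta-region-server"
      ;;
    *)
      echo "unknown AMS mode: $1" >&2
      return 1
      ;;
  esac
}
```

For example, ams_meta_cleanup_cmd distributed prints the zkCli.sh command shown above. On a Kerberos-enabled cluster the znode parent typically differs (for example /ams-hbase-secure), so set ZNODE_PARENT to the value in ams-hbase-site, then run the printed command before restarting AMS.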