Apache Ambari Major Upgrade

Upgrade Troubleshooting

In the event of a problem, contacting Hortonworks Support is highly recommended. Alternatively, you can perform these troubleshooting procedures.

Restoring the Hive Metastore

  1. On the node where the Hive Metastore database resides, create the databases you want to restore. For example:

    $ mysql -u hiveuser -p -e "create database <hive_db_schema_name>;"

  2. Restore each Metastore database from the dump you created. For example:

    $ mysql -u hiveuser -p <hive_db_schema_name> < </path/to/dump_file>

  3. Reconfigure Hive Metastore if necessary, for example if the upgrade failed. Contacting Hortonworks Support for help with reconfiguration is recommended. Alternatively, in HDP 3.x, you can configure Hive Metastore by passing key=value settings on the command line.
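When several Metastore schemas have to be restored, steps 1 and 2 can be wrapped in a small helper. The following is a minimal sketch, not a supported tool: the function name `restore_metastore_db`, the `DRY_RUN` switch, and the sample schema name and dump path are all hypothetical. By default it only prints the commands from the steps above so that they can be reviewed first.

```shell
# Hypothetical helper: restore one Metastore schema from its dump file.
# With DRY_RUN=1 (the default) it only prints the two commands it would run.
restore_metastore_db() {
  db="$1"
  dump="$2"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf 'mysql -u hiveuser -p -e "create database %s;"\n' "$db"
    printf 'mysql -u hiveuser -p %s < %s\n' "$db" "$dump"
  else
    # Each mysql call prompts for the hiveuser password.
    mysql -u hiveuser -p -e "create database $db;" &&
      mysql -u hiveuser -p "$db" < "$dump"
  fi
}

# Example with assumed names; substitute your own schema name and dump path.
restore_metastore_db hive /tmp/hive_dump.sql
```

Setting `DRY_RUN=0` before the call would run the commands instead of printing them.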

Solving Problems Using Spark and Hive

Use the Hive Warehouse Connector (HWC) and low-latency analytical processing (LLAP) to access Spark data after upgrading. HWC is a Spark library/plugin that is launched with the Spark app. HWC and LLAP are required for certain tasks, as shown in the following table:

Table 4.2. Spark Compatibility

Tasks                                | HWC Required | LLAP Required       | Other Requirement/Comments
-------------------------------------|--------------|---------------------|---------------------------
Read Hive managed tables from Spark  | Yes          | Yes                 | Ranger ACLs enforced.
Write Hive managed tables from Spark | Yes          | No                  | Ranger ACLs enforced.
Read Hive external tables from Spark | No           | Only if HWC is used | Table must be defined in Spark catalog. Ranger ACLs not enforced.
Write Hive external tables from Spark| No           | No                  | Ranger ACLs enforced.
Read Spark tables                    | Yes          | Yes                 | Ranger ACLs enforced.
Write Spark tables                   | Yes          | No                  | Ranger ACLs enforced.


spark-submit and pyspark are supported. The Spark Thrift Server is not supported.

Accessing Hive tables using SparkSQL

Tables that were converted to ACID tables during the upgrade cannot be read directly with SparkSQL. To access such a table using SparkSQL, create a new external table using Hive 3 and migrate the data from the managed table to the new table.

  1. Rename the managed table to *_old.

  2. Migrate the data from *_old to a <new> external table that uses the original name, in either the historical location or the default location (/warehouse/tablespace/external/hive/<?>.db/<tablename>).

    CREATE EXTERNAL TABLE new_t AS SELECT * FROM old_t;
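The two steps above can be generated per table. The following is a minimal sketch that only prints HiveQL and executes nothing; the function name `acid_to_external_sql` and the sample table name are hypothetical. It relies on the default external location and does not set a historical one.

```shell
# Print the HiveQL for the rename-and-migrate steps for one table.
# Nothing is executed; review the output before running it in Hive.
acid_to_external_sql() {
  t="$1"
  cat <<EOF
ALTER TABLE ${t} RENAME TO ${t}_old;
CREATE EXTERNAL TABLE ${t} AS SELECT * FROM ${t}_old;
EOF
}

# Example with an assumed table name.
acid_to_external_sql students
```

The output can be saved to a file and run with whichever Hive client you normally use.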

Recovering missing Hive tables

  1. Log in as the HDFS superuser. For example:

    $ sudo su - hdfs

  2. Read the snapshots on HDFS that back up your table data. For example:

    $ hdfs dfs -cat /apps/hive/warehouse/.snapshot/s20181204-164645.898/students/000000_0

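    Example output for a trivial table having two rows and three columns might look something like this (if the table uses Hive's default, non-printing field delimiter, the column values appear run together):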

    fred flintstone351.28

    barney rubble322.32

  3. In Hive, insert the data into the table if the schema exists in the Hive warehouse; otherwise, restore the Hive Metastore, which includes the schemas, from the database dump you created in the pre-upgrade process.
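When inspecting snapshot files as in step 2, it can help to make the column boundaries visible. The following is a minimal sketch that assumes the table is stored as text with Hive's default field delimiter (Ctrl-A, octal 001); the function name `ctrl_a_to_tab` is hypothetical, and the sample record is an assumed split of the example row into three columns.

```shell
# Replace Hive's default field delimiter (Ctrl-A, octal 001) with a tab.
ctrl_a_to_tab() {
  tr '\001' '\t'
}

# Sample record built with printf; real input would come from `hdfs dfs -cat`.
printf 'fred flintstone\00135\0011.28\n' | ctrl_a_to_tab
```

For a real snapshot file, the output of the `hdfs dfs -cat` command from step 2 would be piped into `ctrl_a_to_tab` instead of the sample record.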

YARN Registry DNS instance fails to start

The YARN Registry DNS instance fails to start if another process on the host is already bound to port 53. Ensure that no other service is bound to port 53 on the host where the YARN Registry DNS instance is deployed.
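One way to check for a conflicting listener is to filter the socket list for port 53. The following is a minimal sketch that assumes the `ss` utility from iproute2 is available on the host; the function name `port53_listeners` is hypothetical, and the sample rows stand in for real `ss` output.

```shell
# Keep only rows whose local address (field 5) ends in port 53.
# Expects rows in the layout printed by `ss -H -lntu` on stdin.
port53_listeners() {
  awk '$5 ~ /:53$/'
}

# Sample rows in place of real output; only the first one is on port 53.
printf '%s\n' \
  'udp UNCONN 0 0 0.0.0.0:53 0.0.0.0:*' \
  'tcp LISTEN 0 128 0.0.0.0:22 0.0.0.0:*' | port53_listeners
```

On a live host, `ss -H -lntup | port53_listeners` run as root would also show the owning process for each match.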

Class Loading Issue When Starting Solr

If you do not follow the upgrade steps in sequence, the Infra Solr instance may fail to start with the following exception:

null:org.apache.solr.common.SolrException: Error loading class
'org.apache.solr.security.InfraRuleBasedAuthorizationPlugin'

If you see this exception, follow the steps in this HCC article to work around the issue:

https://community.hortonworks.com/content/supportkb/210579/error-nullorgapachesolrcommonsolrexception-error-l.html

Ambari Metrics System (AMS) does not start

If the Ambari Metrics System (AMS) does not start after the upgrade, the HBase Master log may contain messages like the following:

master.HMaster: hbase:meta,,1.1588230740 is NOT online; state={1588230740 state=OPEN,
ts=1543610616273, server=regionserver1.domain.com,41213,1543389145213}; ServerCrashProcedures=true.
Master startup cannot progress, in holding-pattern until region onlined

The workaround is to manually clean up the znode from ZooKeeper.

  • If AMS mode = embedded, remove the znode data from the local filesystem path. For example:

     rm -f /var/lib/ambari-metrics-collector/hbase-tmp/zookeeper/zookeeper_0/version-2/*
    
  • If AMS mode = distributed, connect to the cluster ZooKeeper instance and delete the following node before restarting AMS:

     /usr/hdp/current/zookeeper-client/bin/zkCli.sh -server localhost:2181
    [zk: localhost:2181(CONNECTED) 0] rmr /ams-hbase-unsecure/meta-region-server
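
Before applying either cleanup, it can be worth confirming that the HBase Master log actually shows the symptom described above. The following is a minimal sketch; the function name `meta_region_stuck` is hypothetical, and the sample line is a shortened copy of the log snippet rather than real log input.

```shell
# Exit 0 when stdin contains the "hbase:meta ... is NOT online" message.
meta_region_stuck() {
  grep -q 'hbase:meta.*is NOT online'
}

# Sample line shortened from the log snippet above.
if printf '%s\n' 'master.HMaster: hbase:meta,,1.1588230740 is NOT online; state={1588230740 state=OPEN}' | meta_region_stuck; then
  echo "symptom found"
fi
```

In practice the function would read the AMS HBase Master log file, whose location depends on your installation.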