IOP to HDP Migration

Troubleshooting Migration Issues

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Issue: HDFS restart fails after Ambari upgrade on a NameNode HA-enabled IOP cluster.

Cause: The package hadoop-hdfs-zkfc is not supported by this version of the stack-select tool.

Error: ZooKeeperFailoverController restart fails, and the Ambari operation log shows the following errors reporting that hadoop-hdfs-zkfc is not a supported package:
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/stack_select.py", line 109, in get_package_name
    package = get_packages(PACKAGE_SCOPE_STACK_SELECT, service_name, component_name)
  File "/usr/lib/python2.6/site-packages/resource_management/libraries/functions/stack_select.py", line 234, in get_packages
    raise Fail("The package {0} is not supported by this version of the stack-select tool.".format(package))
resource_management.core.exceptions.Fail: The package hadoop-hdfs-zkfc is not supported by this version of the stack-select tool.

Resolution:

On each ZKFailoverController:

  1. Edit /usr/bin/iop-select. Insert "hadoop-hdfs-zkfc": "hadoop-hdfs", after line 33.

    Note: The trailing comma (",") is part of the inserted text.

  2. Run one of the following commands to create the symbolic link.

    If your cluster is IOP 4.2.5, run the following command:

    ln -sfn /usr/iop/4.2.5.0-0000/hadoop-hdfs /usr/iop/current/hadoop-hdfs-zkfc

    If your cluster is IOP 4.2.0, run the following command:

    ln -sfn /usr/iop/4.2.0.0/hadoop-hdfs /usr/iop/current/hadoop-hdfs-zkfc
  3. Restart ZKFailoverController via Ambari web UI.
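
Step 2 can be scripted so that the source directory is chosen from what is installed on the host. The following is a minimal sketch and not part of the documented procedure: the helper name find_iop_hdfs_dir is hypothetical, and it assumes that only the two version directories named in step 2 can be present.

```shell
# Hypothetical helper: print the hadoop-hdfs directory of the installed IOP
# version, trying the two version directories named in step 2.
find_iop_hdfs_dir() {
  iop_root="$1"
  for version in 4.2.5.0-0000 4.2.0.0; do
    if [ -d "$iop_root/$version/hadoop-hdfs" ]; then
      echo "$iop_root/$version/hadoop-hdfs"
      return 0
    fi
  done
  echo "no supported IOP version found under $iop_root" >&2
  return 1
}

# Usage on a ZKFailoverController host (run as root):
#   target=$(find_iop_hdfs_dir /usr/iop) && ln -sfn "$target" /usr/iop/current/hadoop-hdfs-zkfc
#   readlink /usr/iop/current/hadoop-hdfs-zkfc   # confirm where the link points
```

If both version directories exist, the sketch prefers 4.2.5.0-0000; check that this matches the version your cluster actually runs before creating the link.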

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Issue: UI does not come up after migration to HDP

Error in logs:

Ambari-server log: Caused by: java.lang.RuntimeException: Trying to create a
ServiceComponent not recognized in stack info, clusterName=c1, serviceName=HBASE,
componentName=HBASE_REST_SERVER, stackInfo=HDP-2.6

Cause: The HBASE_REST_SERVER component was not deleted before migration, as described in the migration instructions.

Resolution: Delete the HBASE_REST_SERVER component from the Ambari Server database by running the following SQL statements:

DELETE FROM hostcomponentstate
WHERE service_name = 'HBASE'
  AND component_name = 'HBASE_REST_SERVER';

DELETE FROM hostcomponentdesiredstate
WHERE service_name = 'HBASE'
  AND component_name = 'HBASE_REST_SERVER';

DELETE FROM servicecomponent_version
WHERE component_id IN (SELECT
    id
  FROM servicecomponentdesiredstate
  WHERE service_name = 'HBASE'
  AND component_name = 'HBASE_REST_SERVER');

DELETE FROM servicecomponentdesiredstate
WHERE service_name = 'HBASE'
  AND component_name = 'HBASE_REST_SERVER';
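
A cautious way to apply these statements is to stop Ambari Server, back up the database, run the statements in a single transaction, and start the server again. The sketch below only prints the commands unless DRY_RUN=0 is set. It is an illustration under stated assumptions, not the documented procedure: it assumes a PostgreSQL Ambari database named ambari with a user named ambari, and that the statements above were saved to a hypothetical file named delete_hbase_rest_server.sql.

```shell
# Print each command; execute it only when DRY_RUN=0.
run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  fi
}

# Assumed values (not from the original procedure): PostgreSQL database
# "ambari", user "ambari", and hypothetical file names.
DB_NAME="ambari"
DB_USER="ambari"
SQL_FILE="delete_hbase_rest_server.sql"
BACKUP_FILE="ambari-backup.sql"

cleanup_hbase_rest_server() {
  # Stop at the first failing step so the deletes never run without a backup.
  run ambari-server stop &&
    run pg_dump -U "$DB_USER" -f "$BACKUP_FILE" "$DB_NAME" &&
    run psql -1 -v ON_ERROR_STOP=1 -U "$DB_USER" -d "$DB_NAME" -f "$SQL_FILE" &&
    run ambari-server start
}

# Dry run by default: prints the four commands without executing them.
cleanup_hbase_rest_server
```

The psql options -1 and -v ON_ERROR_STOP=1 make the four DELETE statements succeed or fail together, so a partial cleanup is not left behind.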

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Issue: Ambari Metrics does not work after migration

Error in logs:

2017-08-03 22:07:25,524 INFO org.apache.zookeeper.ClientCnxn: Opening socket connection to server eyang-2.openstacklocal/172.26.111.18:61181. Will not attempt to authenticate using SASL (unknown error)
2017-08-03 22:07:25,524 WARN org.apache.zookeeper.ClientCnxn: Session 0x15daa1cd6780004 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
	at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
	at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
	at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:361)
	at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)

Cause: Ambari Metrics data is incompatible between IOP and HDP releases.

Resolution: If AMS runs in embedded mode, remove the transient ZooKeeper data on the collector node:

rm -rf /var/lib/ambari-metrics-collector/hbase-tmp/zookeeper/zookeeper_0/version-2/*
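
Before removing the data, it can help to confirm that AMS really runs in embedded mode. The sketch below reads the timeline.metrics.service.operation.mode property from ams-site.xml. The helper name ams_operation_mode is hypothetical, the configuration path in the usage comment is an assumed default, and the parsing assumes the Ambari-generated layout in which each value element directly follows its name element.

```shell
# Hypothetical helper: print the value of
# timeline.metrics.service.operation.mode from an ams-site.xml file.
# Assumes the <value> element is on the line directly after its <name> element.
ams_operation_mode() {
  grep -A1 '<name>timeline.metrics.service.operation.mode</name>' "$1" |
    sed -n 's:.*<value>\(.*\)</value>.*:\1:p'
}

# Usage on the Metrics Collector host (path is an assumed default):
#   ams_operation_mode /etc/ambari-metrics-collector/conf/ams-site.xml
```

Run the rm command above only if the helper prints embedded.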

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Issue: Spark History server failed to start and displays the following message:

raise ExecutionFailed(err_msg, code, out, err)
resource_management.core.exceptions.ExecutionFailed: Execution of
'/usr/bin/kinit -kt /etc/security/keytabs/spark.headless.keytab
spark-hey@IBM.COM; ' returned 1. kinit: Key table file
'/etc/security/keytabs/spark.headless.keytab' not found while getting initial
credentials

If Kerberos is enabled, both SPARK and SPARK2 are installed, and both services use the same service user name, make sure the following property has the same value in both config types:

Config Type       Property                        Value
spark-defaults    spark.history.kerberos.keytab   /etc/security/keytabs/spark2.headless.keytab
spark2-defaults   spark.history.kerberos.keytab   /etc/security/keytabs/spark2.headless.keytab
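
One way to check the deployed values on a host is to compare the keytab setting in the two Spark client configuration files. This is a minimal sketch: the helper names are hypothetical, and the file paths in the usage comment are assumed default locations that may differ on your cluster.

```shell
# Hypothetical helper: print the value of a property from a
# spark-defaults.conf style file (one "key value" pair per line).
spark_conf_value() {
  awk -v key="$2" '$1 == key { print $2 }' "$1"
}

# Succeed only if both files set spark.history.kerberos.keytab to the same,
# non-empty value.
same_history_keytab() {
  a=$(spark_conf_value "$1" spark.history.kerberos.keytab)
  b=$(spark_conf_value "$2" spark.history.kerberos.keytab)
  [ -n "$a" ] && [ "$a" = "$b" ]
}

# Usage (both paths are assumed defaults):
#   same_history_keytab /etc/spark/conf/spark-defaults.conf \
#                       /etc/spark2/conf/spark-defaults.conf && echo "keytabs match"
```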

This can also be addressed in the Add Service wizard by setting the correct value in the Configure Identities section.

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Issue: Oozie start fails after upgrade from IOP 4.2.0.

The Oozie server fails to start with the following message in the catalina.out log file.

Error in logs:

org.apache.jasper.compiler.JDTCompiler$1 findType
SEVERE: Compilation error
org.eclipse.jdt.internal.compiler.classfmt.ClassFormatException
   at org.eclipse.jdt.internal.compiler.classfmt.ClassFileReader.<init>(ClassFileReader.java:372)
   at org.apache.jasper.compiler.JDTCompiler$1.findType(JDTCompiler.java:206) 

Cause: The IOP bigtop-tomcat version is older than the version required by Oozie in HDP 2.6.4. The Express Upgrade process did not upgrade it to the required minor version because this dependency cannot be installed side by side.

Resolution:

Fix this problem by upgrading to bigtop-tomcat 6.0.48-1:

yum upgrade bigtop-tomcat
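
To confirm whether the upgrade is needed, or that it took effect, the installed version can be compared with 6.0.48. The sketch below is illustrative only: the helper name version_ge is hypothetical, and it relies on the version sort option (sort -V) of GNU coreutils.

```shell
# Hypothetical helper: succeed if version $1 is greater than or equal to
# version $2. Relies on "sort -V" (version sort) from GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Usage: compare the installed package version with the required one.
#   installed=$(rpm -q --queryformat '%{VERSION}' bigtop-tomcat)
#   version_ge "$installed" 6.0.48 || echo "bigtop-tomcat $installed is too old"
```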

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Issue: Ambari upgrade results in db consistency check warning message

After an Ambari Server upgrade, a database consistency check warning is likely to appear during the start operation. For example:

2017-08-15 07:46:18,903  INFO - Checking for configs that are not mapped to any service
2017-08-15 07:46:18,964  WARN - You have config(s):
wd-hiveserver2-config-version1500918446970,spark-metrics-properties-version1493664394129,
spark-javaopts-properties-version1493664394129,spark-env-version1493752183508 that is(are)
not mapped (in serviceconfigmapping table) to any service!

This indicates that one or more services were deleted from Ambari and that their orphaned config associations still exist in the database. These do not affect cluster operation and are therefore listed as warnings.

This message can be safely ignored. To skip the check, start Ambari Server with the --skip-database-check option; to clear the warnings, follow the steps in the description of this Apache Jira:

https://issues.apache.org/jira/browse/AMBARI-20875

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Issue: Ambari server log shows consistency warning after service delete

After you delete a service and restart Ambari Server, the Ambari Server log shows messages like the following if any ConfigGroups for that service were not deleted by Ambari.

ERROR [ambari-hearbeat-monitor] HostImpl:1085 - Config inconsistency exists: unknown configType=solr-site

These messages are benign and can be safely ignored, but they continue to fill up the logs.

Cause: This is a known Ambari issue; see https://issues.apache.org/jira/browse/AMBARI-21784 to track the fix.

Resolution: Remove the messages by deleting the ConfigGroups that belong to the deleted services, using the Ambari REST API.

List the ConfigGroups to delete for the service (the tag is the service name). This GET call can be performed from the browser:

http://<ambari-server-host>:<port>/api/v1/clusters/<cluster-name>/config_groups?ConfigGroup/tag=<service-name>&fields=ConfigGroup/id

Delete the ConfigGroups using the following DELETE API call (use the <id> of each ConfigGroup obtained from the previous call):

curl -u <ambari-admin-username>:<ambari-admin-password> -H "X-Requested-By:ambari" -i \
  -X DELETE \
  http://<ambari-server-host>:<port>/api/v1/clusters/<cluster-name>/config_groups/<id>
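
When a deleted service left several ConfigGroups behind, the two calls can be chained. The following is a sketch under stated assumptions rather than a supported tool: the function names are hypothetical, and the id extraction assumes that each "id" field in the Ambari response belongs to a ConfigGroup, which holds when the request asks only for fields=ConfigGroup/id.

```shell
# Hypothetical helper: print the ConfigGroup ids found in an Ambari
# config_groups response read from standard input (one id per input line).
config_group_ids() {
  sed -n 's/.*"id"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Hypothetical helper: delete every ConfigGroup tagged with a service name.
# Arguments: base URL of the cluster resource, service name, user:password.
delete_config_groups() {
  base_url="$1"; service="$2"; credentials="$3"
  curl -s -u "$credentials" \
    "$base_url/config_groups?ConfigGroup/tag=$service&fields=ConfigGroup/id" |
    config_group_ids |
    while read -r id; do
      curl -u "$credentials" -H "X-Requested-By:ambari" -i -X DELETE \
        "$base_url/config_groups/$id"
    done
}

# Usage (placeholders as in the calls above):
#   delete_config_groups "http://<ambari-server-host>:<port>/api/v1/clusters/<cluster-name>" \
#     "<service-name>" "<ambari-admin-username>:<ambari-admin-password>"
```

Review the output of the GET call before deleting anything, because the deletion cannot be undone through the API.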

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Issue: Kerberos cluster - Hive service check failed post migration

This issue is not directly related to either the Ambari upgrade or the stack upgrade; it is related to the local accounts on the cluster hosts.

The Hive service check fails with an impersonation error if the local ambari-qa user is not a member of an expected group, which by default is “users”. To see the expected groups, view the value of core-site/hadoop.proxyuser.HTTP.groups in the HDFS configurations or query Ambari’s REST API.

Error in logs:

The error seen in the STDERR of the service check operation will be as follows:

resource_management.core.exceptions.ExecutionFailed:
 Execution of '/var/lib/ambari-agent/tmp/templetonSmoke.sh c6402.ambari.apache.org ambari-qa 50111
 idtest.ambari-qa.1503419582.05.pig /etc/security/keytabs/smokeuser.headless.keytab true
 /usr/bin/kinit ambari-qa-c1@EXAMPLE.COM /var/lib/ambari-agent/tmp' returned 1. Templeton Smoke Test (ddl cmd): Failed.
 : {"error":"java.lang.reflect.UndeclaredThrowableException"}http_code <500>

Looking at the /var/log/hive/hivemetastore.log file, the following error can be seen:

2017-08-22 16:33:03,183 ERROR [pool-7-thread-54]: metastore.RetryingHMSHandler
(RetryingHMSHandler.java:invokeInternal(203)) - MetaException(message:User:
HTTP/c6402.ambari.apache.org@EXAMPLE.COM is not allowed to impersonate ambari-qa)

Resolution:

To fix the issue, either:

  • Add the ambari-qa user to an expected group

    Command: usermod -a -G users ambari-qa

    Example session:

    [root@c6402 hive]# groups ambari-qa
    ambari-qa : hadoop
    [root@c6402 hive]# usermod -a -G users ambari-qa
    [root@c6402 hive]# groups ambari-qa
    ambari-qa : hadoop users 

    OR

  • Add one (or more) of the ambari-qa groups to the HDFS configuration at:

    core-site/hadoop.proxyuser.HTTP.groups
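
To check a host before rerunning the service check, the groups of the ambari-qa user can be compared with the configured proxy-user groups. This is a minimal sketch: the helper name in_expected_group is hypothetical, and the group list passed as the second argument must be copied from the current value of core-site/hadoop.proxyuser.HTTP.groups.

```shell
# Hypothetical helper: succeed if any of the user's groups ($1, space-separated,
# as printed by "id -Gn") appears in the proxy-user group list ($2,
# comma-separated). A list value of "*" allows every group.
in_expected_group() {
  allowed=$(echo "$2" | tr -d ' ')
  if [ "$allowed" = "*" ]; then
    return 0
  fi
  for group in $1; do
    case ",$allowed," in
      *",$group,"*) return 0 ;;
    esac
  done
  return 1
}

# Usage on a cluster host:
#   in_expected_group "$(id -Gn ambari-qa)" "users" \
#     || echo "ambari-qa is not in an expected group"
```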

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Issue: Only post-upgrade Spark jobs display in Spark History Server.

Cause: IOP's default spark.eventLog.dir has been changed to a custom value that includes the string /iop/apps but does not match the default value (/iop/apps/4.2.0.0/spark/logs/history-server).

An example custom value would be /iop/apps/custom/dir. In this case, the custom value is changed to /hdp/apps/custom/dir during the stack upgrade.

Resolution:

To see pre-upgrade Spark jobs, change the setting back to the custom value established before upgrading, in this case: /iop/apps/custom/dir
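
The effect of the upgrade on the setting can be illustrated with a one-line substitution. The sketch below only mimics the result described above and is not the upgrade's actual code; the function name rewrite_event_log_dir is hypothetical.

```shell
# Illustration only: replace the first /iop/apps in a path with /hdp/apps,
# which mirrors how the custom value ends up after the stack upgrade.
rewrite_event_log_dir() {
  echo "$1" | sed 's#/iop/apps#/hdp/apps#'
}

rewrite_event_log_dir /iop/apps/custom/dir   # prints /hdp/apps/custom/dir

# To confirm that the pre-upgrade event logs still exist in the old location:
#   hdfs dfs -ls /iop/apps/custom/dir
```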

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Issue: Oozie Hive job failed with NoClassDefFoundError

This error occurs when Oozie HA is enabled; the Oozie Hive job fails with the following error in the YARN application log:

<<< Invocation of Main class completed <<<

Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.HiveMain], main() threw exception, org/apache/hadoop/hive/shims/ShimLoader
java.lang.NoClassDefFoundError: org/apache/hadoop/hive/shims/ShimLoader
	at org.apache.hadoop.hive.conf.HiveConf$ConfVars.<clinit>(HiveConf.java:400)
	at org.apache.hadoop.hive.conf.HiveConf.<clinit>(HiveConf.java:109)
	at sun.misc.Unsafe.ensureClassInitialized(Native Method) 

Resolution:

  • Run the following command as the oozie user:

    oozie admin -oozie http://<oozie-server-host>:11000/oozie -sharelibupdate 
  • Rerun the Oozie Hive job to verify.

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Issue: Hive Server Interactive (HSI) fails to start in a kerberized cluster.

HSI is not present in IOP clusters. Adding Hive and/or enabling HSI after migrating a kerberized cluster to HDP may result in HSI failing to start.

Cause: The HSI conditional service logic is not present in IOP clusters. As a result, the keytabs in YARN/kerberos.json are not created and therefore do not exist on all NodeManagers.

Resolution: Manually regenerate keytabs from Ambari before enabling HSI, so that keytabs are distributed to all NodeManagers.