5. Known Issues

Ambari 2.4 has the following known issues, scheduled for resolution in a future release. Also, refer to the Ambari Troubleshooting Guide for additional information.

Table 1.6. Ambari 2.4 Known Issues

Each entry below lists the Apache Jira (or N/A), the Hortonworks (HWX) Jira, the problem, and the solution or workaround.

AMBARI-18296

BUG-65357

Ambari Server start fails due to Database Consistency Check throwing NullPointerException:

2016-08-31 21:52:28,082 INFO - ******************************* Check database started 
2016-08-31 21:52:31,647 INFO - Checking for configs not mapped to any cluster
2016-08-31 21:52:31,653 INFO - Checking for configs selected more than once 
2016-08-31 21:52:31,655 INFO - Checking for hosts without state
2016-08-31 21:52:31,657 INFO - Checking host component states count equals host component desired states count 
2016-08-31 21:52:31,660 INFO - Checking services and their configs 
2016-08-31 21:52:33,669 ERROR - Unexpected error, database check failed java.lang.NullPointerException at
 org.apache.ambari.server.checks.DatabaseConsistencyCheckHelper.checkServiceConfigs(DatabaseConsistencyCheckHelper.java:543)
 at org.apache.ambari.server.checks.DatabaseConsistencyChecker.main(DatabaseConsistencyChecker.java:115)

To start Ambari and avoid this exception, start the Ambari Server without performing the database consistency check:

ambari-server start --skip-database-check

See Start the Ambari Server for more information about the consistency check.

N/A

BUG-65306

On RHEL/CentOS/Oracle Linux 7.1+, the ambari-server stop command does not fully stop the Ambari Server process.

Use ps aux to identify the process id for the Ambari Server process, and kill the process using the kill command.
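
For example, a minimal sketch; the grep pattern assumes the default Ambari Server Java process name, and <PID> is a placeholder for the process ID that ps reports:

ps aux | grep AmbariServer | grep -v grep    # the second column is the process ID
kill <PID>                                   # substitute the process ID; use kill -9 only if the process does not exit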

N/A

BUG-65248

When upgrading from Ambari 2.4.0.0 to 2.4.0.1, the ambari.properties file is overwritten.

Before upgrading from Ambari 2.4.0.0 to 2.4.0.1, make a safe copy of the Ambari Server configuration file, found at /etc/ambari-server/conf/ambari.properties.
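
For example, a minimal sketch; the backup file name is only a suggestion:

cp /etc/ambari-server/conf/ambari.properties /etc/ambari-server/conf/ambari.properties.backup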

N/A

BUG-65043

After upgrading a secure cluster from HDP-2.4.x to HDP-2.5.x, the Kafka service has incorrect security properties.

After upgrading from HDP 2.4.x to 2.5.x (and after removing and then replacing Atlas, as recommended), manually update the user-defined Kerberos descriptor as follows; a consolidated command-line sketch follows the steps:

  1. Retrieve the user-defined Kerberos descriptor.

    curl -u admin:admin -o kerberos_descriptor.json -X GET http://localhost:8080/api/v1/clusters/c1/artifacts/kerberos_descriptor
  2. Remove the following block (or similar) from the top of the retrieved Kerberos descriptor.

    "href" : "http://localhost:8080/api/v1/clusters/c1/artifacts/kerberos_descriptor",
    "Artifacts" : {
        "artifact_name" : "kerberos_descriptor",
        "cluster_name" : "c1"
      },

  3. Alter or remove the Atlas-related block from the retrieved Kerberos descriptor. For example, the block that contains:

    {
    ...
    "name": "ATLAS"
    ...
    }

    If no changes have been made to the default values from the stack-level Kerberos descriptor, simply remove the Atlas-specific block.

  4. Update the user-defined Kerberos descriptor.

    curl -H "X-Requested-By:ambari" -u admin:admin -X PUT -d @./kerberos_descriptor.json http://localhost:8080/api/v1/clusters/c1/artifacts/kerberos_descriptor

  5. Restart Ambari, if necessary.

  6. Use Regenerate Keytabs to create the missing configurations, principals, and keytab files.
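
If you prefer to script steps 2 and 3, the following is a minimal sketch. It assumes jq is installed and that the service definitions in the retrieved descriptor live under artifact_data.services, and it removes the Atlas block entirely, which is appropriate only if you have not customized the Atlas defaults (see step 3). Review the edited file before uploading it in step 4.

# strip the href/Artifacts wrapper and drop the ATLAS service block
jq 'del(.href, .Artifacts) | .artifact_data.services |= map(select(.name != "ATLAS"))' kerberos_descriptor.json > kerberos_descriptor.edited.json
mv kerberos_descriptor.edited.json kerberos_descriptor.json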

N/A

BUG-64966

When performing a rolling upgrade of a secure HDP 2.4 cluster running Atlas on SLES 11 SP3, the Atlas service fails to upgrade. You may see the following error:

resource_management.core.exceptions.Fail: Execution of 'ambari-sudo.sh /usr/bin/hdp-select set all
 `ambari-python-wrap /usr/bin/hdp-select versions | grep ^2.5.0.0-1238 | tail -1`' returned 1.
 symlink target /usr/hdp/current/atlas-client for atlas already exists and it is not a symlink.

For all environments, we recommend removing Atlas from the cluster before performing a rolling upgrade, as described here.

Specific upgrade steps for a secure HDP 2.4 cluster running Atlas on SLES 11 SP3:

  1. Delete the Atlas service via the Ambari API, as shown in the example calls after these steps.

  2. Upgrade Ambari to version 2.4.0.

  3. Install HDP-2.5.x bits.

  4. Perform an Express Upgrade.

  5. Upgrade the stack.

  6. Reinstall Atlas.

  7. Restart all impacted services using Services > Restart All.
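
For step 1, a minimal sketch of the API calls, assuming default admin credentials, the default port, and a cluster named c1; the service must be stopped before it can be deleted:

# stop the Atlas service (moves it to the INSTALLED state)
curl -u admin:admin -H "X-Requested-By: ambari" -X PUT -d '{"ServiceInfo":{"state":"INSTALLED"}}' http://localhost:8080/api/v1/clusters/c1/services/ATLAS
# delete the service once the stop request completes
curl -u admin:admin -H "X-Requested-By: ambari" -X DELETE http://localhost:8080/api/v1/clusters/c1/services/ATLAS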

N/A

BUG-64959

If you have Storm in your cluster and Kerberos enabled, then after upgrading to Ambari 2.4 the Storm summary page in Ambari Web does not show information, and exceptions such as the following are logged:

24 Aug 2016 14:19:38,107 ERROR [ambari-metrics-retrieval-service-thread-2738] MetricsRetrievalService:421
 - Unable to retrieve metrics from http://hcube1-1n02.eng.hortonworks.com:8744/api/v1/cluster/summary.
 Subsequent failures will be suppressed from the log for 20 minutes.
java.io.IOException: Server returned HTTP response code: 500 for URL: 
http://hcube1-1n02.eng.hortonworks.com:8744/api/v1/cluster/summary
 at sun.reflect.GeneratedConstructorAccessor228.newInstance(Unknown Source)
 at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)

You need to add the Ambari Server principal name to the Storm nimbus.admins property in Ambari Web > Services > Storm > Configs.
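
For example, a sketch only: the principal shown is a placeholder, and whether you list the full principal or just its short name depends on your Kerberos name rules. Append the entry to the existing nimbus.admins value rather than replacing it:

nimbus.admins : ['ambari-server@EXAMPLE.COM']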

N/A

BUG-64947

Filtering by Role on Users page in Ambari Admin Interface is slow with 1000+ users.

If you have 1000+ users synchronized into Ambari from LDAP and you attempt to filter by Role on the Users page in the Ambari Admin Interface, the results load slowly (potentially around 20 seconds). Reducing the number of users synchronized into Ambari improves this performance.
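
One way to reduce the number of synchronized users is to sync only specific users and groups instead of the entire directory; a sketch, where users.txt and groups.txt are comma-separated name lists that you supply:

ambari-server sync-ldap --users users.txt --groups groups.txt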

N/A

BUG-64912

Apache Oozie requires a restart after an Atlas configuration update, but may not be included in the services marked as requiring restart in Ambari.

Select Oozie > Service Actions > Restart All to restart Oozie along with the other services.

N/A

BUG-64896

When you add Atlas to a Kerberos-enabled HDP 2.5 cluster, Atlas reports an error while trying to communicate with Kafka. You may see the following in the error:

"Could not login: the client is being asked for a password, but the Kafka client code
 does not currently support obtaining a password from the user.
Not available to garner authentication information from the user"

This is because the keytab and principal details may be missing from the Atlas config.

Add the following properties to the Atlas configuration type application-properties:

Set atlas.jaas.KafkaClient.option.keyTab to the Atlas service keytab, for example:

/etc/security/keytabs/atlas.service.keytab

Set atlas.jaas.KafkaClient.option.principal to the principal name in that keytab, for example:

atlas/atlas_server_host_name@EXAMPLE.COM
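
In property form, using the example values above (substitute your own keytab path and principal):

atlas.jaas.KafkaClient.option.keyTab=/etc/security/keytabs/atlas.service.keytab
atlas.jaas.KafkaClient.option.principal=atlas/atlas_server_host_name@EXAMPLE.COM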

Then, restart Atlas Server.

N/A

BUG-64819

After performing a Rolling Upgrade, the “Percent JournalNodes Available” alert goes CRITICAL.

After performing a Rolling Upgrade, the “Percent JournalNodes Available” alert goes CRITICAL even though all the JournalNodes are started and healthy. You must disable and then re-enable this alert to clear the CRITICAL status. In Ambari Web, browse to Alerts and find the “Percent JournalNodes Available” alert. Click to disable the alert. Once the disable completes, re-enable the alert; its status should return to OK.

AMBARI-18177

BUG-64326

If you add or remove ZooKeeper servers and you have Atlas in your cluster, you must update some Atlas properties.

If you are running Atlas in your cluster and make changes to ZooKeeper (adding or removing ZooKeeper server components), you must update the Atlas properties listed below to make sure they reflect the list of ZooKeeper servers in the cluster. You can modify these properties from Ambari Web > Services > Atlas > Configs:

atlas.audit.hbase.zookeeper.quorum
Sample: node-test0.docker.nxa.io:2181,node-test2.docker.nxa.io:2181,node-test1.docker.nxa.io:2181

atlas.graph.index.search.solr.zookeeper-url
Sample: node-test1.docker.nxa.io:2181/infra-solr,node-test0.docker.nxa.io:2181/infra-solr,node-test2.docker.nxa.io:2181/infra-solr

atlas.graph.storage.hostname
Sample: node-test0.docker.nxa.io,node-test2.docker.nxa.io,node-test1.docker.nxa.io

atlas.kafka.zookeeper.connect
Sample: node-test0.docker.nxa.io,node-test2.docker.nxa.io,node-test1.docker.nxa.io

N/A

BUG-62609

When upgrading Ambari 2.x to 2.4 on a secure cluster running Storm on Ubuntu or SLES, the Storm service check fails to restart the Nimbus server due to a randomized StormClient principal.

You must manually reset the original StormClient principal on the Kerberos AD server.

N/A

BUG-57093

When toggling and saving the Hive service's "Enable Interactive Query" configuration, wait for the background operations associated with that change to complete before toggling it again. If you do not wait for the background operations to complete and try to re-enable the feature, some of the operations can fail, resulting in configuration inconsistencies.

If you observe such inconsistencies, retry toggling and saving the "Enable Interactive Query" configuration, and wait for all requests (background operations) to complete.

AMBARI-12436

BUG-40481

Falcon Service Check may fail when performing Rolling Upgrade or downgrade, with the following error:

2015-06-25 18:09:07,235 ERROR - [main:]
 ~ Failed to start ActiveMQ JMS Message Broker.
 Reason: java.io.IOException: Invalid location: 1:6763311, :
 java.lang.NegativeArraySizeException (BrokerService:528) 
 java.io.IOException: Invalid location: 1:6763311, :
 java.lang.NegativeArraySizeException
 at
 org.apache.kahadb.journal.DataFileAccessor.readRecord(DataFileAccessor.java:94)

This condition is rare.

When performing a Rolling Upgrade from HDP 2.2 to HDP 2.3 (or upgrade > downgrade > upgrade) and the Falcon Service Check fails with the above error, browse to the Falcon ActiveMQ data directory (specified in the Falcon properties file), remove the corrupted queues, and stop and start the Falcon Server:

cd <ACTIVEMQ_DATA_DIR>
rm -rf ./localhost
cd /usr/hdp/current/falcon-server 
su -l <FALCON_USER> 
./bin/falcon-stop
./bin/falcon-start

AMBARI-12283

BUG-40300

After adding or deleting ZooKeeper servers in an existing cluster, Service Check fails.

After adding or deleting ZooKeeper servers in an existing cluster, the Service Check fails due to conflicting ZooKeeper IDs. Restart the ZooKeeper service to clear the IDs.

N/A

BUG-40694

The Slider view is not supported on a cluster with SSL (wire encryption) enabled. Use the Slider view only on clusters without wire encryption enabled. If you must run Slider on a cluster with wire encryption enabled, contact Hortonworks support for further help.