Release Notes

Known Issues

Hortonworks Bug ID

Apache JIRA

Apache Component

Summary

BUG-38148

ACCUMULO-4389

Accumulo

Description of Problem: Apache Accumulo has a feature called "Replication" which automatically propagates updates to one table to a list of other Accumulo cluster instances. This feature is used for disaster-recovery scenarios allowing data-center level failover. With this replication feature, there are a number of client API methods which support developer interactions with the feature.

The ReplicationOperations#drain(String, Set) method is intended to serve as a blocking call that waits until all of the provided write-ahead log files have been replicated to the other peers. Sometimes, the method reportedly does not actually wait for a sufficient amount of time.

Associated error message: No direct error message is generated; the primary symptom is that the configured Accumulo replication peers do not have all of the expected data from the source Accumulo cluster.

Workaround: None at this time.

Upstream fix: https://issues.apache.org/jira/browse/ACCUMULO-4389 has been opened to track this issue.

BUG-40773

N/A

Kafka

Description of Problem: Kafka broker fails to start after disabling Kerberos security.

Workaround: Before disabling Kerberos, you need to stop Kafka brokers.

  1. Run the following command as the Kafka user:

    ./bin/kafka-run-class.sh kafka.admin.ZkSecurityMigrator --zookeeper.acl unsecure --zookeeper.connect 'hostname:2181'

  2. Follow the instructions for disabling Kerberos through Ambari.

  3. Restart Kafka nodes.

BUG-42784

N/A

Phoenix

sqlline Shell for Phoenix Truncates Table Columns of Longer Rows in a Terminal

If sqlline runs in a terminal that is too narrow to display all of the columns in a SQL row, the additional columns are truncated to the maximum terminal width. In this case, resize the terminal and restart sqlline. Alternatively, you can switch sqlline to an XML output format, as in the sketch below.
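
A minimal sketch of changing the output format from within the sqlline shell (the table name is a placeholder, and the available format names, such as xmlelements, vary by sqlline version):

!outputformat xmlelements
SELECT * FROM example_table;
!outputformat table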

BUG-51100

SPARK-12516

Spark

Component Affected: Dynamic allocation enabled on YARN

Description of Problem: Due to the lack of a blacklist mechanism, Spark can still schedule tasks on bad nodes where external shuffle failures have already occurred. This leads to job failure and driver exit.

Workaround: Avoid setting spark.yarn.max.executor.failures to a static number. Spark will determine a reasonable failure threshold on its own, whether or not dynamic allocation is enabled.

BUG-52223

HIVE-13014

Hive

Component Affected: Modules that use the remote metastore, such as the Hive CLI and the Streaming Ingest API in HCatalog.

Description of Problem: In rare circumstances (due to network issues such as temporary partitions, lost messages, and so on), it is possible for an insert/update/delete operation on a transactional table to report a failure when it actually committed successfully.

Workaround: The embedded metastore (used by HiveServer2) is unaffected by this behavior.

BUG-55799

HIVE-12930

Hive

Description of Problem: SSL shuffle is not supported for LLAP.

Workaround: Currently, there is no workaround.

BUG-58308

PHOENIX-2067 and PHOENIX-2120

Phoenix

After an upgrade from an earlier HDP 2.x version to HDP 2.5, the rows of a table that has any of the following column types are not ordered correctly:

  • VARCHAR DESC columns

  • DECIMAL DESC columns

  • ARRAY DESC columns

  • Nullable DESC columns that are indexed (affects the index, but not the data table)

  • BINARY columns included in the primary key constraint

This is a result of a column sort-order issue in Apache Phoenix. You can resolve this issue by upgrading the affected tables as described in the Phoenix 4.5.0 Release Notes.

BUG-59714

HIVE-13974

Hive

Description of Problem: ORC Schema Evolution does not support adding columns to a STRUCT type column unless the STRUCT column is the last column.

For example, you can add column C to the last column, last_struct:

CREATE TABLE orc_last_struct (
str STRING,
last_struct STRUCT<A:STRING,B:STRING>
) STORED AS ORC;

ALTER TABLE orc_last_struct REPLACE COLUMNS (str STRING, last_struct
STRUCT<A:STRING,B:STRING,C:BIGINT>);

You will be able to read the table.

However, in this table:

CREATE TABLE orc_inner_struct (
str STRING,
inner_struct STRUCT<A:STRING,B:STRING>,
last DATE
) STORED AS ORC;

ALTER TABLE orc_inner_struct REPLACE COLUMNS (str STRING, inner_struct
STRUCT<A:STRING,B:STRING,C:BIGINT>, last DATE);

You will not be able to read the table; execution fails with errors such as java.lang.ArrayIndexOutOfBoundsException.

Workaround: Do not use schema evolution on inner (non-last) STRUCT type columns.

BUG-60690

KNOX-718

Knox

Description of Problem: Unable to log in using Knox SSO even when providing correct credentials. This occurs because the redirect whitelist is not correctly configured. The login page does not provide an error message indicating the reason for the failed login.

Associated error message: Found in <log_directory_for_knox>/gateway.log

Workaround: In knoxsso.xml, modify the value of the knoxsso.redirect.whitelist.regex parameter to reflect the configuration of your environment, as in the hedged sketch below.
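
A sketch of the parameter as it might appear in the knoxsso.xml topology (the regular expression is a placeholder; substitute your own domain):

<param>
    <name>knoxsso.redirect.whitelist.regex</name>
    <value>^https?:\/\/(.*\.example\.com|localhost|127\.0\.0\.1):[0-9].*$</value>
</param>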

BUG-61739

N/A

Knox

Description of Problem: Defined configurations can exist in two places but cause problems if they do not match.

Typically when using Ambari, you do not need to define the same configuration in multiple places. However, in some cases, defining the configuration multiple times is necessary.

Workaround: Typically, HiveServer2 is set up in HA mode, which uses ZooKeeper. You can now use Knox to access HiveServer2 in HA mode by using the following configuration in a Knox topology file.

<provider>
    <role>ha</role>
    <name>HaProvider</name>
    <enabled>true</enabled>
    <param>
        <name>HIVE</name>
        <value>maxFailoverAttempts=3;failoverSleep=1000;enabled=true;zookeeperEnsemble=machine1:2181,machine2:2181,machine3:2181;zookeeperNamespace=hiveserver2</value>
    </param>
</provider>

In the above configuration, you must set the zookeeperEnsemble property to the same value as the hive.zookeeper.quorum parameter in the Hive configuration file hive-site.xml (see the example below).

If the value of hive.zookeeper.quorum changes, you must manually change the value of zookeeperEnsemble to match.
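
For illustration, a hedged hive-site.xml snippet whose value must match the zookeeperEnsemble list in the Knox topology above (host names are placeholders):

<property>
    <name>hive.zookeeper.quorum</name>
    <value>machine1:2181,machine2:2181,machine3:2181</value>
</property>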

BUG-62423

N/A

Falcon

For HDP 2.5.0, Falcon binaries ship with zookeeper-3.4.6.2.4.3.0-126-tests.jar.
BUG-62588

N/A

Storm

Description of Problem: Storm does not support rolling upgrade from a previous HDP version to HDP-2.5.

Solution: If your cluster is managed by Ambari, the rolling upgrade process will ask you to stop Storm topologies, perform the upgrade, and redeploy your topologies.

If your cluster is not managed by Ambari, perform the following manual upgrade steps for Storm before starting the rolling upgrade process:

  1. Stop all topologies.

  2. Stop all storm daemons.

  3. Delete storm.local.dir contents on all nodes.

  4. Delete the storm.zookeeper.root node.

Next, upgrade the cluster to HDP-2.5.

To finish the Storm upgrade process, start the Storm daemons and then redeploy the topologies.

BUG-62662

HADOOP-13382

Hadoop Common

Description of Problem: This backward-incompatible change removes the commons-httpclient-3.1 dependency in order to address a CVE vulnerability.

Workaround: Projects or Java jobs with undeclared transitive dependencies on commons-httpclient, previously provided via hadoop-common or hadoop-client, will have to either stop using commons-httpclient (recommended) or import it as an explicit dependency, as in the sketch below.
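
A hedged Maven sketch of declaring commons-httpclient as an explicit dependency (3.1 is the final 3.x release; verify the version against your own requirements):

<dependency>
    <groupId>commons-httpclient</groupId>
    <artifactId>commons-httpclient</artifactId>
    <version>3.1</version>
</dependency>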

BUG-63132

N/A

Storm

Summary: Solr bolt does not run in a Kerberos environment.

Associated error message: The following is an example: [ERROR] Request to collection hadoop_logs failed due to (401) org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http:[...] Error 401 Authentication required

Workaround: None at this time.

BUG-63165

PHOENIX-3126

Zeppelin

Description of problem: When Kerberos is enabled in the cluster, Kerberos-based user authentication in the Zeppelin UI is not correctly passed to Phoenix/HBase. The user credentials are unavailable to Phoenix, so the standard HBase authentication/authorization schemes do not work as intended.

Associated error message: Unexpected authentication and authorization failure messages from Zeppelin when talking to Phoenix/HBase.

Workaround: There is no known workaround at this time. This issue will be addressed in a future maintenance release.

BUG-63226

OOZIE-1983

Oozie

Component Affected: Spark shared library

Description of Problem: In HDP 2.5, after you convert an insecure cluster to a secure cluster, or vice versa, the Oozie Spark share library does not update with the required secure parameters.

Workaround: To update, run the following command as the "oozie" user:

/usr/hdp/current/oozie-client/bin/oozie-setup.sh sharelib create -fs hdfs://<HDFS FQDN>:8020 -locallib /usr/hdp/current/oozie-client/oozie-sharelib.tar.gz

When the command completes, restart Oozie from the Ambari web UI.

  • In Ambari, select Oozie from the services list.

  • On the Oozie Summary page, click Service Actions > Restart All.

BUG-63833

AMBARI-18096

Spark and other components that process long-running jobs.

Description of Problem: Long running jobs (such as Spark Streaming jobs) do not work in secure clusters when transparent data encryption is enabled.

Workaround: None at this time.

BUG-63885

HIVE-14446

Hive, Hive2

Component Affected: ACID

Description of Problem: Small tables that are estimated to have about 300 million rows and are broadcast to a map join will cause the BloomFilter to overflow. Typically, this is due to bad statistics estimation.

Workaround: It is possible to avoid this issue with the following:

set hive.mapjoin.hybridgrace.hashtable=false

However, if the issue is caused by bad statistics estimation and the Hybrid grace hash join does not work, the regular map join will not work either.

BUG-64028

N/A

Ranger

Component Affected: Create Policy Audit

Description of Problem: When attempting to view the details of an audit record associated with a deleted Ranger repository, the Admin UI shows a Page Not Found error page (401).

Workaround: Currently, there is no workaround for this. This will be addressed in a future release.

BUG-64033

RANGER-1143

Ranger

Tagsync policy exists, but authorization is still failing (see the workaround for BUG-64033).

BUG-64084

N/A

Atlas, Storm

Description of Problem: A Hive topology fails when the hive-site.xml contains an Atlas hook that tries to register any new tables/partitions created through the HCatalog streaming API.

Currently, the use case that causes this failure is copying the hive-site.xml from the target cluster into your topology codebase and packaging/creating an uber jar.

Associated Error Message: Because the Atlas hook and its configuration are not packaged with the Storm topology jar, the result is a NoClassDefFoundError.

Workaround: After copying the hive-site.xml into your topology code, delete the Atlas hook configuration references and then package the jar.

BUG-64098

N/A

Spark

Description of Problem: When installing Spark manually on Debian/Ubuntu, the apt-get install spark command does not install all Spark packages.

Workaround: Use the -t option in your apt-get install command: apt-get install -t HDP spark

BUG-64385

N/A

Falcon, Hive2

Description of Problem: Using HSI for writes and HiveDR are incompatible. This affects all users of HSI (HiveServer2 Interactive) and HiveDR (Hive replication). The resolution of BUG-64385 introduces a side effect: if you use HSI to write to the warehouse, your changes will not be replicated, which can lead to missed updates.

Workaround: If you want to use Replication, use HSI for reads only.

BUG-64511

HDFS-9618

HDFS

Component Affected: HDFS log

Description of Problem: When the log level is set to INFO, unnecessary DEBUG-level log messages are generated in the NameNode and, as a result, NameNode performance is degraded.

Workaround: Set the log level higher than INFO, such as WARN, so that the unnecessary messages are not generated. As a side effect, this workaround also prevents INFO log messages from being written to the log. A hedged example follows.
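
As an illustration only, a noisy NameNode logger can be raised to WARN at runtime with the hadoop daemonlog command (the class name and HTTP port below are placeholders; substitute the logger that emits the messages and your NameNode's HTTP port):

hadoop daemonlog -setlevel <namenode-host>:50070 org.apache.hadoop.hdfs.server.blockmanagement.BlockManager WARN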

This will be addressed by HDFS-9618, a simple fix that changes the log-level check so that the message is logged only at DEBUG.

BUG-64965

N/A

Zeppelin

Component Affected: Zeppelin UI

Description of Problem: When Zeppelin SSL is enabled, the Zeppelin UI is unavailable through Safari due to a WebSocket network error:

WebSocket network error: OSStatus Error -9807: Invalid certificate chain

Workaround: This occurs due to the use of self-signed certificates. Self-signed certificates require OS- or browser-specific steps that you must follow prior to use in production. In production, use a Certificate Authority-signed certificate to prevent this error from occurring.

BUG-65028

N/A

Zeppelin

Description of Problem: On secure clusters that run Zeppelin, configure settings to limit interpreter editing privileges to admin roles.

Workaround: Add the following lines to the [urls] section of the Zeppelin shiro.ini configuration file. For a cluster not managed by Ambari, add the lines to /etc/zeppelin/conf/shiro.ini.

/api/interpreter/** = authc, roles[admin]

/api/configurations/** = authc, roles[admin]

/api/credential/** = authc, roles[admin]

BUG-65043

N/A

Ambari, Atlas

Description of Problem: When upgrading a secure cluster from HDP-2.4.x to HDP-2.5.x, the Kafka service has incorrect security properties.

Workaround: After upgrading from HDP 2.4.x to 2.5.x (and after removing and then replacing Atlas, as recommended), manually update the user-defined Kerberos descriptor to use the 2.5 stack default values (if acceptable). If the default values are unacceptable, copy the Kerberos descriptor from the stack, make the required changes, and then replace the descriptor. Then, use Regenerate Keytabs to create the missing configurations, principals, and keytab files.

BUG-65058

N/A

Ambari, Hive

Description of Problem: LLAP containers may be killed due to insufficient memory available in the system.

Associated error message: The following messages appear in the AM log of the LLAP YARN application.

# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 194347270144 bytes for committing reserved memory.
# An error report file with more information is saved as:

Workaround: Reduce the YARN NodeManager available memory. This is defined as the Memory allocated for all YARN containers on a node under the YARN Configuration tab; a hedged yarn-site.xml sketch follows.
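
For reference, that Ambari setting corresponds to the yarn.nodemanager.resource.memory-mb property in yarn-site.xml; the value below is purely illustrative and must be sized for your nodes:

<property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>16384</value>
</property>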

Description of Problem: LLAP daemons can be killed by the YARN memory monitor.

Associated error message: The following message appears in the AM log of the LLAP YARN application.

is running beyond physical memory limits. Current usage: <USED> of <ALLOCATED> GB physical memory used

Workaround: Lower the LLAP heap size under the Advanced hive-interactive-env section of the Advanced Hive config.

Note: You will need to change this value each time any configs are changed under the Hive Interactive section on the main Hive Config page.

BUG-65080

N/A

Ambari, Atlas

Description of Problem: The Atlas web UI produces an inaccessible alert after adding the Atlas service on an upgraded cluster.

Workaround:

  1. Stop Atlas Server.

  2. Copy the Solr XML files to the correct config folder and chown them to $atlas_user:$hadoop_group:

    cp -R /usr/hdp/2.5.0.0-1245/etc/atlas/conf.dist/solr/* /etc/atlas/conf/solr/
    cp: overwrite `/etc/atlas/conf/solr/solrconfig.xml'? n
    chown atlas:hadoop /etc/atlas/conf/solr/*
    
    cp /usr/hdp/2.5.0.0-1245/etc/atlas/conf.dist/users-credentials.properties /etc/atlas/conf/
    cp /usr/hdp/2.5.0.0-1245/etc/atlas/conf.dist/policy-store.txt /etc/atlas/conf/
    
    chown atlas:hadoop /etc/atlas/conf/users-credentials.properties
    chown atlas:hadoop /etc/atlas/conf/policy-store.txt
    
  3. Delete the ZooKeeper znode:

    # kinit -kt /etc/security/keytabs/atlas.service.keytab  atlas/<HOST>@<DOMAIN>
    # cd /usr/hdp/current/zookeeper-client/bin/ 
    # ./zkCli.sh -server <zookeepernode>:<zookeeperport>
    [ ...... (CONNECTED) ] rmr  /infra-solr/configs/atlas_configs
    
  4. Ensure the following Atlas application properties are present:

    atlas.jaas.KafkaClient.option.keyTab = /etc/security/keytabs/atlas.service.keytab
    atlas.jaas.KafkaClient.option.principal = atlas/_HOST@EXAMPLE.COM
  5. Start Atlas.

BUG-65286

ATLAS-1147

Atlas

Component Affected: Atlas UI

Description of Problem: On the Schema tab, the Name column is missing.

Workaround: Install the available patch:

  1. Go to ATLAS-1147.

  2. Download SchemaLayoutView.js.

  3. Copy SchemaLayoutView.js on the hosts that run the Atlas server to: /usr/hdp/current/atlas-server/server/webapp/atlas/js/views/schema

  4. Refresh the browser to update the browser cache.

BUG-66325, BUG-66326

N/A

Zeppelin

Description of Problem: Zeppelin (with or without Livy) cannot access data on encrypted (TDE) clusters when the default user settings are in effect.

Workaround:

  1. Add the following proxy users to the Ranger KMS configuration, replacing 'livy' and 'yarn' with the actual configured service user names for Livy and YARN, if they differ from the default service users livy and yarn on your cluster:

    hadoop.kms.proxyuser.livy.groups=*
    hadoop.kms.proxyuser.livy.hosts=*
    hadoop.kms.proxyuser.livy.users=*
    hadoop.kms.proxyuser.yarn.groups=*
    hadoop.kms.proxyuser.yarn.hosts=*
    hadoop.kms.proxyuser.yarn.users=*

  2. Add the following property and setting to your yarn-site.xml file (an XML sketch of this property follows these steps):

    yarn.resourcemanager.proxy-user-privileges.enabled=true

  3. Restart KMS and YARN Resource Manager.
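
A hedged sketch of the yarn-site.xml property from step 2 in XML form:

<property>
    <name>yarn.resourcemanager.proxy-user-privileges.enabled</name>
    <value>true</value>
</property>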

BUG-69158

N/A

Zeppelin, Spark

Description of Problem: By default, the Livy server times out after being idle for 60 minutes.

Associated error message: Subsequent attempts to access Livy generate an error, Exception: Session not found, Livy server would have restarted, or lost session.

Workaround: Set the timeout to a larger value through the property livy.server.session.timeout, and restart the Zeppelin Livy interpreter; a hedged example follows.
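
For example (a hedged sketch; check whether your Livy version expects the value in milliseconds, in which case 28800000 would correspond to 8 hours), add or update the property in livy.conf or the Livy interpreter settings:

livy.server.session.timeout = 28800000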

BUG-77311

N/A

Zeppelin

Description of Problem: When one user restarts the %livy interpreter from the Interpreters (admin) page, other users' sessions restart too.

Workaround: Restart the %livy interpreter from within a notebook.

BUG-80901

N/A

Zeppelin

Component Affected: Zeppelin/Livy

Description of Problem: This occurs when running applications through Zeppelin/Livy that require third-party libraries. These libraries are not installed on all nodes in the cluster, but are installed on the edge nodes. In yarn-client mode this works, because the job is submitted and run on the edge node where the libraries are installed. In yarn-cluster mode, it fails because the libraries are missing on the other nodes.

Workaround: Set either spark.jars in spark-defaults.conf or livy.spark.jars in the Livy interpreter configuration. Both are globally applicable, and in both cases the jars need to be present on the Livy machine. Updating the Livy configuration is preferable, since it affects only the Zeppelin users. A hedged example of each option follows.
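
A sketch of both options (the jar path is a placeholder and must exist on the Livy host):

# In the Livy interpreter configuration (preferred; affects only Zeppelin users):
livy.spark.jars = /opt/libs/third-party-lib.jar

# Or in spark-defaults.conf (applies to all Spark jobs):
spark.jars /opt/libs/third-party-lib.jar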

RMP-5613

N/A

Zeppelin

Component Affected: Zeppelin

Description of Problem: Zeppelin is not supported on Internet Explorer 8/9 because those browsers have no native support for WebSockets.

Workaround: Use one of the following browsers:

  • Internet Explorer 10 or 11

  • Google Chrome latest stable release

  • Firefox latest stable release

  • Safari latest stable release

RMP-7856

HBASE-14417

HBase

Incremental backups do not capture bulk-loaded data.

RMP-7858

HBASE-14141

HBase

All HBase WAL data is copied to the backup destination during an incremental backup, including data for tables that are not part of the backup. This is a limitation of the performance and security aspects of the HBase backup-and-restore feature. HBASE-14141 will introduce a more granular WAL-copy implementation.

Technical Service Bulletin

Apache JIRA

Apache Component

Summary

TSB-405

N/A

N/A

Impact of LDAP Channel Binding and LDAP signing changes in Microsoft Active Directory

Microsoft has introduced changes in LDAP Signing and LDAP Channel Binding to increase the security for communications between LDAP clients and Active Directory domain controllers. These optional changes will have an impact on how 3rd party products integrate with Active Directory using the LDAP protocol.

Workaround: Disable the LDAP Signing and LDAP Channel Binding features in Microsoft Active Directory if they are enabled.

For more information on this issue, see the corresponding Knowledge article: TSB-2021 405: Impact of LDAP Channel Binding and LDAP signing changes in Microsoft Active Directory

TSB-406

N/A

HDFS

CVE-2020-9492 Hadoop filesystem bindings (ie: webhdfs) allows credential stealing

WebHDFS clients might send the SPNEGO authorization header to a remote URL without proper verification. A maliciously crafted request can trigger services to send server credentials to a webhdfs path (i.e., webhdfs://…), capturing the service principal.

For more information on this issue, see the corresponding Knowledge article: TSB-2021 406: CVE-2020-9492 Hadoop filesystem bindings (ie: webhdfs) allows credential stealing

TSB-434

HADOOP-17208, HADOOP-17304

Hadoop

KMS Load Balancing Provider Fails to invalidate Cache on Key Delete

For more information on this issue, see the corresponding Knowledge article: TSB 2020-434: KMS Load Balancing Provider Fails to invalidate Cache on Key Delete

TSB-465

N/A

HBase

Corruption of HBase data stored with MOB feature

For more information on this issue, see the corresponding Knowledge article: TSB 2021-465: Corruption of HBase data stored with MOB feature on upgrade from CDH 5 and HDP 2

TSB-497

N/A

Solr

CVE-2021-27905: Apache Solr SSRF vulnerability with the Replication handler

The Apache Solr ReplicationHandler (normally registered at "/replication" under a Solr core) has a "masterUrl" (also "leaderUrl" alias) parameter. The “masterUrl” parameter is used to designate another ReplicationHandler on another Solr core to replicate index data into the local core. To help prevent the CVE-2021-27905 SSRF vulnerability, Solr should check these parameters against a similar configuration used for the "shards" parameter.

For more information on this issue, see the corresponding Knowledge article: TSB 2021-497: CVE-2021-27905: Apache Solr SSRF vulnerability with the Replication handler

TSB-512

N/A

HBase

HBase MOB data loss

HBase tables with the MOB feature enabled may encounter problems which result in data loss.

For more information on this issue, see the corresponding Knowledge article: TSB 2021-512: HBase MOB data loss