Hortonworks Data Platform

Known Issues

Summary of known issues for this release.

Hortonworks Bug ID Apache JIRA Apache component Summary
BUG-79238 N/A Documentation, HBase, HDFS, Hive, MapReduce, Zookeeper

Description of the problem or behavior

SSL is deprecated and its use in production is not recommended. Use TLS.

Workaround

In Ambari: Use ssl.enabled.protocols=TLSv1|TLSv1.1|TLSv1.2 and security.server.disabled.protocols=SSL|SSLv2|SSLv3. For help configuring TLS for other components, contact customer support. Documentation will be provided in a future release.

BUG-106494 N/A Documentation, Hive

Description of Problem

When you partition a Hive table on a column of type double and the column value is 0.0, the actual partition directory is created as "0". An IndexOutOfBoundsException then occurs.

Associated error message

2018-06-28T22:43:55,498 ERROR 441773a0-851c-4b25-9e47-729183946a26 main exec.StatsTask: Failed to run stats task
org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.IndexOutOfBoundsException: Index: 8, Size: 8
    at org.apache.hadoop.hive.ql.metadata.Hive.setPartitionColumnStatistics(Hive.java:4395) ~hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT
    at org.apache.hadoop.hive.ql.stats.ColStatsProcessor.persistColumnStats(ColStatsProcessor.java:179) ~hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT
    at org.apache.hadoop.hive.ql.stats.ColStatsProcessor.process(ColStatsProcessor.java:83) ~hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT
    at org.apache.hadoop.hive.ql.exec.StatsTask.execute(StatsTask.java:108) hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:205) hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:97) hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2689) hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:2341) hive-exec-4.0.0-SNAPSHOT.jar:4.0.0-SNAPSHOT
    at org.apache.hadoop.hive.ql.Driver.run

Workaround

Do not partition tables on columns of type double.

BUG-106379 N/A Documentation, Hive

Description of the Problem

The upgrade process fails to perform necessary compaction of ACID tables and can cause permanent data loss.

Workaround

If you have ACID tables in your Hive metastore, enable ACID operations in Ambari or set Hive configuration properties to enable ACID. If ACID operations are disabled, the upgrade process does not convert ACID tables. This causes permanent loss of data; you cannot recover data in your ACID tables later.

BUG-106286 N/A Documentation, Hive

Description of the Problem

The upgrade process might fail to make a backup of the Hive metastore, which is critically important.

Workaround

Make a manual backup of your Hive metastore database before upgrading. A backup is especially important if you did not use Ambari to install Hive and create the metastore database, but it is highly recommended in all cases. Ambari might not have the necessary permissions to perform the backup automatically. The upgrade can succeed even if the backup fails, so having your own backup is critically important.

BUG-101082 N/A Documentation, Hive

Description of the problem or behavior

When running Beeline in batch mode, queries killed by the Workload Management process can on rare occasions mistakenly return success on the command line.

Workaround

There is currently no workaround.

BUG-103495 HBASE-20634, HBASE-20680, HBASE-20700 HBase

Description of the problem or behavior

Because region assignment was refactored in HBase, there are issues, not yet well understood, that may affect the stability of the RegionServer Groups feature. If you rely on RegionServer Groups, we recommend that you wait for a future HDP 3.x release, which will restore the stability this feature had in HBase 1.x/HDP 2.x releases.

Workaround

There is currently no workaround.

BUG-98727 N/A HBase

Description of the problem or behavior

Because region assignment was refactored in HBase, there are issues, not yet well understood, that may affect the stability of the Region replication feature. If you rely on Region replication, we recommend that you wait for a future HDP 3.x release, which will restore the stability this feature had in HBase 1.x/HDP 2.x releases.

Workaround

There is currently no workaround.

BUG-105983 N/A HBase

Description of the problem or behavior

An HBase service (Master or RegionServer) stops participating with the rest of the HBase cluster.

Associated error message

The service's log contains stack traces that contain "Kerberos principal name does NOT have the expected hostname part..."

Workaround

Retrying the connection solves the problem.

BUG-94954 HBASE-20552 HBase

Description of the problem or behavior

After a rolling restart of HBase, the HBase Master may not correctly assign all regions to the cluster.

Associated error message

There are regions in transition, including hbase:meta, which result in "Region is not online on RegionServer" messages on the Master or RegionServer, or in messages about errors in ServerCrashProcedure on the Master.

Workaround

Restart the HBase Master.

BUG-96402 HIVE-18687 Hive

Description of the problem or behavior

When HiveServer2 is running in HA (high-availability) mode in HDP 3.0.0, resource plans are loaded in-memory by all HiveServer2 instances. If a client makes changes to a resource plan, the changes are reflected (pushed) only in the HiveServer2 to which the client is connected.

Workaround

For the resource plan changes to be reflected on all HiveServer2 instances, restart every HiveServer2 instance so that each one reloads the resource plan from the metastore.

BUG-88614 N/A Hive

Description of the problem or behavior

The RDBMS schema for the Hive metastore contains an index, HL_TXNID_INDEX, defined as:

CREATE INDEX HL_TXNID_INDEX ON HIVE_LOCKS USING hash (HL_TXNID);

Hash indexes are not recommended by PostgreSQL. For more information, see https://www.postgresql.org/docs/9.4/static/indexes-types.html

Workaround

Change this index to type BTREE.
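
As a rough sketch of this change, the helper below generates statements that drop the hash index and recreate it as a B-tree index. The helper name btree_migration_sql is hypothetical, and the statements assume a PostgreSQL-backed metastore whose index, table, and column names match the definition above. Review the output and back up the metastore database before running anything like it, for example through psql.

```python
def btree_migration_sql(index="HL_TXNID_INDEX",
                        table="HIVE_LOCKS",
                        column="HL_TXNID"):
    """Return PostgreSQL statements that rebuild a hash index as a btree index.

    The default names come from the index definition quoted in the issue;
    everything else here is an illustrative assumption.
    """
    return [
        "BEGIN;",
        # Remove the existing hash index (no-op if it is already gone).
        f"DROP INDEX IF EXISTS {index};",
        # Recreate the index with the btree access method.
        f"CREATE INDEX {index} ON {table} USING btree ({column});",
        "COMMIT;",
    ]


if __name__ == "__main__":
    # Print the statements so they can be reviewed before being executed.
    print("\n".join(btree_migration_sql()))
```

Wrapping both statements in one transaction means the table is not left without the index if the second statement fails.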

BUG-60904 KNOX-823 Knox

Description of the problem or behavior

When Ambari is proxied by Apache Knox, the QuickLinks are not rewritten to go back through the gateway. If all access to Ambari in the deployment is through Knox, the new Ambari QuickLink profile can be used to hide URLs or change them to go through Knox permanently. A future release will make QuickLinks reflect the gateway appropriately.

Workaround

There is currently no workaround.

BUG-107399 N/A Knox

Description of the problem or behavior

After an upgrade from previous HDP versions, certain topology deployments may return a 503 error. This includes, but may not be limited to, knoxsso.xml for the KnoxSSO-enabled services.

Workaround

When this occurs, make a minor change (even a whitespace change) through Ambari to the knoxsso topology, or to any other topology with this issue, and then restart the Knox gateway server. This should eliminate the issue.

BUG-110463 KNOX-1434 Knox

Description of the problem or behavior

Visiting the Knox Admin UI in any browser (for example, Firefox or Chrome) sets the HTTP Strict Transport Security (HSTS) header for the host where Knox is running. Any subsequent request over HTTP to another service on the same host (for example, Grafana or Ranger) is redirected to HTTPS because of this header.

Note that this HSTS header is disabled by default in all Knox topologies.

For more information, see https://knox.apache.org/books/knox-1-1-0/user-guide.html#HTTP+Strict+Transport+Security

Impact

All non-SSL requests to other services are redirected automatically to HTTPS and result in SSL errors such as SSL_ERROR_RX_RECORD_TOO_LONG.

Workaround

Use the manager.xml topology and remove the setting from the WebAppSec provider. You can do this using the Knox Admin UI. After you have removed the setting, close your browser or clear the cookies.

BUG-106266 OOZIE-2769, OOZIE-3085, OOZIE-3156, OOZIE-3183 Oozie

Description of the problem or behavior

When the check() method of SshActionExecutor is invoked, Oozie executes the command "ssh <host-ip> ps -p <pid>" to determine whether the SSH action has completed. However, if the connection to the host fails during this status check, the command returns an error code, but the action status is still determined to be OK, which may not be correct.

Associated error message

SSH command exits with the exit status of the remote command or with 255 if an error occurred.

Workaround

Retrying the connection solves the problem.
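
The ambiguity in this status check can be illustrated with a small sketch. It is not Oozie's implementation: the names classify_check and check_remote_pid are hypothetical, and the sketch assumes that "ps -p <pid>" exits with 0 when the process exists and with a non-zero status otherwise, while ssh reserves 255 for its own failures.

```python
import subprocess

SSH_ERROR = 255  # ssh exits with 255 when ssh itself fails (for example, connection refused)


def classify_check(returncode):
    """Map the exit status of `ssh <host> ps -p <pid>` to a coarse state."""
    if returncode == 0:
        return "RUNNING"   # ps found the process, so the action is still running
    if returncode == SSH_ERROR:
        return "UNKNOWN"   # the check itself failed; retry instead of assuming completion
    return "FINISHED"      # assumed: ps reported that no such process exists


def check_remote_pid(host, pid, timeout=30):
    """Run the status check once and classify its result (hypothetical helper)."""
    result = subprocess.run(
        ["ssh", host, "ps", "-p", str(pid)],
        capture_output=True,
        timeout=timeout,
    )
    return classify_check(result.returncode)
```

Treating 255 as "unknown" rather than as completion is what makes retrying the connection a meaningful workaround: the caller repeats the check until it gets a status that came from the remote ps command.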

BUG-95909 RANGER-1960 Ranger

Description of problem or behavior

The delete snapshot operation fails even if the user has Administrator privileges, because the namespace is not considered in the authorization flow of the HBase Ranger plugin.

Associated error message

ERROR: org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions for user '<username>' (action=admin)

Workaround

For the delete snapshot operation to succeed, you need system-wide Administrator privileges.

BUG-89714 N/A Ranger

Description of the problem or behavior

Sudden increase in Login Session audit events from Ranger Usersync and Ranger Tagsync.

Associated error message

If the policy storage DB size increases suddenly, periodically back up and purge the 'x_auth_sess' table.

Workaround

Take a backup of the policy DB store and purge the 'x_auth_sess' table from the Ranger DB schema.

BUG-101227 N/A Spark

Description of the problem or behavior

When Spark Thriftserver has to run several queries concurrently, some of them can fail with a timeout exception when performing broadcast join.

Associated error message

Caused by: java.util.concurrent.TimeoutException: Futures timed out after [300 seconds]
    at scala.concurrent.impl.Promise$DefaultPromise.ready(Promise.scala:219)
    at scala.concurrent.impl.Promise$DefaultPromise.result(Promise.scala:223)
    at scala.concurrent.Await$$anonfun$result$1.apply(package.scala:107)
    at scala.concurrent.BlockContext$DefaultBlockContext$.blockOn(BlockContext.scala:53)
    at scala.concurrent.Await$.result(package.scala:107)
    at org.apache.spark.sql.execution.joins.BroadcastHashJoin.doExecute(BroadcastHashJoin.scala:107)

Workaround

You can resolve this issue by increasing the spark.sql.broadcastTimeout value.

BUG-109979 N/A Spark

Description of the problem or behavior

YARN NodeManagers fail to start after a Spark patch upgrade because of a YarnShuffleService class not found (CNF) error.

Workaround

To resolve this problem you must:

Replace "{{spark2_version}}" with "${hdp.version}" in "yarn.nodemanager.aux-services.spark2_shuffle.classpath" property value. For example, old value "{{stack_root}}/{{spark2_version}}/spark2/aux/*" -> new value "{{stack_root}}/${hdp.version}/spark2/aux/*"
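
If the property value is maintained by a script rather than edited by hand, the substitution can be sketched as a plain string replacement. The function name fix_shuffle_classpath is hypothetical; only the two placeholder strings and the example value come from the workaround itself.

```python
OLD_TOKEN = "{{spark2_version}}"   # placeholder that causes the failure
NEW_TOKEN = "${hdp.version}"       # replacement given in the workaround


def fix_shuffle_classpath(value):
    """Return the classpath value with the Spark version placeholder replaced."""
    return value.replace(OLD_TOKEN, NEW_TOKEN)


if __name__ == "__main__":
    old_value = "{{stack_root}}/{{spark2_version}}/spark2/aux/*"
    print(fix_shuffle_classpath(old_value))
```

Other placeholders, such as {{stack_root}}, are left untouched.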

BUG-65977 SPARK-14922 Spark

Description of the problem or behavior

Since Spark 2.0.0, the SQL grammar does not support `DROP PARTITION BY RANGE`. Only '=' is supported in the partition specification; '<', '>', '<=', and '>=' are not.

Associated error message

scala> sql("alter table t drop partition (b<1) ").show
org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input '<' expecting {')', ','}(line 1, pos 31)

== SQL ==
alter table t drop partition (b<1)
-------------------------------^^^

Workaround

To drop a partition, use an exact match with '='.

scala> sql("alter table t drop partition (b=0) ").show

BUG-106917 N/A Sqoop

Description of the problem or behavior

In HDP 3, managed Hive tables must be transactional (hive.strict.managed.tables=true). Transactional tables with Parquet format are not supported by Hive. Hive imports with --as-parquetfile must use external tables by specifying --external-table-dir.

Associated error message

Table db.table failed strict managed table checks due to the
following reason: Table is marked as a managed table but is not
transactional. 

Workaround

When using --hive-import with --as-parquetfile, you must also provide --external-table-dir with a fully qualified location of the table:

sqoop import ... --hive-import \
                 --as-parquetfile \
                 --external-table-dir hdfs:///path/to/table

BUG-102672 N/A Sqoop

Description of the problem or behavior

In HDP 3, managed Hive tables must be transactional (hive.strict.managed.tables=true). Writing to transactional tables with HCatalog is not supported by Hive. This leads to errors during HCatalog Sqoop imports if the specified Hive table does not exist or is not external.

Associated error message

Store into a transactional table db.table from Pig/Mapreduce is not supported

Workaround

Before running the HCatalog import with Sqoop, you must create the external table in Hive. The --create-hcatalog-table option does not support creating external tables.

BUG-109607 N/A YARN

Description of the problem or behavior

When wire encryption is enabled for containerized Spark on YARN with Docker, Spark submit fails in "cluster" deployment mode. Spark submit in "client" deployment mode works successfully.

Associated error message

Store into a transactional table db.table from Pig/Mapreduce is not supported.

Workaround

There is currently no workaround.

BUG-110192 N/A YARN

Description of the problem or behavior

When YARN is installed and configured with Knox SSO alone, the Application Timeline Server web endpoint blocks remote REST calls from the YARN UI and displays a 401 Unauthorized error.

Associated error message

401 Unauthorized error.

Workaround

The administrator must configure the Knox authentication handler for the Timeline Server, along with the existing Hadoop-level configuration.

The administrator must set the following cluster-specific properties. The values for the last two properties are in the hadoop.authentication.* properties file.

<property>
<name>yarn.timeline-service.http-authentication.type</name>
<value>org.apache.hadoop.security.authentication.server.JWTRedirectAuthenticationHandler</value>
</property>

<property>
<name>yarn.timeline-service.http-authentication.authentication.provider.url</name>
<value>https://ctr-e138-1518143905142-455650-01-000002.hwx.site:444/gateway/knoxsso/api/v1/websso</value>
</property>

<property>
<name>yarn.timeline-service.http-authentication.public.key.pem</name>
<value>public.key.pem</value>
</property>

RMP-11408 ZEPPELIN-2170 Zeppelin

Description of the problem or behavior

Zeppelin does not show all WARN messages thrown by spark-shell at the Zeppelin notebook level.

Workaround

There is currently no workaround for this.