6.7. Known Issues for Hive

  • BUG-16890: Hive SQL standard authorization calls that access LOCAL or HDFS URIs fail on a Kerberos-secured cluster with the binary HiveServer2 transport.

    Problem: This blocks all CREATE TABLE statements that reference a LOCAL or HDFS URI. For example:

    >>> create external table studenttab10k( 
    name string, 
    age int, 
    gpa double) 
    row format delimited 
    fields terminated by '\t' 
    stored as textfile 
    location '/user/hcat/tests/data/studenttab10k'; 
    2014-04-17 00:12:13,627 DEBUG [main] transport.TSaslTransport: writing data length: 297 
    2014-04-17 00:12:13,657 DEBUG [main] transport.TSaslTransport: CLIENT: reading data length: 351 
    Error: Error while compiling statement: FAILED: HiveAccessControlException Permission denied. 
    Principal [name=hrt_qa@HORTON.YGRIDCORE.NET, type=USER] does not have following privileges on Object 
    [type=DFS_URI, name=/user/hcat/tests/data/studenttab10k] : [INSERT, DELETE, OBJECT OWNERSHIP] (state=42000,code=40000) 
  • BUG-16660: On a Tez setup, Hive jobs submitted through WebHCat run in the default MR mode even though Hive is configured for Tez.

    Problem: Hive jobs submitted through WebHCat always run in MR mode, even on clusters where the same queries would run in Tez mode when issued directly through Hive. This affects Linux installs only. WebHCat runs Hive queries from hive.tar.gz on HDFS and passes explicit Hive configuration; these are the properties used in webhcat-site.xml:

    templeton.hive.archive: hdfs:///apps/webhcat/hive.tar.gz
    templeton.hive.path: hive.tar.gz/hive/bin/hive
    templeton.hive.home: hive.tar.gz/hive
    templeton.hive.properties: hive.metastore.local=false, hive.metastore.uris=thrift://hivehost:9083, hive.metastore.sasl.enabled=false, hive.metastore.execute.setugi=true

    When the Hive command runs, it builds its HiveConf from templeton.hive.properties. To enable Tez, at least "hive.execution.engine=tez" must be added to templeton.hive.properties. On Windows this is not a problem because the local Hive installation is used.

    Workaround: To run with Tez, add "hive.execution.engine=tez" to templeton.hive.properties, as shown in the example below. The installer would need to change to accommodate this.
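
    For example, extending the templeton.hive.properties value shown above (hivehost is a placeholder carried over from the original configuration):

    templeton.hive.properties: hive.metastore.local=false, hive.metastore.uris=thrift://hivehost:9083, hive.metastore.sasl.enabled=false, hive.metastore.execute.setugi=true, hive.execution.engine=tez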

  • BUG-16608: Oozie table import job fails with a permission error when user hive attempts to write to a table directory owned by the table owner.

    Problem: The job fails with the following permission error:

    Copying data from hdfs://arpit-falcon-2.cs1cloud.internal:8020/projects/ivory/staging/FALCON_FEED_REPLICATION_raaw-logs16-a6acf050-a038-48d5-9867-de63707291a8_corp-cdd34e35-86b6-45ae-a6cf-d6e879b7b7fb/default/HCatReplication_oneSourceOneTarget_hyphen/dt=2010-01-01-20/2010-01-01-20-00/data/dt=2010-01-01-20 
    Copying file: hdfs://arpit-falcon-2.cs1cloud.internal:8020/projects/ivory/staging/FALCON_FEED_REPLICATION_raaw-logs16-a6acf050-a038-48d5-9867-de63707291a8_corp-cdd34e35-86b6-45ae-a6cf-d6e879b7b7fb/default/HCatReplication_oneSourceOneTarget_hyphen/dt=2010-01-01-20/2010-01-01-20-00/data/dt=2010-01-01-20/data.txt 
    FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: org.apache.hadoop.security.AccessControlException Permission denied: user=hive, access=WRITE, inode="/tmp/falcon-regression/HCatReplication/HCatReplication_oneSourceOneTarget_hyphen":arpit:hdfs:drwxr-xr-x 
    at org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkFsPermission(FSPermissionChecker.java:265)
  • BUG-16476: Oozie-Hive tests run as the hadoopqa user create and access the /tmp/hive-hadoop folder.

    Problem: Oozie-Hive tests were run as the "hadoopqa" user, concurrently with HCatalog tests. When the tests failed, the HDFS permissions were as shown below. It is unclear why the /tmp/hive-hadoop folder was ever created.

    D:\hdp\hadoop-2.4.0.2.1.1.0-1533\bin>hadoop.cmd dfs -ls /tmp 
    drwxr-xr-x - hadoop hdfs 0 2014-04-09 19:01 /tmp/hive-hadoop 
    drwxr-xr-x - hadoopqa hdfs 0 2014-04-09 18:50 /tmp/hive-hadoopqa                    
  • BUG-16864: When Hive standard authorization is enabled, the owner of the table backing the index is missing.

    Problem: The query fails with the following error:

    2014-04-16 16:50:13,312 ERROR [pool-7-thread-5]: ql.Driver (SessionState.java:printError(546)) - FAILED: HiveAccessControlException Permission denied. Principal [name=hrt_qa, type=USER] does not have following privileges on Object [type=TABLE_OR_VIEW, name=default.default__missing_ddl_3_missing_ddl_3_index__] : [OBJECT OWNERSHIP] 
    org.apache.hadoop.hive.ql.security.authorization.plugin.HiveAccessControlException: Permission denied. Principal [name=hrt_qa, type=USER] does not have following privileges on Object [type=TABLE_OR_VIEW, name=default.default__missing_ddl_3_missing_ddl_3_index__] : [OBJECT OWNERSHIP] 
    at org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLAuthorizationUtils.assertNoMissingPrivilege(SQLAuthorizationUtils.java:361) 
    at org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizationValidator.checkPrivileges(SQLStdHiveAuthorizationValidator.java:105) 
    at org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizationValidator.checkPrivileges(SQLStdHiveAuthorizationValidator.java:77) 
    at org.apache.hadoop.hive.ql.security.authorization.plugin.HiveAuthorizerImpl.checkPrivileges(HiveAuthorizerImpl.java:84) 
    at org.apache.hadoop.hive.ql.Driver.doAuthorizationV2(Driver.java:695) 
    at org.apache.hadoop.hive.ql.Driver.doAuthorization(Driver.java:510) 
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:462) 
    at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:322) 
    at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:976) 
    at org.apache.hadoop.hive.ql.Driver.compileAndRespond(Driver.java:969) 
    at org.apache.hive.service.cli.operation.SQLOperation.prepare(SQLOperation.java:99) 
    at org.apache.hive.service.cli.operation.SQLOperation.run(SQLOperation.java:172) 
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementInternal(HiveSessionImpl.java:231) 
    at org.apache.hive.service.cli.session.HiveSessionImpl.executeStatementAsync(HiveSessionImpl.java:218) 
    at org.apache.hive.service.cli.CLIService.executeStatementAsync(CLIService.java:233) 
    at org.apache.hive.service.cli.thrift.ThriftCLIService.ExecuteStatement(ThriftCLIService.java:346) 
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1313) 
    at org.apache.hive.service.cli.thrift.TCLIService$Processor$ExecuteStatement.getResult(TCLIService.java:1298) 
    at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39) 
    at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39) 
    at org.apache.hive.service.auth.TSetIpAddressProcessor.process(TSetIpAddressProcessor.java:55) 
    at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) 
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) 
    at java.lang.Thread.run(Thread.java:662) 
    ...
  • BUG-16802: Hive on Tez query passes, but the application is in the killed state.

    Problem: Although the query succeeds, the application ends up in the KILLED state. The Hive session should shut down cleanly and not kill the application.

  • BUG-16771: (Apache Bug: HIVE-6867) Hive table has multiple copies of streaming data when testing the Hive Server restart scenario.

    Problem: When running the Hive restart test, in which the Hive metastore is bounced while Flume is streaming data to Hive, three duplicate copies were observed for each row in the Hive table. (Expected: 200 rows; observed: 800 rows, that is, the expected set of 200 plus three duplicate copies of it.)

  • BUG-16667: Alter index rebuild fails with FS-based stats gathering.

    Problem: On a Tez run, create_index is forced to run in MR mode, but it fails intermittently. (This problem is not seen on non-Tez runs.) A sketch of the failing operation follows.
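
    A minimal sketch of the failing operation, assuming a hypothetical table t1 with an index t1_idx and filesystem-based stats gathering enabled:

    SET hive.stats.dbclass=fs;  -- filesystem-based statistics gathering
    -- t1 and t1_idx are hypothetical names for illustration
    CREATE INDEX t1_idx ON TABLE t1 (key) AS 'COMPACT' WITH DEFERRED REBUILD;
    ALTER INDEX t1_idx ON t1 REBUILD;  -- forced into MR mode on Tez; fails intermittently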

  • BUG-16393: Bucketized Table feature fails in some cases.

    Problem: The Bucketized Table feature fails in some cases: if the source and destination are bucketed on the same key, and the actual data in the source is not bucketed (because it was loaded using LOAD DATA LOCAL INPATH), then the data is not bucketed when written to the destination. Example follows:

    CREATE TABLE P1(key STRING, val STRING)
    CLUSTERED BY (key) SORTED BY (key) INTO 2 BUCKETS STORED AS TEXTFILE;

    LOAD DATA LOCAL INPATH '/Users/jpullokkaran/apache-hive1/data/files/P1.txt' INTO TABLE P1;

    -- perform an insert to make sure there are 2 files
    INSERT OVERWRITE TABLE P1 SELECT key, val FROM P1;
    Workaround: Avoid using LOAD DATA to populate bucketed tables. A sketch of the usual alternative follows.
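
    A minimal sketch of the alternative, assuming a hypothetical unbucketed staging table P1_staging and a placeholder file path: load the raw file into the staging table, then INSERT ... SELECT into the bucketed table so Hive writes properly bucketed files.

    -- hypothetical staging table; not bucketed, so LOAD DATA is safe here
    CREATE TABLE P1_staging(key STRING, val STRING) STORED AS TEXTFILE;
    LOAD DATA LOCAL INPATH '/path/to/P1.txt' INTO TABLE P1_staging;

    -- have Hive enforce bucketing while writing to the bucketed table
    SET hive.enforce.bucketing=true;
    INSERT OVERWRITE TABLE P1 SELECT key, val FROM P1_staging;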

  • BUG-16391: Streaming transactions fail on MSSQL.

    Problem: After creating the metastore tables with the MSSQL composite script provided by BUG-15827, Flume Hive Sink tests failed because no data made it into the Hive tables. HCatalog's streaming ingest feature does not work with Microsoft SQL Server as the metastore database.

    Workaround: When using the streaming transactions feature, use Derby for the metastore.

  • BUG-15733: Schema evolution is broken on Tez.

    Problem: The following error is returned on the Hive console:

    Vertex failed, vertexName=Map 1, vertexId=vertex_1395920136483_7733_1_00, diagnostics=[Task failed, taskId=task_1395920136483_7733_1_00_000000, diagnostics=[AttemptID:attempt_1395920136483_7733_1_00_000000_0 Info:Error: java.io.IOException: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable 
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121) 
    at org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77) 
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:344) 
    at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:79) 
    at org.apache.hadoop.hive.ql.io.HiveRecordReader.doNext(HiveRecordReader.java:33) 
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:122) 
    at org.apache.hadoop.mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(TezGroupedSplitsInputFormat.java:122) 
    at org.apache.tez.mapreduce.input.MRInput$MRInputKVReader.next(MRInput.java:510) 
    at org.apache.hadoop.hive.ql.exec.tez.MapRecordProcessor.run(MapRecordProcessor.java:158) 
    at org.apache.hadoop.hive.ql.exec.tez.TezProcessor.run(TezProcessor.java:160) 
    at org.apache.tez.runtime.LogicalIOProcessorRuntimeTask.run(LogicalIOProcessorRuntimeTask.java:306) 
    at org.apache.hadoop.mapred.YarnTezDagChild$4.run(YarnTezDagChild.java:549) 
    at java.security.AccessController.doPrivileged(Native Method) 
    at javax.security.auth.Subject.doAs(Subject.java:396) 
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1548) 
    at org.apache.hadoop.mapred.YarnTezDagChild.main(YarnTezDagChild.java:538) 
    Caused by: java.lang.ClassCastException: org.apache.hadoop.io.Text cannot be cast to org.apache.hadoop.hive.serde2.columnar.BytesRefArrayWritable 
    at org.apache.hadoop.hive.ql.io.RCFileRecordReader.next(RCFileRecordReader.java:44) 
    at org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:339) 
    ... 13 more 
    
  • BUG-13796: When running with correlation optimization enabled on Tez, TPC-DS queries 1, 32, 94, 95, and 97 fail with a ClassCastException.

  • BUG-8227: Hive needs to implement recovery or extend FileOutputCommitter.

    Problem: When Hive jobs are running and the ResourceManager is restarted, the jobs start again from scratch, causing them to fail after the maximum number of retries. OutputCommitter disables recovery by default (see below); Hive needs to implement recovery or move to extending FileOutputCommitter.

    // OutputCommitter's default: recovery is not supported
    public boolean isRecoverySupported() {
        return false;
    }
  • BUG-14965: HCatalog log is located at %HIVE_HOME%/log/hcat.log.

    Problem: The HCatalog log file is in the wrong location. HCatalog logs are not written to %HDP_LOG_DIR%.

    Workaround: View the log at %HIVE_HOME%/log/hcat.log.
