Using Apache HiveQL
Also available as:

Apache Hive 3 tables

Using Hive, you can create managed tables or external tables.

In Hive 3, Hive has full control over managed tables. Only through Hive can you access and change the data in managed tables. Managed tables, except temporary tables, are transactional tables having ACID (atomicity, consistency, isolation, and durability) properties. Because Hive has full control of managed tables, Hive can optimize these tables extensively. If you need to bypass Hive to access data directly on the file system, you use external tables or a storage handler, such as Druid or HBase.

The following matrix lists the types of tables you can create using Hive, whether or not ACID properties are supported, required storage format, and key SQL operations.

Managed: CRUD transactional Yes ORC Yes Yes
Managed: Insert-only transactional Yes Any Yes No
Managed: Temporary No Any Yes No
External No Any Yes No

Although you cannot use the SQL UPDATE or DELETE statements to delete data in some types of tables, you can use DROP PARTITION on any table type to delete the data.

The managed table storage type is Optimized Row Column (ORC) by default. If you accept the default by not specifying any storage during table creation, or if you specify ORC storage, the result is an ACID table with insert, update, and delete (CRUD) capabilities. If you specify any other storage type, such as text, CSV, AVRO, or JSON, the result is an insert-only ACID table. You cannot update or delete columns in the table.

The following table and subsequent sections cover other differences between managed (transactional) and external tables:
Table type Security Spark access Optimizations
Managed (transactional) Ranger authorization only, no SBA Yes, using Hive Warehouse Connector Statistics and others
External Ranger or SBA, which requires an ACL in HDFS Yes, direct file access Limited

Transactional tables

Transactional (ACID) tables reside in the Hive warehouse. To achieve ACID compliance, Hive has to manage the table, including access to the table data. The data in CRUD (create, retrieve, update, and delete) tables must be in ORC file format. Insert-only tables support all file formats. Hive is designed to support a relatively low rate of transactions, as opposed to serving as an online analytical processing (OLAP) system. You can use the SHOW TRANSACTIONS command to list open and aborted transactions.

Transactional tables in Hive 3 are on a par with non-ACID tables. No bucketing or sorting is required in Hive 3 transactional tables. These tables are compatible with native cloud storage.

Hive supports one statement per transaction, which can include any number of rows, partitions, or tables.

External tables

External table data is not owned or controlled by Hive. You typically use an external table when you want to access data directly at the file level, using a tool other than Hive. Hive 3 does not support the following capabilities for external tables:

  • Query cache
  • Materialized views, except in a limited way
  • Default statistics gathering
  • Compute queries using statistics
  • Automatic runtime filtering
  • File merging after insert

When you run DROP TABLE on an external table, by default Hive drops only the metadata (schema). If you want the DROP TABLE command to also remove the actual data in the external table, as DROP TABLE does on a managed table, you need to set the external.table.purge property to true as described later.

Location of tables in HDP 3.x

Managed tables reside in the managed tablespace, which only Hive can access. By default, Hive assumes external tables reside in the external tablespace. The warehouse tablespaces are shown in the Files view in Ambari:

To determine the managed or external table type, you can run the DESCRIBE EXTENDED table_name command.

Hive limitations and prerequisites

Hive is not designed to replace systems such as MySQL or HBase. If upgrading from an earlier version to Hive 3, you must run a major compaction to use your transactional tables.