Using Apache HiveQL
Also available as:
PDF

Apache Hive 3 tables

Using Hive, you can create managed tables or external tables.

In Hive 3, Hive has full control over managed tables. Only through Hive can you access and change the data in managed tables. Managed tables are transactional tables having ACID (atomicity, consistency, isolation, and durability) properties. Because Hive has full control of managed tables, Hive optimizes these tables extensively. If you need to bypass Hive to access data directly on the file system, you use external tables or a storage handler, such as Druid or HBase. The following matrix lists the types of tables you can create using Hive, required storage format, and key operations.

Table Type File Format ACID Insert Update/Delete
CRUD transactional ORC Yes Yes Yes
Insert-only transactional Any Yes Yes No
External Any No Yes Yes
The following table lists differences between managed (transactional) and external tables:
Table type Security Spark access Optimizations
Managed (transactional) Ranger authorization only, no SBA Yes, using Hive Warehouse Connector Statistics and others
External Ranger or SBA, which requires an ACL in HDFS Yes, direct file access Limited

Managed tables reside in the managed tablespace, which only Hive can access. By default, Hive assumes external tables reside in the external tablespace. The warehouse tablespaces are shown in the Files view in Ambari:

To determine the managed or external table type, run the DESCRIBE EXTENDED table_name.

Transactional tables

Transactional (ACID) tables reside in the Hive warehouse. To achieve ACID compliance, Hive has to manage the table, including access to the table data. The data in CRUD (create, retrieve, update, and delete) tables must be in Optimized Row Column (ORC) file format. Insert-only tables support all file formats. Hive is designed to support a relatively low rate of transactions, as opposed to serving as an online analytical processing (OLAP) system. You can use the SHOW TRANSACTIONS command to list open and aborted transactions.

Transactional tables in Hive 3 are on a par with non-ACID tables. No bucketing or sorting is required in Hive 3 transactional tables. These tables are compatible with native cloud storage.

External tables

External table data is not owned or controlled by Hive. You typically use an external table when you want to access data directly at the file level, using a tool other than Hive. Hive 3 does not support the following capabilities for external tables:

  • Query cache
  • Materialized views, except in a limited way
  • Default statistics gathering
  • Compute queries using statistics
  • Automatic runtime filtering
  • File merging after insert

Hive limitations and prerequisites

Hive is not designed to replace systems such as MySQL or HBase. If upgrading from an earlier version to Hive 3, you must run a major compaction to use your transactional tables.