Apache Hive 3 tables
Using Hive, you can create managed tables or external tables.
In Hive 3, Hive has full control over managed tables. Only through Hive can you access and change the data in managed tables. Managed tables are transactional tables having ACID (atomicity, consistency, isolation, and durability) properties. Because Hive has full control of managed tables, Hive optimizes these tables extensively. If you need to bypass Hive to access data directly on the file system, you use external tables or a storage handler, such as Druid or HBase. The following matrix lists the types of tables you can create using Hive, required storage format, and key operations.
|Table Type||File Format||ACID||Insert||Update/Delete|
|Table type||Security||Spark access||Optimizations|
|Managed (transactional)||Ranger authorization only, no SBA||Yes, using Hive Warehouse Connector||Statistics and others|
|External||Ranger or SBA, which requires an ACL in HDFS||Yes, direct file access||Limited|
Managed tables reside in the managed tablespace, which only Hive can access. By default, Hive assumes external tables reside in the external tablespace. The warehouse tablespaces are shown in the Files view in Ambari:
To determine the managed or external table type, run the DESCRIBE EXTENDED table_name.
Transactional (ACID) tables reside in the Hive warehouse. To achieve ACID compliance, Hive has to manage the table, including access to the table data. The data in CRUD (create, retrieve, update, and delete) tables must be in Optimized Row Column (ORC) file format. Insert-only tables support all file formats. Hive is designed to support a relatively low rate of transactions, as opposed to serving as an online analytical processing (OLAP) system. You can use the SHOW TRANSACTIONS command to list open and aborted transactions.
Transactional tables in Hive 3 are on a par with non-ACID tables. No bucketing or sorting is required in Hive 3 transactional tables. These tables are compatible with native cloud storage.
External table data is not owned or controlled by Hive. You typically use an external table when you want to access data directly at the file level, using a tool other than Hive. Hive 3 does not support the following capabilities for external tables:
- Query cache
- Materialized views, except in a limited way
- Default statistics gathering
- Compute queries using statistics
- Automatic runtime filtering
- File merging after insert
Hive limitations and prerequisites
Hive is not designed to replace systems such as MySQL or HBase. If upgrading from an earlier version to Hive 3, you must run a major compaction to use your transactional tables.