DLM Administration

Hive tables - Managed and External

Managed tables are Hive-owned tables where the entire lifecycle of the table data is managed and controlled by Hive. External tables are tables where Hive is only loosely coupled with the data.

All write operations to managed tables are performed using Hive SQL commands. If a managed table or partition is dropped, both the data and the metadata associated with that table or partition are deleted. Transactional semantics (ACID) are supported only on managed tables.

Writes to external tables can be performed using Hive SQL commands, but the data files can also be accessed and managed by processes outside of Hive. If an external table or partition is dropped, only the metadata associated with the table or partition is deleted; the underlying data files stay intact. A typical use of an external table is to run analytical queries through Hive on data owned by HBase or Druid, where HBase or Druid writes the data files and Hive reads them for analytics.

Hive supports replication of external tables, along with their data, to the target cluster, and the replicated tables retain all the properties of external tables.

The permissions and ownership of the data files are preserved so that the relevant external processes can continue to write to them even after failover.

To handle conflicts in external table data locations when multiple source clusters replicate to the same target cluster, DLM assigns a unique base directory to each source cluster. External table data from a source cluster is copied under that cluster's base directory. For example, if the external table location on a source cluster is /ext/hbase_data, then after replication the location on the target cluster is <base_dir>/ext/hbase_data. Users can track the new location of external tables using the DESCRIBE TABLE command.
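The location mapping in the example above can be illustrated with a short Python sketch. It only mirrors the <base_dir>/ext/hbase_data pattern described here and is not how DLM itself computes locations. The function name target_location and the base directories /replica/cluster_a and /replica/cluster_b are hypothetical, chosen for illustration.

```python
import posixpath


def target_location(base_dir: str, source_location: str) -> str:
    """Return the path under base_dir that mirrors source_location.

    Illustrative only: this reproduces the <base_dir>/<source path>
    pattern from the text, not DLM's internal logic.
    """
    # Drop the leading slash so that join() keeps base_dir as the prefix.
    relative = source_location.lstrip("/")
    return posixpath.join(base_dir, relative)


# Hypothetical base directories, one per source cluster.
BASE_DIRS = {
    "cluster_a": "/replica/cluster_a",
    "cluster_b": "/replica/cluster_b",
}

if __name__ == "__main__":
    # The same source path from two clusters lands in two distinct
    # target directories, so the replicated data does not collide.
    for cluster, base_dir in BASE_DIRS.items():
        print(cluster, target_location(base_dir, "/ext/hbase_data"))
```

Because each source cluster has its own base directory, two clusters that both use /ext/hbase_data end up with different locations on the target cluster.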

DLM upgrade use case: If external tables were replicated as managed tables before the upgrade, then after the upgrade you must drop those tables from the target cluster and set the base directory. The next replication instance then replicates them as external tables.