Using Druid and Apache Hive
Also available as:
PDF

Accelerating Hive queries using Druid

You can efficiently perform analytic queries on both real-time and historical data using the HDP integration of Hive with Druid.

The integration of Hive with Druid places a SQL layer on Druid. After Druid ingests data from a Hive enterprise data warehouse (EDW), the interactive and sub-second query capabilities of Druid can be used to accelerate queries on historical data from the EDW. Hive integration with Druid enables applications such as Tableau to scale while queries run concurrently on both real-time and historical data. The following figure is an overview of how Hive historical data can be brought into a Druid environment. Queries analyzing Hive-sourced data are run directly on the historical nodes of Druid after indexing between the two databases completes.