Accessing data using Apache Druid

Apache Druid introduction

HDP 3.x includes Apache Druid (incubating). Druid is an open-source, column-oriented data store for online analytical processing (OLAP) queries on event data. Druid is optimized for time-series data analysis and supports the following data analytics features:
  • Real-time streaming data ingestion
  • Automatic data summarization
  • Scalability to trillions of events and petabytes of data
  • Sub-second query latency
  • Approximate algorithms, such as HyperLogLog and theta sketches
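
To make "automatic data summarization" concrete, the following is a minimal conceptual sketch in Python of what summarizing events at ingestion time might look like: raw events are grouped by a truncated timestamp and a set of dimensions, and only the aggregates are kept. This is an illustration, not Druid code; the `rollup` function, the field names (`timestamp`, `page`, `bytes`), and the hourly bucket are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime


def rollup(events, dimensions, metric):
    """Summarize raw events by (hour, dimensions): count rows and sum one metric."""
    summary = defaultdict(lambda: {"count": 0, "sum": 0})
    for event in events:
        # Truncate the timestamp to the hour (hypothetical granularity).
        ts = datetime.fromisoformat(event["timestamp"])
        bucket = ts.replace(minute=0, second=0, microsecond=0)
        # One summary row per unique (hour, dimension values) combination.
        key = (bucket.isoformat(),) + tuple(event[d] for d in dimensions)
        summary[key]["count"] += 1
        summary[key]["sum"] += event[metric]
    return dict(summary)


# Hypothetical raw events; three input rows collapse into two summary rows.
events = [
    {"timestamp": "2018-06-01T10:05:00", "page": "/home", "bytes": 120},
    {"timestamp": "2018-06-01T10:40:00", "page": "/home", "bytes": 80},
    {"timestamp": "2018-06-01T11:02:00", "page": "/home", "bytes": 50},
]
rolled = rollup(events, ["page"], "bytes")
for key, aggregates in sorted(rolled.items()):
    print(key, aggregates)
```

The trade-off in this sketch is that individual events can no longer be recovered after summarization; in exchange, fewer rows need to be stored and scanned.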

Druid is designed for enterprise-scale business intelligence (BI) applications in environments that require minimal latency and high availability. Applications running interactive queries can "slice and dice" data in motion.

You can use Druid as a data store that serves BI queries on streaming data, such as user activity on a website or multidevice entertainment platform, consumer events delivered by a data aggregator, or a large set of transactions or Internet events.
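
As a rough illustration of what such a BI query could look like, the sketch below builds a Druid native `timeseries` query as JSON in Python. The datasource name (`website_activity`), the column names (`count`, `user_unique`), and the output names are hypothetical, and the example assumes the datasource was ingested with a count metric and a `hyperUnique` metric. Native queries are typically sent as an HTTP POST to a Druid Broker's `/druid/v2` endpoint; the sketch only builds the request body and does not send it.

```python
import json


def build_timeseries_query(datasource, interval, granularity="hour"):
    """Build a native timeseries query body (all names are illustrative)."""
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": granularity,
        # ISO 8601 interval, start/end.
        "intervals": [interval],
        "aggregations": [
            # Sum a hypothetical ingestion-time count metric to get page views.
            {"type": "longSum", "name": "page_views", "fieldName": "count"},
            # Approximate distinct users from a hypothetical hyperUnique column.
            {"type": "hyperUnique", "name": "unique_users", "fieldName": "user_unique"},
        ],
    }


query = build_timeseries_query("website_activity", "2018-06-01/2018-06-02")
payload = json.dumps(query, indent=2)
print(payload)
```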

HDP includes Druid 0.12.1, which is licensed under the Apache License, version 2.0.