Accessing data using Apache Druid

Setting up and using Apache Druid

After reviewing the following hardware recommendations and software requirements, you can add the Apache Druid (incubating) service to an HDP 3.x cluster.

Hardware Recommendations:

  • Assign the Overlord, Coordinator, and Router to one or more master nodes of size AWS m3.xlarge or equivalent: 4 vCPUs, 15 GB RAM, 80 GB SSD storage.
  • Co-locate the Historical and MiddleManager on nodes separate from the Overlord, Coordinator, Router, and Broker. Use nodes of size AWS r3.2xlarge or equivalent: 8 vCPUs, 61 GB RAM, 160 GB SSD storage.
  • Do not co-locate LLAP daemons and Historical components.
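
The placement rules above can be checked mechanically before you assign components. The following is a minimal sketch in Python, not part of Ambari or Druid: the function `check_layout`, the host names, and the `{host: roles}` layout format are all hypothetical and exist only for illustration.

```python
# Hypothetical helper: checks a planned component layout against the
# placement recommendations. Neither Ambari nor Druid ships this function.

# Components that should not share a node with Historical or MiddleManager.
SEPARATE_FROM_DATA = {"Overlord", "Coordinator", "Router", "Broker"}
DATA_ROLES = {"Historical", "MiddleManager"}


def check_layout(layout):
    """Return a list of violations for a {host: set_of_roles} layout."""
    problems = []
    for host, roles in sorted(layout.items()):
        data = roles & DATA_ROLES
        others = roles & SEPARATE_FROM_DATA
        if data and others:
            problems.append(
                f"{host}: {sorted(data)} share a node with {sorted(others)}"
            )
        if "Historical" in roles and "LLAP" in roles:
            problems.append(f"{host}: Historical is co-located with an LLAP daemon")
    return problems


# Example layout with made-up host names.
example_layout = {
    "master1": {"Overlord", "Coordinator", "Router"},
    "broker1": {"Broker"},
    "data1": {"Historical", "MiddleManager"},
    "data2": {"Historical", "LLAP"},
}

for problem in check_layout(example_layout):
    print(problem)
```

In this example only `data2` is reported, because it places a Historical component on the same node as an LLAP daemon. The sketch checks placement only; it does not verify node sizing.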

Software Requirements:

  • A MySQL or Postgres database for storing metadata in a production cluster

    You can use the default Derby database installed and configured by Ambari if you are using a single-node cluster for development.

  • Ambari 2.7.0 or later
  • Database connector set up in Ambari
  • HDP 3.0 or later
  • ZooKeeper
  • HDFS or Amazon S3
  • YARN and MapReduce2
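
The minimum versions in this list can be compared against what is installed. The sketch below is one way to do that in Python, assuming plain dotted numeric version strings such as `2.7.3`. The names `parse_version` and `unmet_minimums` are hypothetical, and the installed versions are made-up sample values rather than output from Ambari.

```python
# Hypothetical helper: compares installed versions with the minimums
# listed above. Assumes purely numeric dotted versions (no build suffix).

MINIMUMS = {"Ambari": "2.7.0", "HDP": "3.0"}


def parse_version(text, width=3):
    """Turn '2.7' into (2, 7, 0) so versions of different lengths compare correctly."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


def unmet_minimums(installed):
    """Return the components that are missing or older than the minimum."""
    unmet = []
    for name, minimum in MINIMUMS.items():
        have = installed.get(name)
        if have is None or parse_version(have) < parse_version(minimum):
            unmet.append(name)
    return unmet


# Sample input: Ambari meets the minimum, HDP does not.
print(unmet_minimums({"Ambari": "2.7.3", "HDP": "2.6.5"}))
```

The check covers only the two versioned items in the list. The remaining requirements (metadata database, database connector, ZooKeeper, storage, YARN and MapReduce2) still need to be confirmed separately.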