Adding Druid to a cluster

Add Apache Druid to the cluster

You use Apache Ambari to add Apache Druid (incubating) to your cluster.

To make Druid operational in a real-world HDP environment, the cluster must have access to the following resources:
  • ZooKeeper:

    A Druid instance requires that you select Apache ZooKeeper as a Service when you add Druid to the cluster; otherwise, Ambari does not add Druid to the cluster. ZooKeeper coordinates Druid nodes and manages elections among coordinator and overlord nodes.

  • Deep storage:

HDFS or Amazon S3 can serve as the deep storage layer for Druid in HDP. In Ambari, you can select HDFS as a Service for this storage layer. Alternatively, you can configure Druid to use Amazon S3 as the deep storage layer by setting the druid.storage.type property to s3. The cluster relies on the distributed file system to store Druid segments as a permanent backup of the data.

  • Metadata storage:

    The metadata store is used to persist information about Druid segments and tasks. MySQL and Postgres are supported metadata stores. You can select the metadata database when you install and configure Druid with Ambari.

  • Batch execution engine:

    Select YARN + MapReduce2 for the execution resource manager and execution engine, respectively. Druid Hadoop index tasks use MapReduce jobs for distributed ingestion of large amounts of data.

  • (Optional) Druid metrics reporting:

    If you plan to monitor Druid performance metrics using Grafana dashboards in Ambari, select Ambari Metrics System as a Service.
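If you choose Amazon S3 for deep storage, the relevant settings in Druid's common runtime configuration look roughly like the following sketch; the bucket name, base key, and credentials are placeholders, and the druid-s3-extensions extension must be on the load list:

```properties
# Sketch of S3 deep-storage settings in common.runtime.properties.
# Bucket, base key, and credentials below are placeholders.
druid.extensions.loadList=["druid-s3-extensions"]
druid.storage.type=s3
druid.storage.bucket=your-druid-bucket
druid.storage.baseKey=druid/segments
druid.s3.accessKey=YOUR_ACCESS_KEY
druid.s3.secretKey=YOUR_SECRET_KEY
```

When you select HDFS instead, Ambari sets druid.storage.type to hdfs and points the storage directory at a path in the distributed file system.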

If you plan to deploy Druid with high availability (HA), you need to know which components to install and how to configure the installation so that the Druid instance is ready for an HA environment.

As you add Druid as a service, you generally replicate the following components:

  • Druid Historical: Loads data segments.
  • Druid MiddleManager: Runs Druid indexing tasks.

Before you begin, ensure that you have installed the following components:

  • Ambari 2.7.0 or later
  • HDP 3.0 or later using Ambari
  • ZooKeeper
  • HDFS or Amazon S3
  • A MySQL or Postgres database for storing metadata if you need high availability; otherwise, you can use the default Derby database installed and configured by Ambari.
  • YARN and MapReduce2
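If you select MySQL as the metadata store, the Druid connection properties that Ambari populates look along these lines; the host, database name, and credentials are placeholders for your own values:

```properties
# Sketch of MySQL metadata-store settings for Druid.
# Host, database name, user, and password are placeholders.
druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc:mysql://mysql-host:3306/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=your-password
```

For PostgreSQL, the storage type is postgresql and the connectURI uses the jdbc:postgresql:// scheme instead.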

  1. From the Ambari navigation pane, select Services > Add Service.
  2. Select Druid, and click Next.
  3. In Assign Masters, assign Druid Historical and Druid MiddleManager to multiple nodes.
  4. In Customize Services, select a Druid Metadata storage type or accept the default Derby database.
  5. Enter a Metadata storage password.
  6. Accept the default configuration values and suggested changes if you use the default Derby database.
  7. Click Next, and then click Deploy.
  8. If you encounter problems, check the Apache Ambari Installation Guide to see whether other components are needed for your particular cluster.
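After the deployment finishes, you can confirm that the Druid service started by querying the Ambari REST API. The following is a sketch; the Ambari host, cluster name, and admin credentials are placeholders for your environment:

```shell
# Query the Ambari REST API for the Druid service state.
# ambari-host, mycluster, and the admin credentials are placeholders.
curl -u admin:admin -H 'X-Requested-By: ambari' \
  'http://ambari-host:8080/api/v1/clusters/mycluster/services/DRUID?fields=ServiceInfo/state'
```

A healthy deployment reports a ServiceInfo state of STARTED in the JSON response.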