Adding Druid to a cluster

Add Druid to the cluster

You use Apache Ambari to add Druid to your cluster.

To make Druid operational in a real-world HDP environment, the cluster must have access to the following resources:
  • ZooKeeper:

    A Druid instance requires that you select Apache ZooKeeper as a Service when you add Druid to the cluster; otherwise, Ambari does not add Druid to the cluster. ZooKeeper coordinates Druid nodes and manages elections among coordinator and overlord nodes.

  • Deep storage:

    HDFS or Amazon S3 can serve as the deep storage layer for Druid in HDP. In Ambari, you can select HDFS as a Service for this storage layer. Alternatively, you can configure Druid to use Amazon S3 as the deep storage layer by setting the druid.storage.type property to s3. The cluster relies on this distributed storage to permanently back up Druid segments.

  • Metadata storage:

    The metadata store is used to persist information about Druid segments and tasks. MySQL and Postgres are supported metadata stores. You can select the metadata database when you install and configure Druid with Ambari.

  • Batch execution engine:

    Select YARN + MapReduce2 for the execution resource manager and execution engine, respectively. Druid Hadoop index tasks use MapReduce jobs for distributed ingestion of large amounts of data.

  • (Optional) Druid metrics reporting:

    If you plan to monitor Druid performance metrics using Grafana dashboards in Ambari, select Ambari Metrics System as a Service.
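The deep storage and metadata storage choices above correspond to Druid runtime properties. The following sketch shows what those properties might look like for an S3 deep storage layer and a MySQL metadata store; the bucket name, key prefix, host, and credentials are placeholders, and in an Ambari-managed cluster you would set these values through the Druid service configuration screens rather than by editing the file directly.

```properties
# Deep storage: use Amazon S3 instead of HDFS
druid.storage.type=s3
druid.storage.bucket=my-druid-bucket        # placeholder bucket name
druid.storage.baseKey=druid/segments        # placeholder key prefix

# Metadata storage: MySQL (placeholder host and credentials)
druid.metadata.storage.type=mysql
druid.metadata.storage.connector.connectURI=jdbc:mysql://db.example.com:3306/druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=StrongPassword1!
```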

If you plan to deploy high availability (HA) on a Druid cluster, you need to know which components to install and how to configure the installation so that the Druid instance is primed for an HA environment.

As you add Druid as a service, you select the following components:

  • Druid Historical: Loads data segments.
  • Druid MiddleManager: Runs Druid indexing tasks.

Before you begin, verify that you have installed the following components:

  • Ambari 2.7.0 or later
  • HDP 3.0 or later using Ambari
  • ZooKeeper
  • HDFS or Amazon S3
  • A MySQL or Postgres database for storing metadata if you need high availability; otherwise, you can use the default Derby database installed and configured by Ambari.
  • YARN and MapReduce2
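
If you use MySQL for the metadata store, the database and the Druid user must exist before Ambari configures the service. A minimal sketch, assuming a running MySQL server; the database name, user name, and password are illustrative placeholders:

```sql
-- Create a metadata database for Druid (name and character set are typical choices).
CREATE DATABASE druid DEFAULT CHARACTER SET utf8;
-- Create a Druid user (placeholder password) and grant it access to the database.
CREATE USER 'druid'@'%' IDENTIFIED BY 'StrongPassword1!';
GRANT ALL PRIVILEGES ON druid.* TO 'druid'@'%';
```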
  1. From the Ambari navigation pane, select Stack and Versions, scroll down the list of services to Druid, and click Add Service.
  2. In the Add Service wizard, scroll down to Druid, which is selected for addition, and click Next.
  3. In Assign Masters, assign the Druid Historical and Druid MiddleManager components to multiple nodes, as appropriate for your cluster.
  4. Accept the default values if you are using the default Derby database.
  5. If you encounter problems, see the Apache Ambari Installation Guide to determine whether additional components are needed for your cluster.