DLM Installation and Upgrade

Prerequisites

Before you begin the installation process, verify the following:

  • You must have root access to the nodes on which the DLM App and DLM Engine will be installed.
  • Ensure that the required services (Knox, Ranger, HDFS, YARN, and Hive) are installed.

  • Before you install DLM, verify that you can copy files between the Hadoop clusters or endpoints, because cluster environments vary. We recommend using the distributed copy command, distcp, to confirm that data can be copied successfully between the clusters. For more information, see Using DistCp.

  • Global LDAP is configured to share user-group mappings across clusters
  • If using Kerberos with different KDCs, two-way trust is configured between the KDCs
  • If using AD, trust relationships across multiple domains or forests (domain and forest trusts) are not supported.
  • Ensure that one of the following external databases is installed: MySQL or Postgres.

    See the Hortonworks Support Matrix for the compatible versions of DataPlane (DP) Platform, HDP, and DLM.

  • Knox SSO
    DP Platform and DLM leverage Knox SSO to provide users and services with simplified and consistent access to clusters, data, and other services. You must configure Knox SSO on the HDP clusters that you plan to use with DLM.
    Note

    The Knox SSO of your cluster must be configured to use the same LDAP/AD as your DP instance for user identity to match and propagate between the systems.

    Refer to the following documentation on how to configure your cluster for Knox SSO:
    Resource                            Documentation
    Install Knox and enable in Ambari   HDP Security Guide, Install Knox
    Configure SSO topology              HDP Security Guide, Identity Providers
    Configure Knox SSO for Ambari       HDP Security Guide, Setting up Knox SSO for Ambari
    Configure LDAP with Ambari          Ambari Security Guide, Configuring Ambari Authentication with LDAP or Active Directory Authentication
  • Perform the DataPlane Platform pre-installation tasks. For more information, see Prepare your clusters.
  • Install or upgrade to the supported version of Ambari. See Support Matrix for details of the supported Ambari versions. See Apache Ambari installation for more details.
  • Install or upgrade to the supported versions of HDP on your cluster using Ambari. See DLM Support Matrix for details of the supported HDP versions. See the HDP installation documentation for more details.
  • Ranger
    Ranger enables you to create services for specific Hadoop resources (HDFS, HBase, Hive) and add access policies to those services. If you use Ranger for authorisation in your cluster for LDAP users:
  • Knox Gateway

    Configuring Knox Gateway is required if your cluster is configured with Kerberos or with wire encryption. This simplifies certificate management for DP and cross-cluster communication, as the only security certificate that needs to be managed is for Knox.

    Refer to the following documentation on how to configure your cluster for Knox Gateway:

    Resource                                            Documentation
    Configure a reverse proxy with Knox                 HDP Security Guide, Configuring the Knox Gateway
    Configure LDAP with Knox for proxy authentication   HDP Security Guide, Setting Up LDAP Authentication
  • Hive

    You must configure Hive with the Ranger authorizer and set hive.server2.enable.doAs=false. For more information, see Authorization using Apache Ranger Policies.

  • YARN

    DLM runs the replication jobs using YARN. For on-premise to on-premise replication, the replication job runs on the target cluster. For on-premise to cloud replication, the replication job runs on the source cluster. Make sure YARN is installed on the cluster where the replication job runs.

  • Ensure that the HDP clusters involved in replication have symmetric configurations. Each cluster in a replication relationship must be configured identically for security (Kerberos), user management (LDAP/AD), and Knox Proxy. Cluster services such as HDFS, Hive, Knox, Ranger, and Atlas can differ in their High Availability (HA) configuration; for example, the source cluster can use HA while the target cluster does not.

  • See the Hortonworks Support Matrix for the compatible versions of DP, HDP, and DLM.
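
The DistCp check recommended in the prerequisites above can be scripted. The following is a minimal sketch, not part of DLM: it assumes the hadoop client is on the PATH of the node where you run it, and the function names build_distcp_command and run_distcp_check are hypothetical helpers introduced only for this illustration. It builds a basic "hadoop distcp <source> <target>" command and reports whether the copy exited successfully.

```python
import shutil
import subprocess


def build_distcp_command(source_uri, target_uri):
    """Return the argument list for a basic DistCp copy from source to target.

    Both arguments are expected to be fully qualified URIs so that the copy
    really crosses clusters instead of resolving against the local default
    file system.
    """
    for uri in (source_uri, target_uri):
        if "://" not in uri:
            raise ValueError(f"expected a fully qualified URI, got {uri!r}")
    return ["hadoop", "distcp", source_uri, target_uri]


def run_distcp_check(source_uri, target_uri):
    """Run the copy and return True if DistCp exits with status 0."""
    # Assumption: the hadoop client is installed and on PATH on this node.
    if shutil.which("hadoop") is None:
        raise RuntimeError("the hadoop client is not on PATH")
    result = subprocess.run(build_distcp_command(source_uri, target_uri))
    return result.returncode == 0
```

For example, calling run_distcp_check("hdfs://source-nn.example.com:8020/tmp/dlm-precheck", "hdfs://target-nn.example.com:8020/tmp/dlm-precheck") would attempt a copy between two clusters; the host names, port, and paths here are placeholders to replace with your own NameNode addresses and a small test directory. On a Kerberos-enabled cluster, a valid ticket is needed before running the check.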