DLM Administration

Create a replication policy

You must create a policy to define the rules for the replication jobs (instances of the policy) that you want to execute. You can set rules such as the type of data to replicate, the time and frequency of replication, the bandwidth allowed for a job, and so forth. During replication, the data and its associated file metadata (for HDFS) or table structures and schemas (for Hive) are also replicated.

  • DLM does not support updating any cluster endpoints (HDFS, Hive, Ranger, or DLM Engine). If an endpoint must be modified, contact Hortonworks support for assistance.
  • The first time you execute a job with data that has not been previously replicated, DLM copies all of the data. The bootstrap process can take hours to days, depending on data size, so plan your time accordingly.
  • You must use the DLM Infrastructure Admin role to perform this task.
  • The target folder or database on the destination cluster must either be empty or not exist prior to starting a new policy instance.
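
For HDFS targets, you can confirm the last prerequisite from the command line before creating the policy. The following sketch is illustrative only; it assumes the hdfs client is installed and configured for the destination cluster, and the path /apps/replicated/sales is a hypothetical example.

```python
# Illustrative check that an HDFS target directory is absent or empty
# before a new DLM policy instance runs. Assumes the `hdfs` CLI is on
# the PATH and points at the destination cluster (hypothetical path).
import subprocess

TARGET = "/apps/replicated/sales"  # hypothetical destination folder

def path_exists(path):
    """`hdfs dfs -test -e` exits with code 0 if the path exists."""
    return subprocess.run(["hdfs", "dfs", "-test", "-e", path]).returncode == 0

def is_empty(path):
    """`hdfs dfs -count` prints: DIR_COUNT FILE_COUNT CONTENT_SIZE PATH."""
    out = subprocess.run(["hdfs", "dfs", "-count", path],
                         capture_output=True, text=True, check=True).stdout
    dir_count, file_count = map(int, out.split()[:2])
    # An empty directory reports one directory (itself) and zero files.
    return dir_count == 1 and file_count == 0

if not path_exists(TARGET) or is_empty(TARGET):
    print(f"{TARGET} is ready to be used as a replication target.")
else:
    print(f"{TARGET} already contains data; choose another target.")
```
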
  1. In the DLM navigation pane, click Policies.
    The Replication Policies page displays a list of any existing policies.
  2. Click Add > Policy.
  3. On the General page, enter or select the following information, and then click Select Source:
    • Policy Name
    • Description
    • Service: HDFS or Hive
  4. On the Select Source page, enter or select the following information, and then click Select Destination:
    • Type: S3 or Cluster
    • Source Cluster (if Type=Cluster is selected)
    • Cloud Credential (if Type=S3 is selected)

      You must have registered your credentials with DLM on the Cloud Credentials page.

    • Select a Folder Path (only if HDFS is selected)

      TDE-enabled directories are identified by a lock icon. The entire source directory must be either fully encrypted or fully unencrypted; if it contains a mix of encrypted and unencrypted content, policy creation fails.

    • Enable snapshot-based replication (only if HDFS is selected)

      The HDFS Admin role is required to enable snapshots (see the sketch after this step).

    • Select Database (only if Hive is selected)

      TDE-enabled databases are identified by a lock icon.
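
For snapshot-based HDFS replication, the source folder must be snapshottable. The sketch below shows how an HDFS administrator might allow snapshots on a source folder and review encryption zones outside of DLM; it is an assumption-based example, and /data/sales is a hypothetical path.

```python
# Illustrative sketch: make a source folder snapshottable so that
# snapshot-based replication can be used, then list encryption zones
# to confirm whether the folder is TDE-enabled (shown as a lock icon
# in the DLM UI). Must be run as an HDFS administrator.
import subprocess

SOURCE = "/data/sales"  # hypothetical source folder

# Allow snapshots on the source folder.
subprocess.run(["hdfs", "dfsadmin", "-allowSnapshot", SOURCE], check=True)

# List all snapshottable directories to confirm the change.
subprocess.run(["hdfs", "lsSnapshottableDir"], check=True)

# Optionally, list TDE encryption zones on the cluster.
subprocess.run(["hdfs", "crypto", "-listZones"], check=True)
```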

  5. On the Select Destination page, enter or select the following information, and then click Schedule:
    • Type: S3 or Cluster
    • Destination Cluster (if Type=Cluster is selected)
    • TDE Same Key (if Type=Cluster is selected)

      Configures the policy to use the same TDE key for the source and destination.

    • Cloud Credential (if Type=S3 is selected)
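
Before selecting S3 as a source or destination, the cloud credential you registered must actually be able to reach the bucket. The sketch below is a minimal, assumption-based way to sanity-check credentials outside of DLM; it uses the boto3 library, and the bucket name and keys are hypothetical placeholders. It is not part of the DLM workflow itself.

```python
# Illustrative sanity check of S3 credentials before relying on them
# in a DLM policy. Assumes boto3 is installed; all values are
# hypothetical placeholders.
import boto3
from botocore.exceptions import ClientError

session = boto3.session.Session(
    aws_access_key_id="AKIA_PLACEHOLDER",      # placeholder
    aws_secret_access_key="SECRET_PLACEHOLDER" # placeholder
)
s3 = session.client("s3")

try:
    # head_bucket succeeds only if the bucket exists and is accessible.
    s3.head_bucket(Bucket="my-replication-bucket")
    print("Credentials can reach the bucket.")
except ClientError as err:
    print(f"Bucket check failed: {err}")
```
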
  6. On the Schedule page, select when you want the job to run, and then click Advanced Settings:
    When setting the schedule, consider requirements such as the recovery point objective (RPO), the recovery time objective (RTO), available network bandwidth, and so forth.
    • Start: On Schedule or From Now
    • Repeat
    • Start and End Dates
    • Start Time
  7. Enter or select the Advanced Settings, and then click Create Policy:
    Configuring Advanced Settings is optional.
    • Queue Name

      If you are using Capacity Scheduler queues to limit resource consumption, enter the name of the YARN queue for the cluster to which the replication job will be submitted.

    • Maximum Bandwidth

      Adjust this setting to throttle each map task to the specified bandwidth, so that the aggregate bandwidth used by the job tends toward that value. The default value for the bandwidth is 1 MB per second.

    • Maximum Maps

      Use this option to set the maximum number of map tasks (simultaneous copies) per replication job.

    The Advanced Settings attributes are applied only during DLM replication jobs that are based on DistCp functionality.
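
Because these settings apply to DistCp-based jobs, it can help to see how the same limits look on a standalone DistCp run. The following sketch is illustrative only; the queue name, paths, and values are hypothetical, and DLM supplies its own equivalents internally when it submits a job.

```python
# Illustrative standalone DistCp invocation showing the knobs that the
# DLM Advanced Settings correspond to: YARN queue, per-map bandwidth
# (MB/s), and maximum number of map tasks. Paths and values are
# hypothetical placeholders.
import subprocess

subprocess.run([
    "hadoop", "distcp",
    "-Dmapreduce.job.queuename=dlm_replication",  # Queue Name
    "-bandwidth", "10",                           # Maximum Bandwidth (MB/s per map)
    "-m", "20",                                   # Maximum Maps
    "hdfs://source-nn:8020/data/sales",
    "hdfs://target-nn:8020/apps/replicated/sales",
], check=True)
```
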
  8. Click Review and verify that the settings are correct.

    After a policy is created, the policy name and the clusters associated with the policy cannot be modified.

  9. Click Submit. A message appears, stating that the submission was successful.
    When the policy job runs, checks are performed to verify the copied data.

View job status to verify that the replication job is running as intended.
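
In addition to checking job status in the DLM UI, a quick, assumption-based way to spot-check an HDFS replication is to compare directory, file, and byte counts on the source and target clusters, as in the sketch below. The namenode addresses and paths are hypothetical.

```python
# Illustrative spot check after a replication job completes: compare
# the counts reported by `hdfs dfs -count` on the source and target.
# Assumes the `hdfs` client can reach both namenodes.
import subprocess

def counts(path):
    """Return (dirs, files, bytes) from `hdfs dfs -count <path>`."""
    out = subprocess.run(["hdfs", "dfs", "-count", path],
                         capture_output=True, text=True, check=True).stdout
    dirs, files, size = out.split()[:3]
    return int(dirs), int(files), int(size)

src = counts("hdfs://source-nn:8020/data/sales")
dst = counts("hdfs://target-nn:8020/apps/replicated/sales")
print("source:", src, "target:", dst)
print("match" if src == dst else "mismatch: investigate before relying on the copy")
```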