DLM Administration

Create a replication policy

You must create a policy to define the rules for the replication job (an instance of a policy) that you want to run. Rules include the type of data to replicate, the time and frequency of replication, and the bandwidth allowed for a job. Replication copies the data along with its associated file metadata or table structures and schemas.

  • You cannot modify a policy after it is created. To change a policy, you must create a new policy with the new settings.
  • DLM does not support updating cluster endpoints (HDFS, Hive, Ranger, or DLM Engine). If an endpoint must be modified, contact Hortonworks support for assistance.
  • The first time you execute a job with data that has not been previously replicated, Data Lifecycle Manager creates a new folder or database and bootstraps the data.
    Important

    During a bootstrap operation, all data is replicated from the source cluster to the destination. As a result, the initial execution of a job can take a significant amount of time, depending on factors such as the amount of data being replicated and the available network bandwidth.

    After initial bootstrap, data replication is performed incrementally, so only updated data is transferred. Data is in a consistent state only after incremental replication has captured any new changes that occurred during bootstrap.

  • You must use the DLM Infrastructure Admin role to perform this task.
  • If your policy uses S3, your cloud credentials must already be registered on the Cloud Credentials page.
  • The clusters you want to include in the replication policy must have been paired already.
  • You must ensure that the clusters you select are healthy before you start a policy instance (job).
  • On destination clusters, the DLM Engine must have been granted write permissions on folders being replicated.
  • The target folder or database on the destination cluster must either be empty or not exist prior to starting a new policy instance.
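To get a rough sense of how long the initial bootstrap described above might take, the following minimal sketch divides the data size by a sustained transfer rate. The function name `estimate_bootstrap_hours` and its parameters are hypothetical and not part of DLM. The sketch assumes a constant effective bandwidth in MB/s and ignores job startup, metadata replication, and changes that occur during the bootstrap, so treat the result as a lower bound rather than a prediction.

```python
def estimate_bootstrap_hours(data_gb: float, bandwidth_mb_per_s: float) -> float:
    """Rough lower bound on bootstrap duration, in hours.

    Assumptions (not taken from DLM): 1 GB = 1024 MB, and the given
    bandwidth is sustained for the whole transfer.
    """
    if data_gb < 0:
        raise ValueError("data_gb must not be negative")
    if bandwidth_mb_per_s <= 0:
        raise ValueError("bandwidth_mb_per_s must be positive")
    seconds = (data_gb * 1024) / bandwidth_mb_per_s
    return seconds / 3600


if __name__ == "__main__":
    # Example: 2 TB (2048 GB) at a sustained 100 MB/s.
    hours = estimate_bootstrap_hours(2048, 100)
    print(f"Estimated bootstrap time: at least {hours:.1f} hours")
```

An estimate like this can help when choosing a schedule and a maximum bandwidth, because later incremental runs should transfer far less data than the bootstrap.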
  1. In the DLM navigation pane, click Policies.
    The Replication Policies page displays a list of any existing policies.
  2. Click Add > Policy.
  3. On the General page, enter or select the following information, and then click Select Source:
    • Policy Name
    • Description
    • Service: HDFS or Hive
  4. On the Select Source page, enter or select the following information, and then click Select Destination:
    • Type: S3 or Cluster
    • Source Cluster (if Type=Cluster is selected)
    • Cloud Credential (if Type=S3 is selected)

      You must have registered your credentials with DLM on the Cloud Credentials page.

    • Select a Folder Path (only if HDFS is selected)

      TDE-enabled directories are identified by a lock icon.

    • Enable snapshot based replication (only if HDFS is selected)

      HDFS Admin role is required to enable snapshots.

    • Select Database (only if Hive is selected)

      TDE-enabled databases are identified by a lock icon.

  5. On the Select Destination page, enter or select the following information, and then click Schedule:
    • Type: S3 or Cluster
    • Destination Cluster (if Type=Cluster is selected)
    • TDE Same Key (if Type=Cluster is selected)

      Configures the policy to use the same TDE key for the source and destination.

    • Cloud Credential (if Type=S3 is selected)
  6. On the Schedule page, select when you want the job to run, and then click Advanced Settings:
    When setting the schedule, consider requirements such as recovery point objective (RPO), recovery time objective (RTO), and available network bandwidth.
    • Start: On Schedule or From Now
    • Repeat
    • Start and End Dates
    • Start Time
  7. Enter or select the Advanced Settings, and then click Create Policy:
    Advanced settings are optional.
    • Queue Name
    • Maximum Bandwidth
  8. Click Review and verify that the settings are correct.
    Important

    After a policy is created, it cannot be modified.

  9. Click Submit. A message appears, stating that the submission was successful.

Verify that the replication job is running as intended.
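
Because a policy cannot be modified after it is created, it can help to check the planned settings before entering them in the wizard. The sketch below models the conditional fields from the steps above (a cluster when Type is Cluster, a cloud credential when Type is S3, a folder path for HDFS, a database for Hive). The classes `Endpoint` and `PolicyDraft` and the function `validate` are hypothetical illustrations and are not part of any DLM API. Treating the folder path and database as mandatory is an assumption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Endpoint:
    """Source or destination of a policy (hypothetical model)."""
    kind: str                               # "S3" or "Cluster"
    cluster: Optional[str] = None           # needed when kind is "Cluster"
    cloud_credential: Optional[str] = None  # needed when kind is "S3"


@dataclass
class PolicyDraft:
    """Planned wizard inputs (hypothetical model)."""
    name: str
    service: str                            # "HDFS" or "Hive"
    source: Endpoint
    destination: Endpoint
    folder_path: Optional[str] = None       # HDFS only
    database: Optional[str] = None          # Hive only


def _check_endpoint(label: str, endpoint: Endpoint) -> List[str]:
    """Return problems with one endpoint, based on its Type."""
    if endpoint.kind == "Cluster":
        if not endpoint.cluster:
            return [f"{label}: select a cluster when Type is Cluster"]
    elif endpoint.kind == "S3":
        if not endpoint.cloud_credential:
            return [f"{label}: select a cloud credential when Type is S3"]
    else:
        return [f"{label}: Type must be S3 or Cluster"]
    return []


def validate(draft: PolicyDraft) -> List[str]:
    """Return a list of problems; an empty list means the draft looks complete."""
    errors: List[str] = []
    if draft.service == "HDFS":
        if not draft.folder_path:
            errors.append("HDFS: select a folder path")
    elif draft.service == "Hive":
        if not draft.database:
            errors.append("Hive: select a database")
    else:
        errors.append("Service must be HDFS or Hive")
    errors += _check_endpoint("Source", draft.source)
    errors += _check_endpoint("Destination", draft.destination)
    return errors


if __name__ == "__main__":
    draft = PolicyDraft(
        name="example-policy",
        service="HDFS",
        source=Endpoint("Cluster", cluster="source-cluster"),
        destination=Endpoint("S3", cloud_credential="example-credential"),
        folder_path="/data/example",
    )
    print(validate(draft) or "Draft looks complete")
```

The check returns every problem at once instead of stopping at the first one, so that a draft can be corrected in a single pass before the policy is submitted.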