HDFS replication policy process overview
- The DLM App submits the replication policy to the DLM Engine on the destination cluster. The DLM Engine then schedules replication jobs at the specified frequency.
- At the specified frequency, the DLM Engine submits a DistCp job that runs on the destination cluster's YARN, reads data from the source HDFS, and writes it to the destination HDFS.
- File length and checksums are used to determine changed files and validate that the data is copied correctly.
- The Ranger policies for the HDFS directory are exported from the source Ranger service and replicated to the destination Ranger service.
- The DLM Engine also adds a deny policy on the destination Ranger service for the target directory so that the target is not writable.
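
The change-detection step described above (comparing file length and checksum) can be illustrated with a minimal Python sketch. This is not DLM or DistCp code: the `FileMeta` type, the helper functions, and the in-memory listings are hypothetical names invented for this illustration, and the checksums are placeholder strings rather than real HDFS checksums.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class FileMeta:
    """Hypothetical per-file metadata: size in bytes and a checksum string."""
    length: int
    checksum: str


def needs_copy(source: FileMeta, target: Optional[FileMeta]) -> bool:
    """Return True if the file is missing on the target or differs from the source."""
    if target is None:
        return True  # file does not exist on the destination yet
    if source.length != target.length:
        return True  # cheap check first: a different length means different content
    return source.checksum != target.checksum  # same length, so compare checksums


def changed_files(source_listing: Dict[str, FileMeta],
                  target_listing: Dict[str, FileMeta]) -> List[str]:
    """List the source paths that would have to be copied, in sorted order."""
    return sorted(
        path for path, meta in source_listing.items()
        if needs_copy(meta, target_listing.get(path))
    )


def verify_copy(source: FileMeta, copied: FileMeta) -> bool:
    """After a copy, the destination file should match in both length and checksum."""
    return source.length == copied.length and source.checksum == copied.checksum


if __name__ == "__main__":
    # Assumed example listings; the paths and checksum values are made up.
    source = {
        "/data/a": FileMeta(10, "aa"),
        "/data/b": FileMeta(20, "bb"),
        "/data/c": FileMeta(5, "cc"),
    }
    target = {
        "/data/a": FileMeta(10, "aa"),  # unchanged
        "/data/b": FileMeta(20, "xx"),  # same length, different checksum
    }
    print(changed_files(source, target))  # ['/data/b', '/data/c']
```

In this sketch the length comparison runs first because it is cheaper than a checksum comparison, and the same two attributes are reused to validate a file after it has been copied.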