Copying Files to or from an Encryption Zone
To copy existing files to or from an encryption zone, use a tool like distcp.
Note: For separation of administrative roles, do not use the
hdfs user to create encryption zones. Instead, designate another
administrative account for creating encryption keys and zones. See “Appendix: Creating
an HDFS Admin User” for more information.
The files will be encrypted using a file-level key generated by the Ranger Key Management Service.
DistCp is commonly used to replicate data between clusters for backup
and disaster recovery purposes. This operation is typically performed by the cluster
administrator, via an HDFS superuser account.
To retain this workflow when using HDFS encryption, a new virtual path prefix has been
introduced, /.reserved/raw/. This virtual path gives superusers direct
access to the underlying encrypted block data in the file system, allowing superusers
to distcp data without requiring access to encryption keys. This also
avoids the overhead of decrypting and re-encrypting data. The source and destination
data will be byte-for-byte identical, which would not be true if the data were
re-encrypted with a new EDEK.
This means that if the
Recommendation: To avoid potential mishaps, first create identical encryption zones on the destination cluster.
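As a minimal sketch of the raw-path workflow described above, the following script assembles (but does not run) two commands: one that creates a matching encryption zone on the destination, and one that copies the zone through /.reserved/raw/. The NameNode URIs, zone path, and key name are hypothetical placeholders. The -px flag (preserve extended attributes) is an assumption drawn from Apache DistCp usage rather than from this section; it is included because the raw view exposes per-file encryption metadata as extended attributes.

```shell
#!/bin/sh
# Hypothetical values; substitute your own clusters, zone path, and key name.
SRC_NN="hdfs://source-nn:8020"
DST_NN="hdfs://dest-nn:8020"
ZONE_PATH="/apps/enczone"
KEY_NAME="enczone_key"

# Step 1 (run on the destination cluster by the encryption admin, not hdfs):
# create the zone on an existing, empty directory before copying.
CREATE_ZONE_CMD="hdfs crypto -createZone -keyName ${KEY_NAME} -path ${ZONE_PATH}"

# Step 2 (run by an HDFS superuser): copy the encrypted bytes as-is.
# Both source and destination use the /.reserved/raw prefix, and -px
# preserves extended attributes along with the data.
RAW_COPY_CMD="hadoop distcp -px ${SRC_NN}/.reserved/raw${ZONE_PATH} ${DST_NN}/.reserved/raw${ZONE_PATH}"

# Print the commands for review instead of executing them.
echo "$CREATE_ZONE_CMD"
echo "$RAW_COPY_CMD"
```

Printing the commands first makes it easy to confirm that the raw prefix appears on both sides before anything is copied.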
Copying between encrypted and unencrypted locations
distcp compares file system checksums to verify that data
was successfully copied to the destination.
When copying between an unencrypted and encrypted location, file system checksums will
not match because the underlying block data is different. In this case, specify the
-skipcrccheck and -update flags to avoid verifying checksums.
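The copy between an unencrypted and an encrypted location can be sketched as follows. The paths are hypothetical, and the pairing of -skipcrccheck with -update is assumed from standard Apache DistCp usage. Because this copy does not go through /.reserved/raw/, the data is encrypted transparently on write, so the user running it needs access to the zone key.

```shell
#!/bin/sh
# Hypothetical paths: an unencrypted source and a directory inside an
# encryption zone on the destination cluster.
SRC_PATH="hdfs://source-nn:8020/data/plain"
DST_PATH="hdfs://dest-nn:8020/apps/enczone/plain"

# -update copies files that are missing or differ at the destination;
# -skipcrccheck disables the checksum comparison, which would otherwise
# fail because the stored block data differs after encryption.
MIXED_COPY_CMD="hadoop distcp -update -skipcrccheck ${SRC_PATH} ${DST_PATH}"

# Print the command for review instead of executing it.
echo "$MIXED_COPY_CMD"
```

Since checksum verification is skipped, it may be worth spot-checking a few copied files by reading them back through the normal, decrypted path.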