HDFS Administration Guide

HBase

HBase stores all of its data under its root directory in HDFS, configured with hbase.rootdir. The only other directory that the HBase service will read or write is hbase.bulkload.staging.dir.

On HDP clusters, hbase.rootdir is typically configured as /apps/hbase/data, and hbase.bulkload.staging.dir is configured as /apps/hbase/staging. HBase data, including the root directory and staging directory, can reside in an encryption zone on HDFS.

The HBase service user needs to be granted access to the encryption key in the Ranger KMS, because it performs tasks that require access to HBase data (unlike Hive or HDFS).

By design, HDFS-encrypted files cannot be bulk-loaded from one encryption zone into another encryption zone, or from an encryption zone into an unencrypted directory. Encrypted files can only be copied. An attempt to load data from one encryption zone into another will result in a copy operation. Within an encryption zone, files can be copied, moved, bulk-loaded, and renamed.

Recommendations

  • Make the parent directory for the HBase root directory and bulk load staging directory an encryption zone, instead of just the HBase root directory. This is because HBase bulk load operations need to move files from the staging directory into the root directory.

  • In typical deployments, /apps/hbase can be made an encryption zone.

  • Do not create encryption zones as subdirectories under /apps/hbase, because HBase may need to rename files across those subdirectories.

  • The landing zone for unencrypted data should always be within the destination encryption zone.
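
The reasoning behind these recommendations can be illustrated with a small model. The sketch below is not an HDFS API; `enclosing_zone` and `can_rename` are hypothetical helpers that only mimic the rule that a rename must stay within a single encryption zone. It compares the recommended layout (one zone at /apps/hbase) with a layout where only the root directory is a zone.

```python
import posixpath

def enclosing_zone(path, zones):
    """Return the deepest encryption zone root that contains path, or None."""
    path = posixpath.normpath(path)
    best = None
    for zone in zones:
        zone = posixpath.normpath(zone)
        if path == zone or path.startswith(zone.rstrip("/") + "/"):
            if best is None or len(zone) > len(best):
                best = zone
    return best

def can_rename(src, dst, zones):
    """Model of the rule: a rename succeeds only if both paths share a zone
    (or neither path is in any zone)."""
    return enclosing_zone(src, zones) == enclosing_zone(dst, zones)

ROOT_DIR = "/apps/hbase/data"        # hbase.rootdir
STAGING_DIR = "/apps/hbase/staging"  # hbase.bulkload.staging.dir

# Recommended layout: the shared parent is the encryption zone.
print(can_rename(STAGING_DIR, ROOT_DIR, ["/apps/hbase"]))       # True

# Zone on the root directory only: staging files cannot be moved in.
print(can_rename(STAGING_DIR, ROOT_DIR, ["/apps/hbase/data"]))  # False
```

The same model shows why nested zones under /apps/hbase are discouraged: any two subdirectories that resolve to different zones can no longer exchange files by rename.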

Steps

On a cluster without HBase installed, perform the following steps:

  1. Create the /apps/hbase directory, and make it an encryption zone.

  2. Configure hbase.rootdir=/apps/hbase/data.

  3. Configure hbase.bulkload.staging.dir=/apps/hbase/staging.
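
As a rough sketch, step 1 could be carried out with the standard `hdfs dfs -mkdir` and `hdfs crypto -createZone` commands. The Python below only builds and prints the command lines so that they can be reviewed before an HDFS superuser runs them. The key name `hbase-key` is a hypothetical placeholder and is assumed to already exist in the Ranger KMS. Steps 2 and 3 are shown as a plain dictionary of the two properties named above, to be set in the HBase configuration.

```python
import shlex

ZONE_DIR = "/apps/hbase"
KEY_NAME = "hbase-key"  # hypothetical key name; assumed to exist in the KMS

# Steps 2 and 3: the two HBase properties that must point inside the zone.
HBASE_PROPERTIES = {
    "hbase.rootdir": ZONE_DIR + "/data",
    "hbase.bulkload.staging.dir": ZONE_DIR + "/staging",
}

def fresh_install_commands(zone_dir=ZONE_DIR, key_name=KEY_NAME):
    """Build the commands for step 1; nothing is executed here."""
    return [
        # The directory must exist and be empty before it can become a zone.
        ["hdfs", "dfs", "-mkdir", "-p", zone_dir],
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", zone_dir],
        # Optional check: list the zones to confirm the new one is present.
        ["hdfs", "crypto", "-listZones"],
    ]

for cmd in fresh_install_commands():
    print(shlex.join(cmd))
for name, value in HBASE_PROPERTIES.items():
    print(f"{name}={value}")
```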

On a cluster with HBase already installed, perform the following steps:

  1. Stop the HBase service.

  2. Rename the /apps/hbase directory to /apps/hbase-tmp.

  3. Create an empty /apps/hbase directory, and make it an encryption zone.

  4. Use DistCp with the -skipcrccheck and -update options to copy all data from /apps/hbase-tmp to /apps/hbase, preserving user-group permissions and extended attributes.

  5. Start the HBase service and verify that it is working as expected.

  6. Remove the /apps/hbase-tmp directory.
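
The migration steps above might translate into commands along the following lines. This is a sketch under stated assumptions, not a tested procedure: the key name `hbase-key` is hypothetical, the DistCp preserve flag `-pugpx` (user, group, permission, extended attributes) is one reading of "preserving user-group permissions and extended attributes", and stopping and starting HBase (steps 1 and 5) is left to the cluster manager. The code only builds and prints the command lines.

```python
import shlex

ZONE_DIR = "/apps/hbase"
TMP_DIR = "/apps/hbase-tmp"
KEY_NAME = "hbase-key"  # hypothetical key name; assumed to exist in the KMS

def migration_commands(zone_dir=ZONE_DIR, tmp_dir=TMP_DIR, key_name=KEY_NAME):
    """Commands for steps 2 to 4; run only while HBase is stopped."""
    return [
        # Step 2: move the existing data out of the way.
        ["hdfs", "dfs", "-mv", zone_dir, tmp_dir],
        # Step 3: recreate the directory empty and turn it into a zone.
        ["hdfs", "dfs", "-mkdir", zone_dir],
        ["hdfs", "crypto", "-createZone", "-keyName", key_name, "-path", zone_dir],
        # Step 4: copy the data into the zone. Checksums differ once data is
        # encrypted, hence -skipcrccheck alongside -update.
        ["hadoop", "distcp", "-skipcrccheck", "-update", "-pugpx", tmp_dir, zone_dir],
    ]

def cleanup_commands(tmp_dir=TMP_DIR):
    """Command for step 6; run only after HBase has been verified (step 5)."""
    return [["hdfs", "dfs", "-rm", "-r", tmp_dir]]

for cmd in migration_commands() + cleanup_commands():
    print(shlex.join(cmd))
```

Keeping the cleanup command in a separate function reflects the ordering in the steps: the temporary copy is the only fallback until HBase has been verified against the encrypted data.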

Changes in Behavior after HDFS Encryption is Enabled

The HBase bulk load process is a MapReduce job that typically runs as the user who owns the source data. The HBase data files created by the job are then bulk loaded into HBase RegionServers. During this process, the RegionServers move (rename) the bulk-loaded files from the user's directory into the HBase root directory (/apps/hbase/data). When data-at-rest encryption is used, HDFS cannot rename files across encryption zones that use different keys.

Workaround: run the MapReduce job as the hbase user, and specify an output directory that resides in the same encryption zone as the HBase root directory.
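
One possible shape of this workaround is sketched below, assuming the job is an ImportTsv run followed by LoadIncrementalHFiles. The table name, column mapping, input path, and output directory are hypothetical, the class names are those shipped in the `org.apache.hadoop.hbase.mapreduce` package of HBase 1.x and may differ in other releases, and `sudo -u hbase` is only adequate on a cluster without Kerberos. The helper refuses to build the commands if the output directory lies outside the encryption zone.

```python
import shlex

ZONE_DIR = "/apps/hbase"                     # zone that contains hbase.rootdir
OUTPUT_DIR = "/apps/hbase/bulk-out/mytable"  # hypothetical HFile output directory
INPUT_PATH = "/tmp/mytable-input"            # hypothetical source data
TABLE = "mytable"                            # hypothetical table name
COLUMNS = "HBASE_ROW_KEY,cf:value"           # hypothetical column mapping

def in_zone(path, zone):
    """True if path is the zone root or lies beneath it."""
    return path == zone or path.startswith(zone.rstrip("/") + "/")

def bulk_load_commands(table, input_path, output_dir, columns, zone=ZONE_DIR):
    """Build the two commands, run as the hbase user; nothing is executed."""
    if not in_zone(output_dir, zone):
        raise ValueError(f"{output_dir} is outside encryption zone {zone}")
    as_hbase = ["sudo", "-u", "hbase"]  # assumption: non-Kerberized cluster
    return [
        # MapReduce job that writes HFiles inside the zone.
        as_hbase + ["hbase", "org.apache.hadoop.hbase.mapreduce.ImportTsv",
                    "-Dimporttsv.columns=" + columns,
                    "-Dimporttsv.bulk.output=" + output_dir,
                    table, input_path],
        # Hand the HFiles to the RegionServers; the rename stays in one zone.
        as_hbase + ["hbase", "org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles",
                    output_dir, table],
    ]

for cmd in bulk_load_commands(TABLE, INPUT_PATH, OUTPUT_DIR, COLUMNS):
    print(shlex.join(cmd))
```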