Setting up the target cluster for cloud storage in Hive

Before you replicate Hive data from an on-premises cluster to any supported cloud storage, you must set up the target cluster for Hive cloud replication on cloud instances, with the Hive warehouse directory located on that cloud storage.

The target cluster is a Data Lake cluster that runs metadata services such as the Hive Metastore (HMS), Ranger, Atlas, and DLM Engine.

For the cloud account that is used for data replication, you must set the applicable path values for the Hive replicated functions and the Hive Metastore parameters.

Amazon S3 cloud storage

When you set up Amazon S3 as the storage for your target cloud cluster, use the following Hive Metastore configurations:

hive.metastore.warehouse.dir=<cloud storage path>

For example:

hive.metastore.warehouse.dir=s3a://dummy-s3-bucket/apps/hive/warehouse

hive.repl.replica.functions.root.dir=<cloud storage path>

For example:

hive.repl.replica.functions.root.dir=s3a://dummy-s3-bucket/apps/hive/repl

hive.warehouse.subdir.inherit.perms=false
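If you maintain hive-site.xml by hand rather than through a cluster management UI, the three properties above take the standard Hadoop <property> form. The following minimal Python sketch renders them from the S3 example values; the helper name hive_site_fragment is hypothetical and only illustrates the layout.

```python
import xml.etree.ElementTree as ET


def hive_site_fragment(props: dict[str, str]) -> str:
    """Render Hive Metastore properties as Hadoop-style <property> elements (hypothetical helper)."""
    configuration = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(configuration, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(configuration)  # pretty-print; requires Python 3.9+
    return ET.tostring(configuration, encoding="unicode")


# Example values taken from the Amazon S3 configuration above.
s3_props = {
    "hive.metastore.warehouse.dir": "s3a://dummy-s3-bucket/apps/hive/warehouse",
    "hive.repl.replica.functions.root.dir": "s3a://dummy-s3-bucket/apps/hive/repl",
    "hive.warehouse.subdir.inherit.perms": "false",
}

print(hive_site_fragment(s3_props))
```

The same helper works for the WASB and Google Cloud Storage values; only the paths change.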

The target cluster must have additional Amazon S3 credential configurations to access Amazon S3 storage buckets. For example, configure the S3 access key and secret key in core-site.xml on the Hive cloud cluster. For more information, see Configuring Access to S3.
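The Hadoop S3A connector reads the access key and secret key from the fs.s3a.access.key and fs.s3a.secret.key properties. As a minimal sketch, assuming the keys are available in the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables, the following hypothetical helper maps them onto those core-site.xml property names. See Configuring Access to S3 for the supported ways of supplying credentials.

```python
import os
from typing import Mapping


def s3a_credential_props(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Map AWS environment variables onto Hadoop S3A core-site properties (hypothetical helper)."""
    access_key = env.get("AWS_ACCESS_KEY_ID")
    secret_key = env.get("AWS_SECRET_ACCESS_KEY")
    if not access_key or not secret_key:
        raise ValueError("AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY must both be set")
    return {
        "fs.s3a.access.key": access_key,
        "fs.s3a.secret.key": secret_key,
    }


# Usage with placeholder values instead of the real environment:
print(s3a_credential_props({"AWS_ACCESS_KEY_ID": "AKIAEXAMPLE", "AWS_SECRET_ACCESS_KEY": "example-secret"}))
```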

Microsoft WASB cloud storage

When you set up WASB as the storage for your target cloud cluster, use the following Hive Metastore configurations:

hive.metastore.warehouse.dir=<cloud storage path>

For example:

hive.metastore.warehouse.dir=wasb://wasb-hive@dummy-wasb-account.blob.core.windows.net/apps/hive/warehouse

hive.repl.replica.functions.root.dir=<cloud storage path>

For example:

hive.repl.replica.functions.root.dir=wasb://wasb-hive@dummy-wasb-account.blob.core.windows.net/apps/hive/repl

hive.warehouse.subdir.inherit.perms=false

The target cluster must have additional WASB credential configurations to access WASB storage containers. For more information, see Configuring Access to WASB.
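With the hadoop-azure connector, the storage account key is commonly supplied through a property named fs.azure.account.key.<account>.blob.core.windows.net. A WASB path has the form wasb://<container>@<account>.blob.core.windows.net/<path>, so the property name can be derived from the warehouse path. The following minimal Python sketch does that; the helper name wasb_key_property is hypothetical, and Configuring Access to WASB remains the reference for which credential mechanism to use.

```python
from urllib.parse import urlparse


def wasb_key_property(uri: str) -> str:
    """Return the core-site property name that holds the account key for a wasb:// URI (hypothetical helper)."""
    parsed = urlparse(uri)
    if parsed.scheme not in ("wasb", "wasbs"):
        raise ValueError(f"not a WASB URI: {uri}")
    # The network location is <container>@<account>.blob.core.windows.net:
    # urlparse exposes the container as .username and the account host as .hostname.
    if parsed.username is None or parsed.hostname is None:
        raise ValueError(f"expected <container>@<account>.blob.core.windows.net in {uri}")
    return f"fs.azure.account.key.{parsed.hostname}"


warehouse = "wasb://wasb-hive@dummy-wasb-account.blob.core.windows.net/apps/hive/warehouse"
print(wasb_key_property(warehouse))
```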

Google Cloud Storage

When you set up Google Cloud Storage as the storage for your target cloud cluster, use the following Hive Metastore configurations:

hive.metastore.warehouse.dir=<cloud storage path>

For example:

hive.metastore.warehouse.dir=gs://dummy-gcs-bucket/apps/hive/warehouse

hive.repl.replica.functions.root.dir=<cloud storage path>

For example:

hive.repl.replica.functions.root.dir=gs://dummy-gcs-bucket/apps/hive/repl

hive.warehouse.subdir.inherit.perms=false
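Because both path properties must point at the target cloud storage, a quick sanity check before saving the configuration can catch copy-and-paste mistakes such as a wrong scheme or a leftover angle bracket from a placeholder. The following minimal Python sketch assumes the properties are held in a dictionary; the function name check_cloud_paths is hypothetical.

```python
from urllib.parse import urlparse

PATH_PROPERTIES = ("hive.metastore.warehouse.dir", "hive.repl.replica.functions.root.dir")


def check_cloud_paths(props: dict[str, str], scheme: str) -> list[str]:
    """Return a list of problems found in the Hive cloud path settings (hypothetical helper)."""
    problems = []
    for name in PATH_PROPERTIES:
        value = props.get(name)
        if value is None:
            problems.append(f"{name} is not set")
            continue
        parsed = urlparse(value)
        if parsed.scheme != scheme:
            problems.append(f"{name} uses scheme '{parsed.scheme}', expected '{scheme}'")
        if not parsed.netloc:
            problems.append(f"{name} has no bucket or container")
        if "<" in value or ">" in value:
            problems.append(f"{name} contains a stray '<' or '>' character")
    if props.get("hive.warehouse.subdir.inherit.perms") != "false":
        problems.append("hive.warehouse.subdir.inherit.perms should be false")
    return problems


gcs_props = {
    "hive.metastore.warehouse.dir": "gs://dummy-gcs-bucket/apps/hive/warehouse",
    "hive.repl.replica.functions.root.dir": "gs://dummy-gcs-bucket/apps/hive/repl",
    "hive.warehouse.subdir.inherit.perms": "false",
}
print(check_cloud_paths(gcs_props, "gs"))
```

Pass "s3a" or "wasb" as the scheme to check the other storage types.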

The target cluster must have additional Google Cloud Storage credential configurations to access Google Cloud Storage buckets.

Add the following configurations to the core-site.xml file, and then save the file:

fs.gs.auth.service.account.email=<email ID of the GCS service account>

fs.gs.auth.service.account.private.key.id=<private key ID of the GCS service account>

fs.gs.auth.service.account.private.key=<private key of the GCS service account>

You can find the values for these configurations in the JSON file that you downloaded when you registered the Google Cloud Storage credentials with the DLM App.
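A Google service account JSON key file stores these values in its client_email, private_key_id, and private_key fields. Assuming the downloaded file follows that standard layout, the following minimal Python sketch maps it onto the three core-site.xml properties; the function name gcs_core_site_props and the file name in the usage line are hypothetical.

```python
import json

# core-site.xml property -> field in a Google service account JSON key file
GCS_KEY_FIELDS = {
    "fs.gs.auth.service.account.email": "client_email",
    "fs.gs.auth.service.account.private.key.id": "private_key_id",
    "fs.gs.auth.service.account.private.key": "private_key",
}


def gcs_core_site_props(key_json: str) -> dict[str, str]:
    """Extract the GCS connector credential properties from a JSON key (hypothetical helper)."""
    key = json.loads(key_json)
    missing = [field for field in GCS_KEY_FIELDS.values() if field not in key]
    if missing:
        raise ValueError(f"key file is missing fields: {missing}")
    return {prop: key[field] for prop, field in GCS_KEY_FIELDS.items()}


# Usage, with a hypothetical file name:
# with open("gcs-service-account.json") as f:
#     print(gcs_core_site_props(f.read()))
```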

	Note
	For more information, see Registering Google Cloud Account.