Setting up the target cluster for cloud storage in Hive
Before you perform Hive replication from an on-premises cluster to any supported cloud storage, you must set up the target cluster for Hive cloud replication so that its Hive warehouse directory resides on that specific cloud storage.
The target cluster is a Data Lake cluster that runs metadata services such as the Hive Metastore (HMS), Ranger, Atlas, and the DLM Engine.
For the cloud account that is used for data replication, you must set the applicable path values for the Hive replication function parameter and the Hive metastore parameters.
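Setting path values "on that specific cloud storage" means that each path must use the filesystem scheme of the chosen target. The minimal sketch below checks this for a single path URI. It assumes the standard Hadoop connectors (the `s3a` scheme for Amazon S3, `wasb`/`wasbs` for WASB, and `gs` for Google Cloud Storage); the function name `is_on_target_storage` and the example paths are hypothetical and are not part of DLM or Hive.

```python
from urllib.parse import urlparse

# Assumed Hadoop filesystem schemes for each target cloud storage:
# S3A connector for Amazon S3, WASB(S) for Azure Blob Storage,
# and the GCS connector for Google Cloud Storage.
SCHEMES = {
    "s3": {"s3a"},
    "wasb": {"wasb", "wasbs"},
    "gcs": {"gs"},
}


def is_on_target_storage(path: str, target: str) -> bool:
    """Return True if `path` uses a scheme of the given target storage.

    Hypothetical helper for illustration only; raises KeyError for an
    unknown target name.
    """
    scheme = urlparse(path).scheme.lower()
    return scheme in SCHEMES[target]


if __name__ == "__main__":
    # Hypothetical bucket and NameNode names, for illustration only.
    print(is_on_target_storage("s3a://example-bucket/warehouse", "s3"))
    print(is_on_target_storage("hdfs://nn:8020/warehouse", "s3"))
```

A check like this catches the common mistake of leaving a warehouse or function path on the on-premises HDFS (`hdfs://`) while the target is cloud storage.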
Amazon S3 cloud storage
When you set up Amazon S3 as your target cloud cluster, use the following Hive metastore configuration:
The target cluster must have additional Amazon S3 credential configurations to access Amazon S3 storage buckets. For more information, see Configuring Access to S3.
Microsoft WASB cloud storage
When you set up WASB as your target cloud cluster, use the following Hive metastore configuration:
The target cluster must have additional WASB credential configurations to access WASB storage containers. For more information, see Configuring Access to WASB.
Google Cloud Storage
When you set up Google Cloud Storage as your target cloud cluster, use the following Hive metastore configuration:
The target cluster must have additional Google Cloud Storage credential configurations to access Google Cloud Storage buckets.
Add and save the following configurations in
fs.gs.auth.service.account.email=<email ID of the GCS service account>
fs.gs.auth.service.account.private.key.id=<private key ID of the GCS service account>
fs.gs.auth.service.account.private.key=<private key of the GCS service account>
The values for these configurations can be found in the
that you downloaded while registering the Google cloud storage credentials with the
For more information, see Registering Google Cloud Account.
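The text above does not name the downloaded file that holds these values. Assuming it is a standard Google service account JSON key file, which stores the account email, key ID, and private key in the fields `client_email`, `private_key_id`, and `private_key`, the minimal sketch below shows how those fields map onto the three `fs.gs.auth.service.account.*` properties. The function name `gcs_properties_from_key` is hypothetical; it is not provided by DLM or the GCS connector.

```python
import json

# Assumed mapping from Google service account JSON key fields
# to the GCS connector properties listed above.
FIELD_TO_PROPERTY = {
    "client_email": "fs.gs.auth.service.account.email",
    "private_key_id": "fs.gs.auth.service.account.private.key.id",
    "private_key": "fs.gs.auth.service.account.private.key",
}


def gcs_properties_from_key(key_json: str) -> dict:
    """Build the fs.gs.auth.* properties from service account key JSON text."""
    key = json.loads(key_json)
    missing = [field for field in FIELD_TO_PROPERTY if field not in key]
    if missing:
        raise ValueError(f"key file is missing fields: {missing}")
    return {prop: key[field] for field, prop in FIELD_TO_PROPERTY.items()}


if __name__ == "__main__":
    # Dummy values only; never commit a real private key to source code.
    sample = json.dumps({
        "type": "service_account",
        "client_email": "replication@example-project.iam.gserviceaccount.com",
        "private_key_id": "0123abcd",
        "private_key": "-----BEGIN PRIVATE KEY-----\nDUMMY\n-----END PRIVATE KEY-----\n",
    })
    for name, value in gcs_properties_from_key(sample).items():
        # Redact the private key so it is not echoed to the console.
        shown = "<redacted>" if name.endswith("private.key") else value
        print(f"{name}={shown}")
```

Note that the private key value contains embedded newlines; depending on where the properties are stored, those may need to be escaped.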