Replication of on-premise Hive data to Amazon S3
To replicate data from on-premise to Amazon S3, you must create a new data replication policy. You must set up the target cluster before you start the replication process.
Before you create a new replication policy, you must register your Amazon S3 cloud account with the DLP app. For more information, see Register cloud credentials. You must have the Infra Admin or DLM Admin role to perform this set of tasks.
You can replicate on-premise data to Amazon S3 with a single cluster. The metastore must be running in the cloud. HiveServer2 does not need to run in the cloud environment.
- Select Policies and click Add Policy. On the Create Replication Policy page, select HIVE as the service.
- Enter the replication policy name and description.
- Click SELECT SOURCE and select the type and the source cluster from the drop-down.
- Provide the data replication folder path and click SELECT DESTINATION.
- Select the destination type as S3 and Cloud Credential from the drop-down.
Provide a folder path in the form bucket_name/path and click VALIDATE.
- Once the validation is successful, click SCHEDULE.
- Configure the job settings for the replication policy.
- Click ADVANCED SETTINGS to set up the policy queue.
- Click CREATE POLICY.
The data replication process is enabled.
View the job status on the Policies page. Verify that the job starts and runs as expected.
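The destination step above expects the folder path in the form bucket_name/path. As a rough illustration only, the following Python sketch checks a path against that shape before you type it into the UI. The function name `split_destination_path` is hypothetical, and the bucket-name pattern is a simplified version of the general Amazon S3 naming rules (3 to 63 characters; lowercase letters, digits, dots, and hyphens). It does not reproduce the check that the VALIDATE button performs, which also depends on the registered cloud credential.

```python
import re

# Assumption: simplified S3 bucket naming rule (3-63 characters, lowercase
# letters, digits, dots, hyphens; must start and end with a letter or digit).
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$")


def split_destination_path(folder_path):
    """Split 'bucket_name/path' into (bucket_name, path).

    Raises ValueError if the input does not have the expected shape.
    This is a hypothetical helper, not part of any product API.
    """
    bucket, sep, path = folder_path.strip().partition("/")
    path = path.strip("/")
    if not sep or not path:
        raise ValueError("expected the form bucket_name/path")
    if not _BUCKET_RE.match(bucket):
        raise ValueError(f"invalid bucket name: {bucket!r}")
    return bucket, path


if __name__ == "__main__":
    # Example value only; substitute your own bucket and folder.
    print(split_destination_path("my-bucket/warehouse/sales"))
```

Because the sketch splits on the first slash, an input with a scheme prefix such as `s3://` is rejected; the source describes only the plain bucket_name/path form, so the sketch accepts nothing else.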