Configuring Cloud Data Access

Configuring GCS storage locations

After configuring access to GCS via a service account, you can optionally use a GCS bucket as a base storage location. This storage location is used mainly for the Hive Warehouse Directory, which stores the table data for managed tables.

Prerequisites

  • You must have an existing bucket. For instructions on how to create a bucket on GCS, refer to the GCP documentation.
  • The service account that you configured must allow access to the bucket.
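
As a rough illustration of the prerequisites above, the following sketch checks whether a bucket is visible to a given set of credentials. It assumes the google-cloud-storage Python client library, which this guide does not mention. The helper name bucket_is_visible, the key file path, and the bucket name are hypothetical.

```python
def bucket_is_visible(client, bucket_name):
    """Return True if the bucket exists and is visible to the client's credentials.

    `client` is expected to behave like google.cloud.storage.Client, whose
    lookup_bucket() returns None when the bucket does not exist.
    """
    return client.lookup_bucket(bucket_name) is not None


# Hypothetical usage (requires google-cloud-storage and a service account key):
#
#   from google.cloud import storage
#   client = storage.Client.from_service_account_json("key.json")
#   print(bucket_is_visible(client, "my-test-bucket"))
```

If the call raises a permissions error instead of returning a result, that most likely means the service account cannot read the bucket's metadata, which would also need to be resolved before creating the cluster.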

Steps

  1. When creating a cluster, on the Cloud Storage page in the advanced cluster wizard view, select Use existing GCS storage and enter information related to your GCS account, as described in the instructions for configuring access to GCP.
  2. Under Storage Locations, enable Configure Storage Locations by clicking the button.
  3. Enter the name of your existing bucket under Base Storage Location.
    Note

    Make sure that the bucket already exists within the account.

  4. Under the Path for Hive Warehouse Directory property (hive.metastore.warehouse.dir), Cloudbreak automatically suggests a location within the bucket. You can optionally update this path or select Do not configure.
Note

This directory structure will be created in your specified bucket upon the first activity in Hive.
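
To make the relationship between the base storage location and the warehouse path concrete, here is a minimal sketch of how a value for hive.metastore.warehouse.dir might be composed from a bucket name. The gs:// scheme is the one used by the GCS connector for Hadoop. The function name and the default sub-path apps/hive/warehouse are assumptions for illustration only; the location Cloudbreak actually suggests may differ.

```python
def hive_warehouse_uri(bucket, subpath="apps/hive/warehouse"):
    """Compose a gs:// URI suitable as a hive.metastore.warehouse.dir value.

    The default sub-path is a hypothetical example, not necessarily the
    location that Cloudbreak suggests.
    """
    name = bucket.strip()
    # Accept either a bare bucket name or a gs:// URI.
    if name.startswith("gs://"):
        name = name[len("gs://"):]
    name = name.strip("/")
    if not name:
        raise ValueError("bucket name must not be empty")
    path = subpath.strip("/")
    return f"gs://{name}/{path}" if path else f"gs://{name}"
```

For example, hive_warehouse_uri("my-test-bucket") returns gs://my-test-bucket/apps/hive/warehouse, where my-test-bucket is a placeholder bucket name.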