Data Governance

Mirroring Data (Falcon)

Mirroring produces an exact copy of the data and keeps both copies synchronized. You can use Falcon to mirror HDFS directories or Hive tables, and you can mirror between HDFS and Amazon S3 or Microsoft Azure. With Hive, you can replicate an entire database.
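For HDFS directories, Falcon mirror jobs run DistCp under the hood. As a rough sketch, a single mirror run is equivalent to a one-off DistCp copy like the one below; the NameNode addresses and paths are placeholders, not values from this guide:

```shell
# One-off equivalent of a single HDFS mirror run, using DistCp.
# -update copies only files that differ on the target;
# -delete removes target files that no longer exist on the source.
# Hosts and paths below are hypothetical examples.
hadoop distcp -update -delete \
  hdfs://source-nn:8020/apps/data/input \
  hdfs://target-nn:8020/apps/data/input
```

When the target is Amazon S3, the second argument would instead be an `s3a://bucket/path` URI. Falcon adds the scheduling, retry, and throttling on top of this basic copy.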

To mirror data with the Falcon web UI:

  1. Launch the Falcon web UI. If you are using Ambari:

    1. On the Services tab, select Falcon in the services list.

    2. At the top of the Falcon service page, click Quick Links, and then click Falcon Web UI.

  2. At the top of the Falcon web UI page, click Mirror.

    Figure 2.8. New Mirror Configuration Dialog


  3. On the New Mirror page, specify the following values:

    Table 2.4. Mirror Configuration Values

    Mirror Name: Name of the mirror entity.

    Tags: Metadata tags. An example is provided in the UI.

    Mirror Type: Whether this is a File System or Hive Catalog mirror.

    Source: The location, name, and path of the cluster or Hive table to be mirrored, and whether the mirroring job runs on the source cluster.

    Target: The location, name, and path where the mirrored data is stored, and whether the mirroring job runs on the target cluster.

    Validity: The validity interval, that is, the time window during which the mirror job runs.

    Advanced Options: Expand this section of the page to configure how often the target cluster is updated, throttle DistCp operations, set a retry policy, and specify the ACL for the mirror entity.


  4. Click Next to view a summary of your mirror entity definition.

  5. If you are satisfied with the mirror entity definition, click Save.

  6. To verify that the mirror entity was successfully created, enter its name in the Falcon web UI Search field and press Enter. If the mirror entity name appears in the search results, it was successfully created. See Search For and Manage Data Pipeline Entities.
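You can also verify the entity from the command line with the Falcon CLI, since a mirror configured in the web UI is materialized as a process entity. The entity name below is a hypothetical placeholder:

```shell
# List all submitted process entities (mirrors appear here).
falcon entity -type process -list

# Check the scheduling status of a specific mirror entity
# ("myMirror" is a placeholder for the name you entered in the UI).
falcon entity -type process -name myMirror -status

# Dump the generated entity definition for inspection.
falcon entity -type process -name myMirror -definition
```

These commands require a running Falcon server and a client configured to reach it.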