Apache Hadoop High Availability

Managing and Configuring HBase Cluster Replication

HBase supports replication across multiple clusters. Implementing cluster replication helps you set up High Availability (HA) and enables disaster recovery.

Manually enable HBase replication

  1. Configure the source and destination clusters and ensure that you have HBase running in both clusters.

    The HBase master and region servers in the source cluster must be able to communicate with the master and all region servers in the destination cluster.

  2. On both clusters, create tables with the same names and column families, so that the destination cluster stores the data that it receives in a logical location:

    hbase shell>create "t1","cf1"
  3. Ensure that all hosts in the source and destination clusters can reach each other. If both clusters use the same ZooKeeper cluster, you must configure a different zookeeper.znode.parent for each, because the two clusters cannot write to the same znode.

  4. On the source cluster, in HBase shell, add the destination cluster as a peer:

    hbase shell>add_peer "us_east","hostname.of.zookeeper:2181:/path-to-hbase"
  5. On HDP, path-to-hbase is either "/hbase-unsecure" or "/hbase-secure". To find the HBase znode path, open the hbase-site.xml file on the destination cluster and check the value of zookeeper.znode.parent.

  6. Ensure that replication has not been disabled: the hbase.replication setting must be set to true.

  7. On the source cluster, in HBase shell, enable replication for the table:

    hbase shell>enable_table_replication "t1"
  8. Copy the existing HBase data from the source cluster to the destination cluster, because replication ships only the edits made after it is enabled:

    $>hbase org.apache.hadoop.hbase.mapreduce.CopyTable --peer.adr=hostname.of.zookeeper:2181:/hbase-unsecure t1
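
The steps above can be sketched as a few shell helpers that read zookeeper.znode.parent from hbase-site.xml, compose the cluster key, and print the commands from steps 4, 7, and 8. This is a minimal sketch, not an official tool: the function names (site_property, cluster_key, replication_commands, copytable_command) and the configuration path in the usage comment are assumptions made for illustration, and the add_peer argument form simply mirrors the one shown above, which other HBase versions may not accept.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and sample values are illustrative assumptions.

# Print the <value> of a named property from an hbase-site.xml file.
# Assumes the <value> element follows its <name> element, on the same
# line or on a later one (the usual hbase-site.xml layout).
site_property() {
  awk -v n="$2" '
    index($0, "<name>" n "</name>") { found = 1 }
    found && match($0, /<value>[^<]*<\/value>/) {
      # Strip the 7-character <value> and 8-character </value> tags.
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$1"
}

# Compose the CLUSTER_KEY as quorum:clientPort:znodeParent.
cluster_key() {
  printf '%s:%s:%s\n' "$1" "$2" "$3"
}

# Print the HBase shell commands for steps 4 and 7.
replication_commands() {
  printf 'add_peer "%s","%s"\n' "$1" "$2"
  printf 'enable_table_replication "%s"\n' "$3"
}

# Print the CopyTable command line for step 8.
copytable_command() {
  printf 'hbase org.apache.hadoop.hbase.mapreduce.CopyTable --peer.adr=%s %s\n' "$1" "$2"
}

# Example use (the config path is an assumption; adjust for your cluster):
#   znode=$(site_property /etc/hbase/conf/hbase-site.xml zookeeper.znode.parent)
#   key=$(cluster_key hostname.of.zookeeper 2181 "$znode")
#   replication_commands us_east "$key" t1
#   copytable_command "$key" t1
```

The helpers only print commands and do not execute anything, so you can review the output before pasting it into HBase shell on the source cluster.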

Pause and stop HBase replication

  1. To pause HBase cluster replication, use the disable_table_replication command:

    hbase shell>disable_table_replication "t1"

    This temporarily stops replication for the table. To re-enable replication, use the enable_table_replication command.

  2. To permanently disable replication, remove the replication relationship:

    hbase shell>remove_peer "us_east"
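
To keep the temporary and permanent operations from being mixed up in scripts, the two steps above can be wrapped in a small dispatcher. This is a hedged sketch: replication_action and its action names (pause, resume, remove) are hypothetical, and the function only prints the HBase shell command for you to run.

```shell
#!/usr/bin/env bash
# Sketch only: replication_action and its action names are assumptions.

# Map an operator action to the matching HBase shell command.
#   pause/resume take a table name; remove takes a peer ID.
replication_action() {
  case "$1" in
    pause)  printf 'disable_table_replication "%s"\n' "$2" ;;
    resume) printf 'enable_table_replication "%s"\n' "$2" ;;
    remove) printf 'remove_peer "%s"\n' "$2" ;;
    *)      echo "unknown action: $1" >&2; return 1 ;;
  esac
}

# Example use:
#   replication_action pause t1
#   replication_action remove us_east
```

Because remove drops the peer relationship rather than pausing it, keeping it as a separate, explicitly named action makes an accidental permanent removal less likely.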

Table 4.4. Table of HBase Cluster Management Commands and Descriptions

Command
    Description

add_peer <ID> <CLUSTER_KEY>
    Adds a replication relationship between two clusters:

      • ID: A unique string, which must not contain a hyphen.

      • CLUSTER_KEY: Composed using the following format:

        hbase.zookeeper.quorum:hbase.zookeeper.property.clientPort:zookeeper.znode.parent

list_peers
    Lists all replication relationships known by the cluster.

enable_peer <ID>
    Enables a previously-disabled replication relationship.

disable_peer <ID>
    Disables a replication relationship. After disabling, HBase no longer sends edits to that peer cluster, but continues to track the new WALs that are required for replication to commence again if it is re-enabled.

remove_peer <ID>
    Disables and removes a replication relationship. After removal, HBase no longer sends edits to that peer cluster nor does it track WALs.

enable_table_replication <TABLE_NAME>
    Enables the table replication switch for all of the column families associated with that table. If the table is not found in the destination cluster, one is created with the same name and column families.

disable_table_replication <TABLE_NAME>
    Disables the table replication switch for all of the column families associated with that table.
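
The add_peer argument rules in the table can be checked before you run the command. The following is a minimal sketch under stated assumptions: valid_peer_id and valid_cluster_key are hypothetical helper names, and the cluster key pattern (a quorum without colons, a numeric port, and a znode path starting with "/") is an approximation of the format above, not HBase's own validation.

```shell
#!/usr/bin/env bash
# Sketch only: helper names and the key pattern are illustrative assumptions.

# Succeed if the peer ID is non-empty and contains no hyphen.
valid_peer_id() {
  case "$1" in
    ''|*-*) return 1 ;;
    *)      return 0 ;;
  esac
}

# Succeed if the key looks like quorum:clientPort:/znode-parent.
# The quorum may be a comma-separated host list.
valid_cluster_key() {
  printf '%s\n' "$1" | grep -Eq '^[^:]+:[0-9]+:/[^:]*$'
}

# Example use:
#   valid_peer_id us_east && valid_cluster_key "hostname.of.zookeeper:2181:/hbase-unsecure" \
#     && echo "arguments look well-formed"
```

A check like this catches only formatting mistakes. It does not confirm that the ZooKeeper quorum is reachable or that the znode exists.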