Apache Hadoop High Availability

HBase Cluster Topologies

  • A central source cluster might propagate changes out to multiple destination clusters, for failover or due to geographic distribution.

  • A source cluster might push changes to a destination cluster, which might also push its own changes back to the original cluster.

  • Many different low-latency clusters might push changes to one centralized cluster for backup or resource-intensive data analytics jobs. The processed data might then be replicated back to the low-latency clusters.

Multiple levels of replication may be chained together to suit your organization’s needs. The following diagram shows a hypothetical scenario for a complex cluster replication configuration. The arrows indicate the data paths.

Figure 4.1. Example of a Complex Cluster Replication Configuration
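The topologies described above can be treated as a directed graph in which each arrow is a replication path. The following is a minimal sketch of that idea, not HBase code: the cluster names and the `reachable_clusters` helper are hypothetical, and the model assumes that every edit is forwarded along every arrow. It shows which clusters would eventually receive a change written at a given cluster, and that a cycle in the graph does not cause the walk to loop.

```python
from collections import deque


def reachable_clusters(peers, origin):
    """Return the set of clusters that eventually receive an edit written at origin.

    peers maps a cluster name to the clusters it pushes changes to.
    This is a toy model of the data paths, not the HBase replication API.
    """
    seen = {origin}
    queue = deque([origin])
    while queue:
        cluster = queue.popleft()
        for peer in peers.get(cluster, ()):
            if peer not in seen:  # skip clusters already visited, so cycles terminate
                seen.add(peer)
                queue.append(peer)
    seen.discard(origin)  # the origin already holds its own edit
    return seen


# Hypothetical cluster names, one graph per topology from the list above.
fan_out = {"central": ["east", "west"]}
bidirectional = {"a": ["b"], "b": ["a"]}
fan_in = {
    "edge1": ["analytics"],
    "edge2": ["analytics"],
    "analytics": ["edge1", "edge2"],  # processed data replicated back
}

print(sorted(reachable_clusters(fan_out, "central")))   # ['east', 'west']
print(sorted(reachable_clusters(bidirectional, "a")))   # ['b']
print(sorted(reachable_clusters(fan_in, "edge1")))      # ['analytics', 'edge2']
```

Sketching a planned topology this way can help confirm that each cluster receives the data you expect before you configure any replication peers.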

HBase replication borrows many concepts from the statement-based replication design used by MySQL. Instead of SQL statements, HBase replicates entire WALEdits. Each WALEdit groups the cell inserts produced by Put and Delete operations on the clients, and it is replicated as a unit in order to maintain atomicity.
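To illustrate why whole edits are shipped rather than individual cells, the following toy model applies each edit on the destination as an all-or-nothing unit and replays edits in log order. It is a sketch under stated assumptions, not the HBase implementation: the `Cell`, `Edit`, `apply_edit`, and `replicate` names are hypothetical, the table is a plain dictionary, and a value of `None` stands in for a delete.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Cell:
    row: str
    column: str
    value: object  # None marks a delete in this toy model


@dataclass
class Edit:
    """Hypothetical stand-in for a WALEdit: the cells of one client operation."""
    cells: list = field(default_factory=list)


def apply_edit(table, edit):
    """Apply every cell of one edit, or none of them if any cell fails."""
    staged = dict(table)  # work on a copy so a failure leaves table untouched
    for cell in edit.cells:
        key = (cell.row, cell.column)
        if cell.value is None:
            staged.pop(key, None)
        else:
            staged[key] = cell.value
    table.clear()
    table.update(staged)


def replicate(log, destination):
    """Replay the source log on the destination, one whole edit at a time, in order."""
    for edit in log:
        apply_edit(destination, edit)


# One Put that writes two columns, followed by a Delete of one of them.
source_log = [
    Edit([Cell("r1", "cf:a", 1), Cell("r1", "cf:b", 2)]),
    Edit([Cell("r1", "cf:a", None)]),
]
destination = {}
replicate(source_log, destination)
print(destination)  # {('r1', 'cf:b'): 2}
```

Because the destination never holds a partially applied edit in this model, a reader there sees either both columns of the Put or neither.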