Apache Hadoop High Availability
Also available as:
PDF
loading table of contents...

Spreading Queue Failover Load

When replication is active, a subset of RegionServers in the source cluster is responsible for shipping edits to the sink. This responsibility must be failed over like all other RegionServer functions if a process or node crashes. The following configuration settings are recommended for maintaining an even distribution of replication activity over the remaining live servers in the source cluster:

  • Set replication.source.maxretriesmultiplier to 300.

  • Set replication.source.sleepforretries to 1 (1 second). This value, combined with the value of replication.source.maxretriesmultiplier, causes the retry cycle to last about 5 minutes.

  • Set replication.sleep.before.failover to 30000 (30 seconds) in the source cluster site configuration.