3. Timeline Consistency

With timeline consistency, HBase introduces a consistency definition that can be provided per get or scan read operation:

public enum Consistency {
    STRONG,
    TIMELINE
}

Consistency.STRONG is the default consistency model provided by HBase. If a table has region replication set to 1, or has region replicas but the reads are done with STRONG consistency, the read is always performed by the primary region. This preserves previous behavior, and the client receives the latest data.

If a read is performed with Consistency.TIMELINE, the read RPC is sent to the primary region replica first. If the primary does not respond within a short interval (10 milliseconds by default, as set by the hbase.client.primaryCallTimeout.get property), parallel RPCs are also sent to the secondary region replicas. HBase returns the result from the RPC that finishes first. If the response is from the primary region replica, the data is current. You can use the Result.isStale() API to determine the state of the resulting data:

  • If the result is from a primary region, Result.isStale() returns false.

  • If the result is from a secondary region, Result.isStale() returns true.
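The read path described above can be modeled with a small, library-free sketch. This is not the HBase client implementation: ReadResult, timelineRead, and the Supplier-based replica calls are hypothetical names chosen for illustration, and error handling for a failing primary is left out. With the real client, a read opts into this behavior per operation through Get.setConsistency(Consistency.TIMELINE) or Scan.setConsistency(Consistency.TIMELINE).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

public class TimelineReadSketch {

    /** Hypothetical stand-in for an HBase Result: a value plus a staleness flag. */
    public static final class ReadResult {
        public final String value;
        private final boolean stale;

        ReadResult(String value, boolean stale) {
            this.value = value;
            this.stale = stale;
        }

        /** False when the primary answered, true when a secondary answered. */
        public boolean isStale() {
            return stale;
        }
    }

    /**
     * Sends the read to the primary first. If the primary has not answered
     * within primaryCallTimeoutMillis, also sends it to every secondary and
     * returns whichever call finishes first.
     */
    public static ReadResult timelineRead(
            Supplier<CompletableFuture<String>> primary,
            List<Supplier<CompletableFuture<String>>> secondaries,
            long primaryCallTimeoutMillis) throws Exception {
        CompletableFuture<ReadResult> primaryCall =
                primary.get().thenApply(v -> new ReadResult(v, false));
        try {
            // Fast path: the primary answers in time, so the data is current.
            return primaryCall.get(primaryCallTimeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException primaryIsSlow) {
            // Fall through and fan out to the secondaries.
        }
        List<CompletableFuture<ReadResult>> calls = new ArrayList<>();
        calls.add(primaryCall); // the primary call stays in the race
        for (Supplier<CompletableFuture<String>> secondary : secondaries) {
            calls.add(secondary.get().thenApply(v -> new ReadResult(v, true)));
        }
        // anyOf completes as soon as the first of the racing calls completes.
        Object first = CompletableFuture.anyOf(calls.toArray(new CompletableFuture[0])).get();
        return (ReadResult) first;
    }
}
```

A caller would check isStale() on the returned result and decide whether possibly outdated data is acceptable for that particular read.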

TIMELINE consistency as implemented by HBase differs from pure eventual consistency in the following respects:

  • Single-homed and ordered updates: Whether region replication is enabled or not, on the write side there is still only one defined replica, the primary, that can accept writes. This replica is responsible for ordering the edits and preventing conflicts. This guarantees that two different writes are never committed at the same time by different replicas, which would result in divergent data. With this approach, there is no need for read-repair or last-timestamp-wins types of conflict resolution.

  • The secondary replicas also apply edits in the order that the primary committed them. Thus, at any point in time, each secondary contains a snapshot of the primary data. This is similar to RDBMS replication and HBase multi-datacenter replication, but takes place in a single cluster.

  • On the read side, the client can detect whether the read is coming from up-to-date data or stale data. Also, the client can issue reads with different consistency requirements on a per-operation basis to ensure its own semantic guarantees.

  • The client might still observe edits out of order, and appear to go back in time, if it reads from one secondary replica first and then from another secondary replica that is further behind. There is no stickiness to region replicas, nor is there a transaction ID-based guarantee. If required, this can be implemented later.

Memory Accounting

Secondary region replicas refer to data files in the primary region replica, but they have their own MemStores in HA Phase 2 and use block cache as well. However, secondary region replicas cannot flush data when there is memory pressure for their MemStores. They can only free up MemStore memory when the primary region does a flush and the flush is replicated to the secondary.

Because a Region Server can host primary replicas for some regions and secondaries for others, the secondary replicas might cause extra flushes of the primary regions on the same host. In extreme situations, there might be no memory left to accept new writes arriving from the primary through write-ahead log (WAL) replication.

To resolve this situation, the secondary replica is allowed to do a store file refresh, which is a file system list operation that picks up new files from the primary and possibly drops the secondary's MemStore. This refresh is performed only if the MemStore size of the biggest secondary region replica is at least hbase.region.replica.storefile.refresh.memstore.multiplier times bigger than the biggest MemStore of a primary replica. The default value for hbase.region.replica.storefile.refresh.memstore.multiplier is 4.

Note

If this operation is performed, the secondary replica might obtain partial row updates across column families because column families are flushed independently. Hortonworks recommends that you configure HBase to perform this operation infrequently.

You can disable this feature by setting hbase.region.replica.storefile.refresh.memstore.multiplier to a large number, but this might cause replication to block indefinitely.
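The trigger condition for the store file refresh can be written as a one-line predicate. This is a minimal sketch of the rule as stated in this section, not HBase source code: the class, the method, and its parameter names are hypothetical, and only the property name and its default of 4 come from the text.

```java
public class StoreFileRefreshSketch {

    /** Default of hbase.region.replica.storefile.refresh.memstore.multiplier, per the text. */
    public static final long DEFAULT_MULTIPLIER = 4;

    /**
     * Returns true when the biggest secondary MemStore is at least
     * `multiplier` times the size of the biggest primary MemStore,
     * which is the condition under which a refresh is allowed.
     */
    public static boolean shouldRefreshStoreFiles(
            long biggestSecondaryMemStoreBytes,
            long biggestPrimaryMemStoreBytes,
            long multiplier) {
        return biggestSecondaryMemStoreBytes >= multiplier * biggestPrimaryMemStoreBytes;
    }
}
```

With the default multiplier, a 64 MB primary MemStore means the refresh is considered only once a secondary MemStore on the same server reaches 256 MB. A very large multiplier makes the condition practically unreachable, which is why it acts as an off switch.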

Secondary Replica Failover

When a secondary region replica first comes online, or after a secondary region fails over, it might contain edits in its MemStore. The secondary replica must ensure that it does not access stale data (data that has been overwritten) before it serves requests after assignment. Therefore, the secondary waits until it detects a full flush cycle, consisting of a start flush and a commit flush, or a region open event replicated from the primary replica.

Until the flush cycle occurs, the secondary region replica rejects all read requests by way of an IOException with the following message:

The region's reads are disabled

Other replicas might still be available to serve the read, so an RPC issued with TIMELINE consistency is not necessarily affected.
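The gating rule for a newly opened secondary can be sketched as a small state machine. This is an illustrative model under stated assumptions, not the Region Server implementation: the class SecondaryReplicaReadGate and its event methods are hypothetical, and only the exception message is taken from the text above.

```java
import java.io.IOException;

public class SecondaryReplicaReadGate {

    private boolean flushStarted = false;
    private boolean readsEnabled = false;

    /** Called when a start-flush event replicated from the primary is seen. */
    public void onFlushStart() {
        flushStarted = true;
    }

    /** A commit only completes a full cycle if its start was also observed. */
    public void onFlushCommit() {
        if (flushStarted) {
            readsEnabled = true;
        }
    }

    /** A replicated region open event from the primary also enables reads. */
    public void onRegionOpen() {
        readsEnabled = true;
    }

    public boolean readsEnabled() {
        return readsEnabled;
    }

    /** Rejects reads until one of the enabling events has been observed. */
    public void checkReadsEnabled() throws IOException {
        if (!readsEnabled) {
            throw new IOException("The region's reads are disabled");
        }
    }
}
```

In this model, a commit flush that arrives without its matching start flush does not enable reads, because the secondary cannot tell whether it has seen the whole flush.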

To facilitate faster recovery, the secondary region sends a flush request to the primary when it is opened. The configuration property hbase.region.replica.wait.for.primary.flush, which is enabled by default, can be used to disable this feature if needed.