Apache Hadoop High Availability

Verifying Replicated HBase Data

The VerifyReplication MapReduce job, which is included in HBase, performs a systematic comparison of replicated data between two different clusters. Run the VerifyReplication job on the master cluster, supplying it with the peer ID and table name to use for validation. You can limit the verification further by specifying a time range or specific column families. The job short name is verifyrep. To run the job, use a command like the following:

$ HADOOP_CLASSPATH=`${HBASE_HOME}/bin/hbase classpath` \
    "${HADOOP_HOME}/bin/hadoop" jar "${HBASE_HOME}/hbase-server-VERSION.jar" \
    verifyrep --starttime=<timestamp> --stoptime=<timestamp> --families=<myFam> <ID> <tableName>
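
The time-range options take numeric timestamps. The following is a minimal sketch of one way to build a window covering the last N hours. It assumes the table uses HBase's default cell timestamps (milliseconds since the Unix epoch). The helper name verifyrep_window is hypothetical and not part of HBase.

```shell
# Hypothetical helper (not part of HBase): print time-range options
# covering the last N hours.
# Assumption: cell timestamps are epoch milliseconds (the HBase default).
verifyrep_window() {
  hours="$1"
  # Optional second argument pins "now" (in epoch seconds); defaults to the current time.
  now_s="${2:-$(date +%s)}"
  stop_ms=$(( now_s * 1000 ))
  start_ms=$(( stop_ms - hours * 3600 * 1000 ))
  echo "--starttime=${start_ms} --stoptime=${stop_ms}"
}

# Example: print the options for the last 24 hours.
verifyrep_window 24
```

The printed options can be pasted into the verifyrep command line in place of the two timestamp placeholders.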

When the job finishes, the VerifyReplication command prints two counters: GOODROWS, the number of rows that replicated correctly, and BADROWS, the number of rows that did not.
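
To check the counters from a script, one option is to save the job output to a file and extract the values from it. The sketch below assumes the counters appear in the output as NAME=value lines, as in Hadoop's usual end-of-job counter summary, and that the output was captured to a file (for example by appending 2>&1 | tee verifyrep.log to the command). The helper name verifyrep_counter and the log file name are hypothetical.

```shell
# Hypothetical helper (not part of HBase): print the value of a named
# counter found in saved job output.
# Assumption: counters are printed one per line as NAME=value.
verifyrep_counter() {
  name="$1"
  log="$2"
  # Keep the last matching line; report 0 if the counter never appears.
  value=$(sed -n "s/^[[:space:]]*${name}=\([0-9][0-9]*\)[[:space:]]*$/\1/p" "$log" | tail -n 1)
  echo "${value:-0}"
}

# Example: warn if the saved output reports any bad rows.
if [ -f verifyrep.log ] && [ "$(verifyrep_counter BADROWS verifyrep.log)" -gt 0 ]; then
  echo "Replication mismatch detected" >&2
fi
```

A result of 0 for both counters may also mean the job failed before reporting, so it is worth confirming that the job completed before trusting the values.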