About requested events missing in Notification Log table
Consider the following factors before setting the value of the
metastore.event.db.listener.timetolive configuration parameter. This parameter
controls how long an event is kept in the database listener queue
or the backing RDBMS. If the value is set too high, the number of events in the
queue grows and can degrade performance during normal operation. If the value is
set too low, events might be deleted from the queue before replication reads them,
which can cause incremental replication to fail. In that case, you must bootstrap
the system again to return to a consistent state.
The value of the
metastore.event.db.listener.timetolive parameter must be
large enough to prevent events from being cleaned up while replication of earlier
events is still in progress. During the bootstrap phase, events added after the
bootstrap starts must still be present for the next incremental load to succeed.
Because the bootstrap is performed for the whole database, it can take a long time.
For incremental loads, if replication runs too infrequently, each incremental load
has a large number of events to replicate, which can increase the time required to
execute the load.
When setting the parameter value, take the replication trigger frequency into account so that events are not cleaned up before replication finishes. The replication time depends on many factors, such as the amount of data to be replicated, the bandwidth between the clusters, and the number of objects, such as partitions, tables, and functions, present in the database. Replicating to a cloud-based cluster takes longer because file system operations are slower on a cloud file system than on HDFS.
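
The sizing considerations above can be turned into a rough estimate. The following Python sketch is only an illustrative heuristic, not a formula from the product: it assumes that an event must outlive the longest replication run (bootstrap or incremental) plus the wait until the next run is triggered, and then applies a safety margin. The function name, its parameters, the safety factor, and the sample durations are all hypothetical; substitute values measured on your own clusters.

```python
import math


def min_event_ttl_seconds(bootstrap_secs, incremental_secs, interval_secs,
                          safety_factor=1.5):
    """Rough lower bound, in seconds, for the event time-to-live.

    Heuristic (an assumption, not an official rule): an event must survive
    the longest replication run plus the wait until the next run starts.
    """
    if min(bootstrap_secs, incremental_secs, interval_secs) < 0:
        raise ValueError("durations must be non-negative")
    if safety_factor < 1:
        raise ValueError("safety_factor must be at least 1")
    longest_run = max(bootstrap_secs, incremental_secs)
    return math.ceil((longest_run + interval_secs) * safety_factor)


# Hypothetical measurements: 6 h bootstrap, 30 min incremental load,
# replication triggered every 4 h.
ttl = min_event_ttl_seconds(6 * 3600, 30 * 60, 4 * 3600)
print(f"metastore.event.db.listener.timetolive >= {ttl}s")
```

With these sample numbers the estimate is 54000 seconds (15 hours). Treat the result as a floor to compare against the configured value, and revisit it when the data volume, the number of objects, or the target file system changes, because each of those factors changes the replication time.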