Apache Storm Component Guide
Also available as:
PDF
loading table of contents...

Implementing Stateful Windowing

The windowing implementation in core Storm acknowledges tuples in a window only when they fall out of the window.

For example, consider a window configuration with a window length of 5 minutes and a sliding interval of 1 minute. The tuples that arrived between 0 and 1 minutes are acked only when the window slides past one minute (for example, at the 6th minute).

If the system crashes, tuples e1 to e8 are replayed, assuming that the ack for e1 and e2 did not reach the acker. Tuples w1, w2 and w3 will be reevaluated.

Stateful windowing tries to minimize duplicate window evaluations by saving the last evaluated state and the last expired state of the window. Stateful windowing expects a monotonically increasing message ID to be part of the tuple, and uses the stateful abstractions discussed previously to save the last expired and last evaluated message IDs.

During recovery, Storm uses the last expired and last evaluated message IDs to avoid duplicate window evaluations:

  • Tuples with message IDs lower than the last expired ID are discarded.

  • Tuples with message IDs between the last expired and last evaluated message IDs are fed into the system without activating any triggers.

  • Tuples beyond the last evaluated message ids are processed as usual.

State support in windowing is provided by IStatefulWindowedBolt. User bolts should typically extend BaseStatefulWindowedBolt for windowings operation that use the Storm framework to automatically manage the state of the window.