Storm relies on the acking mechanism to replay tuples in case of failures. It is possible that the state is committed but the worker crashes before acking the tuples. In this case the tuples are replayed causing duplicate state updates.
StatefulBoltExecutor continues to process the tuples from a stream
after it has received a checkpoint tuple on one stream while waiting for checkpoint to
arrive on other input streams for saving the state. This can also cause duplicate state
updates during recovery.
The state abstraction does not eliminate duplicate evaluations and currently provides only at-least once guarantee.
To provide the at-least-once guarantee, all bolts in a stateful topology are
expected to anchor the tuples while emitting and ack the input tuples once it is
processed. For non-stateful bolts, the anchoring and acking can be automatically
managed by extending the
BaseBasicBolt. Stateful bolts are expected to
anchor tuples while emitting and ack the tuple after processing like in the
WordCountBolt example in the State management subsection.