Developing Apache Storm Applications
Also available as:
PDF

Trident Spouts

Trident defines three spout types that differ with respect to batch content, failure response, and support for exactly-once semantics:

Non-transactional spouts

Non-transactional spouts make no guarantees for the contents of each batch. As a result, processing may be at-most-once or at least once. It is not possible to achieve exactly-once processing when using non-transactional Trident spouts.

Transactional spouts

Transactional spouts support exactly-once processing in a Trident topology. They define success at the batch level, and have several important properties that allow them to accomplish this:

  1. Batches with a given transaction ID are always identical in terms of tuple content, even when replayed.
  2. Batch content never overlaps. A tuple can never be in more than one batch.
  3. Tuples are never skipped.

With transactional spouts, idempotent state updates are relatively easy: because batch transaction IDs are strongly ordered, the ID can be used to track data that has already been persisted. For example, if the current transaction ID is 5 and the data store contains a value for ID 5, the update can be safely skipped.

Opaque transactional spouts

Opaque transactional spouts define success at the tuple level. Opaque transactional spouts have the following properties:

  1. There is no guarantee that a batch for a particular transaction ID is always the same.
  2. Each tuple is successfully processed in exactly one batch, though it is possible for a tuple to fail in one batch and succeed in another.

The difference in focus between transactional and opaque transactional spouts—success at the batch level versus the tuple level, respectively—has key implications in terms of achieving exactly-once semantics when combining different spouts with different state types.