Tuning Topologies

Tune Bulk Message Writing

The primary purpose of the Bulk Message Writing abstraction is to enable efficient writing to external components. Because most HCP installations include a variety of sensors with different volumes and velocities, each sensor needs to be tuned for its own traffic.

  1. For high volume sensors, set batch sizes higher.
    Configure high volume sensors with higher batch sizes (1000+ is recommended). Use logging to verify that these batches are filling up: the number of messages actually written should match the batch size. Keep in mind that large batch sizes also require more memory to hold messages, and that streaming engines like Storm limit how many messages can be processed at a time (the topology.max.spout.pending setting).
  2. For low volume sensors, set batch timeouts lower.
    Low volume sensors may take longer to fill up a batch, especially if the batch size is set higher. This can be undesirable because messages may stay cached for longer than necessary, consuming memory and increasing latency for that sensor type.
    A maxBatchTimeout is set at creation time and serves as the ceiling for a batch timeout. In Storm topologies, this value is set to 1/2 the tuple timeout setting to ensure messages are always flushed before their tuples time out. After a batch is flushed, the batch timer is reset for that sensor type.
  3. Allocate threads appropriately.
    Each thread (executor in Storm) maintains its own message cache. Allocating too many threads will cause messages to be spread too thin across separate caches and batches won't fill up completely. This should be balanced with having enough threads to take advantage of any parallel write capability offered by the endpoint that's being written to.
  4. Watch for high write times.
    Use logging to evaluate write timing. Unusually high write times can indicate that an endpoint is not configured correctly or is undersized.
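
The interaction between batch size, batch timeout, and the maxBatchTimeout ceiling can be illustrated with a small sketch. This is not the HCP implementation: the class BulkWriterSketch, its methods, the sensor names, and the 15-second ceiling (half of an assumed 30-second tuple timeout) are all hypothetical and chosen only for illustration. A fake clock is used so the timing behavior is easy to follow.

```python
import time


class BulkWriterSketch:
    """Toy per-sensor batching cache (hypothetical, not HCP code)."""

    def __init__(self, write_fn, max_batch_timeout, clock=time.monotonic):
        self.write_fn = write_fn                    # callable(sensor, messages)
        self.max_batch_timeout = max_batch_timeout  # ceiling, in seconds
        self.clock = clock
        self.sensors = {}                           # sensor -> batching state

    def configure(self, sensor, batch_size, batch_timeout):
        self.sensors[sensor] = {
            "batch_size": batch_size,
            # A requested timeout can never exceed the ceiling.
            "timeout": min(batch_timeout, self.max_batch_timeout),
            "batch": [],
            "timer_start": self.clock(),
        }

    def add(self, sensor, message):
        state = self.sensors[sensor]
        state["batch"].append(message)
        if len(state["batch"]) >= state["batch_size"]:
            self.flush(sensor)                      # batch is full

    def flush_expired(self):
        # Flush every sensor whose batch timer has run out.
        now = self.clock()
        for sensor, state in self.sensors.items():
            if now - state["timer_start"] >= state["timeout"]:
                self.flush(sensor)

    def flush(self, sensor):
        state = self.sensors[sensor]
        messages, state["batch"] = state["batch"], []
        state["timer_start"] = self.clock()         # timer resets after a flush
        if messages:
            self.write_fn(sensor, messages)


written = []  # one (sensor, message count) entry per write


def record_write(sensor, messages):
    written.append((sensor, len(messages)))


now = [0.0]  # fake clock, in seconds
writer = BulkWriterSketch(record_write, max_batch_timeout=15.0,
                          clock=lambda: now[0])
writer.configure("bro", batch_size=1000, batch_timeout=60.0)  # capped to 15.0
writer.configure("squid", batch_size=50, batch_timeout=2.0)   # low volume

for i in range(2500):
    writer.add("bro", i)       # two full batches are written, 500 stay cached
writer.add("squid", "msg")     # far below the batch size of 50

now[0] = 2.0
writer.flush_expired()         # squid's short timeout flushes its 1 message
now[0] = 15.0
writer.flush_expired()         # the ceiling flushes bro's remaining 500

for sensor, count in written:
    print(sensor, count)
```

In this sketch the high volume sensor writes full batches of 1000, which is the pattern to look for in the logs, while the low volume sensor is flushed by its short timeout instead of waiting to reach its batch size. Because each instance holds its own cache, running several instances in parallel would split the same traffic across several partially filled batches, which is the effect described in step 3.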