Tuning Guide
Also available as:
PDF

Storm Tuning

There are quite a few options you will be confronted with when tuning your Storm topologies and this is largely trial and error. As a general rule of thumb, we recommend starting with the defaults and smaller numbers in terms of parallelism while iteratively working up until the desired performance is achieved. You will find the offset lag tool indispensable while verifying your settings.

We won't go into a full discussion about Storm's architecture - see references section for more info - but there are some general rules of thumb that should be followed. First, it's important to understand the ways you can impact parallelism in a Storm topology.

  • num tasks

  • num executors (parallelism hint)

  • num workers

Tasks are instances of a given spout or bolt, executors are threads in a process, and workers are jvm processes. You'll want the number of tasks as a multiple of the number of executors, the number of executors as multiple of the number of workers, and the number of workers as a multiple of the number of machines. The main reason for this approach is that it will give a uniform distribution of work to each machine and jvm process. More often than not, your number of tasks will be equal to the number of executors, which is the default in Storm. Flux does not actually provide a way to independently set number of tasks, so for enrichments and indexing which use Flux, num tasks will always equal num executors.

You can change the number of workers via the property topology.workers.

Other Storm Settings

```
topology.max.spout.pending
```

This is the maximum number of tuples that can be in a field (for example, not yet acked) at any given time within your topology. You set this as a form of back pressure to ensure you don't flood your topology.

topology.ackers.executors

This specifies how many threads should be dedicated to tuple acking. We found that setting this equal to the number of partitions in your inbound Kafka topic worked well.

spout-config.json

{
    ...
    "spout.pollTimeoutMs" : 200,
    "spout.maxUncommittedOffsets" : 10000000,
    "spout.offsetCommitPeriodMs" : 30000
}

These are the spout recommended defaults from Storm and are currently the defaults provided in the Kafka spout itself. In fact, if you find the recommended defaults work fine for you, then you can omit these settings altogether.