Component Tuning Levers

Modifying Unified Topology Enrichment Properties Using Ambari

You can modify various enrichment properties for the unified topology using Ambari.

To modify the enrichment properties, navigate to Ambari>Metron>Enrichment.

The following list provides tuning guidelines for the unified topology enrichment properties you can modify in Ambari:

enrichment.workers
The number of worker processes for the enrichment topology. Increase parallelism before attempting to increase the number of workers.
Start by tuning only a single worker. Maximize throughput for that worker, then increase the number of workers.
Throughput should scale relatively linearly as workers are added, until the workers running on a single node saturate the resources available.
When this happens, adding workers on additional nodes should allow further scaling.
enrichment.acker.executors
The number of ackers within the topology.
This should most often be equal to the number of workers defined in enrichment.workers.
Within the Storm UI, click the "Show System Stats" button. This will display a bolt named __acker. If the capacity of this bolt is too high, then increase the number of ackers.
topology.worker.childopts
This parameter accepts arguments that will be passed to the JVM created for each Storm worker. This allows for control over the heap size, garbage collection, and any other JVM-specific parameter.
Start with a 2G heap and increase as needed. An 8G heap was found to be beneficial in testing, but the optimal size will vary depending on caching needs.
-Xms8g -Xmx8g
The Garbage First Garbage Collector (G1GC) is recommended along with a cap on the amount of time spent in garbage collection. This is intended to help address small object allocation issues due to our extensive use of caches.
-XX:+UseG1GC -XX:MaxGCPauseMillis=100
If the caches in use are very large (as defined by either enrichment.join.cache.size or threat.intel.join.cache.size) and performance is poor, turning on garbage collection logging might be helpful.
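For example, a complete set of worker child options combining the heap and G1GC settings above with basic garbage collection logging might look like the following. The log path is illustrative, and these logging flag forms apply to JDK 8; JDK 9 and later use the unified -Xlog:gc* syntax instead.
-Xms8g -Xmx8g -XX:+UseG1GC -XX:MaxGCPauseMillis=100 -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/var/log/storm/enrichment-gc.log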
topology.max.spout.pending
This limits the number of unacked tuples that the spout can introduce into the topology.
Decreasing this value will increase back pressure and allow the topology to consume messages at a pace that is maintainable.
If the spout throws 'Commit Failed Exceptions' then the topology is not keeping up. Decreasing this value is one way to ensure that messages can be processed before they time out.
If the topology's throughput is unsteady and inconsistent, decrease this value. This should help the topology consume messages at a maintainable pace.
If the bolt capacity is low, the topology can handle additional load. Increase this value so that more tuples are introduced into the topology, which should increase the bolt capacity.
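For example, a conservative starting point might be the following. The value shown is illustrative only, not a tested recommendation; tune it against your own workload while watching for timeouts and bolt capacity.
topology.max.spout.pending=500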
kafka.spout.parallelism
The parallelism of the Kafka spout within the topology. Defines the maximum number of executors for each worker dedicated to running the spout.
The spout parallelism should most often be set to the number of partitions of the input Kafka topic.
If the enrichment bolt capacity is low, increasing the parallelism of the spout can introduce additional load on the topology.
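For example, if the input Kafka topic has 10 partitions (an illustrative count; check your topic's actual partition count), the spout parallelism would be set to match:
kafka.spout.parallelism=10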
enrichment.parallelism
The parallelism hint for the enrichment bolt. Defines the maximum number of executors within each worker dedicated to running the enrichment bolt.
If the capacity of the enrichment bolt is high, increasing the parallelism will introduce additional executors to bring the bolt capacity down.
If the throughput of the topology is too low, increase this value. This allows additional tuples to be enriched in parallel.
Increasing parallelism on the enrichment bolt will at some point put pressure on the downstream threat intel and output bolts. As this value is increased, monitor the capacity of the downstream bolts to ensure that they do not become a bottleneck.
threat.intel.parallelism
The parallelism hint for the threat intel bolt. Defines the maximum number of executors within each worker dedicated to running the threat intel bolt.
If the capacity of the threat intel bolt is high, increasing the parallelism will introduce additional executors to bring the bolt capacity down.
If the throughput of the topology is too low, increase this value. This allows additional tuples to be enriched in parallel.
Increasing parallelism on this bolt will at some point put pressure on the downstream output bolt. As this value is increased, monitor the capacity of the output bolt to ensure that it does not become a bottleneck.
kafka.writer.parallelism
The parallelism hint for the output bolt which writes to the output Kafka topic. Defines the maximum number of executors within each worker dedicated to running the output bolt.
If the capacity of the output bolt is high, increasing the parallelism will introduce additional executors to bring the bolt capacity down.
enrichment.cache.size
The Enrichment bolt maintains a cache so that if the same enrichment occurs repetitively, the value can be retrieved from the cache instead of it being recomputed. Increase the size of the cache to improve the rate of cache hits.
There is a great deal of repetition in network telemetry, which leads to a great deal of repetition for the enrichments that operate on that telemetry. Having a highly performant cache is one of the most critical factors driving performance.
Increasing the size of the cache may require that you increase the worker heap size using topology.worker.childopts.
threat.intel.cache.size
The Threat Intel bolt maintains a cache so that if the same enrichment occurs repetitively, the value can be retrieved from the cache instead of it being recomputed.
There is a great deal of repetition in network telemetry, which leads to a great deal of repetition for the enrichments that operate on that telemetry. Having a highly performant cache is one of the most critical factors driving performance.
Increase the size of the cache to improve the rate of cache hits.
Increasing the size of the cache may require that you increase the worker heap size using topology.worker.childopts.
enrichment.threadpool.size
This value defines the number of threads maintained within a pool to execute each enrichment. This value can either be a fixed number or it can be a multiple of the number of cores (5C = 5 times the number of cores).
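For example, either of the following forms (with illustrative values) is valid — a fixed count, or a multiple of the number of cores:
enrichment.threadpool.size=2
enrichment.threadpool.size=2C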
The enrichment bolt maintains a static thread pool that is used to execute each enrichment. This thread pool is shared by all of the executors running within the same worker.
Start with a thread pool size of 1. Adjust this value after tuning all other parameters first. Only increase this value if testing shows performance improvements in your environment given your workload.
If the thread pool size is too large this will cause the work to be shuffled amongst multiple CPU cores, which significantly decreases performance. Using a smaller thread pool helps pin work to a single core.
If the thread pool size is too small, this can negatively impact IO-intensive workloads. Increasing the thread pool size helps with IO-intensive workloads that have a significant cache miss rate. A thread pool size of 3-5 can help in these cases.
Most workloads will make significant use of the cache and so 1-2 threads will most likely be optimal.
The bolt uses a static thread pool. To scale out, but keep the work mostly pinned to a CPU core, add more Storm workers while keeping the thread pool size low.
If a larger thread pool increases load on the system, but decreases the throughput, then it is likely that the system is thrashing. In this case the thread pool size should be decreased.
enrichment.threadpool.type
Defines the type of thread pool used. This value can be either "FIXED" or "WORK_STEALING".
The enrichment bolt maintains a static thread pool that is used to execute each enrichment. This thread pool is shared by all of the executors running within the same worker.
Currently, this value must be manually defined within the flux file at $METRON_HOME/flux/enrichment/remote-unified.yaml. This value cannot be altered within Ambari.
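As a rough sketch only — the exact structure and key placement within remote-unified.yaml vary by version, so treat this fragment as hypothetical rather than a verbatim excerpt — the setting would appear in the flux file along these lines:
config:
  enrichment.threadpool.type: "WORK_STEALING"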