Tune Enrichment Core Storm Settings
You can set the number of Kafka spouts to match the number of Kafka partitions. You can also increase the number of workers and ackers to match the number of Storm nodes, unless the estimated throughput for the enrichment topology is very low.
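If you are unsure how many partitions the enrichment topic has, you can describe it from a Kafka node. The topic name enrichments and the ZooKeeper address master01:2181 below are taken from the monitoring example later in this section; substitute your own values:

```shell
# Show the partition count (and partition leaders) for the enrichments topic.
# master01:2181 is the example ZooKeeper address used elsewhere on this page.
cd /usr/hdp/current/kafka-broker/bin/
./kafka-topics.sh --describe --zookeeper master01:2181 --topic enrichments
```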
Set the enrichment Storm settings using the enrichment.properties file:
##### Storm #####
enrichment.workers=3
enrichment.acker.executors=3
topology.worker.childopts=
topology.auto-credentials=
topology.max.spout.pending=
...
kafka.start=LATEST
...
##### Parallelism #####
kafka.spout.parallelism=9
Set the Kafka Offset Strategy to LATEST to allow the Kafka topic to be written to continuously during testing, so that when the topology is restarted it is not flooded with events. Alternatively, you can set the Kafka Offset Strategy to EARLIEST to determine the maximum throughput of the topology, though you should set Max Spout Pending to avoid errors.
Increase the kafka.writer.parallelism value in increments based on the number of workers. For example, with the settings above, the parameter could be incremented by 3.

Note: The other parallelism parameters listed below do not affect the Storm topology in any way. These values were used by the older Enrichments topology, so they can be set to null.
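For instance, continuing the 3-worker example above, one increment step might raise the writer parallelism from 9 to 12. The value here is illustrative, not a recommendation:

```
##### Parallelism #####
kafka.writer.parallelism=12
```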
##### Parallelism #####
kafka.spout.parallelism=9
enrichment.split.parallelism=
enrichment.stellar.parallelism=
enrichment.join.parallelism=18
threat.intel.split.parallelism=
threat.intel.stellar.parallelism=
threat.intel.join.parallelism=18
kafka.writer.parallelism=9
As you increase the kafka.writer.parallelism value, check two Storm statistics: the capacity of the topology and the number of tuples acked in a 10-minute window.

For a given estimated throughput, the capacity should be no greater than ~0.800. This leaves ~20% of headroom in case the number of incoming events spikes above the estimated average. If the capacity is above this level, you should increment the Parallelism and Num Tasks values and restart the topology.

The number of acked tuples should be approximately equal to (Desired Throughput × 600), assuming the topology has been active for at least 11 - 12 minutes. If the number of acked tuples and the capacity of the topology are both low, there might not be enough Kafka partitions.

If the Storm UI shows a capacity of ~0.800 or less, you should monitor the Kafka consumer to ensure that there is no significant lag or buildup of messages for the enrichment topology. The command below shows an example of how you can monitor via the command line on a Kafka node:
cd /usr/hdp/current/kafka-broker/bin/
watch -n 2 ./kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --zookeeper master01:2181 --topic enrichments --group enrichments
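Note that the kafka.tools.ConsumerOffsetChecker class was deprecated and later removed from Kafka. On newer Kafka versions, the kafka-consumer-groups.sh tool reports the same per-partition lag. The broker address master01:6667 below is an assumption (HDP's default Kafka broker port); substitute your own broker list and consumer group:

```shell
# Equivalent lag check on Kafka versions where ConsumerOffsetChecker is gone.
# master01:6667 is an assumed broker address; adjust for your cluster.
cd /usr/hdp/current/kafka-broker/bin/
watch -n 2 ./kafka-consumer-groups.sh --bootstrap-server master01:6667 \
  --describe --group enrichments
```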
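As a quick sanity check of the acked-tuple target described above, the expected count for a 10-minute window is simply the desired throughput multiplied by 600 seconds. The throughput figure in this sketch is an assumed example, not a measured value:

```shell
# Expected acked tuples over a 10-minute (600 s) window.
# DESIRED_THROUGHPUT is an assumed example figure in events per second.
DESIRED_THROUGHPUT=5000
EXPECTED_ACKED=$((DESIRED_THROUGHPUT * 600))
echo "Expect roughly ${EXPECTED_ACKED} acked tuples per 10-minute window"
```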