Tune Parser Core Storm Settings
You can set the number of Kafka spouts to match the number of Kafka partitions. You can also increase the number of workers and ackers to match the Storm nodes, unless the estimated throughput for the parser is very low.
Set the parser Storm settings in the Management user interface.
You can add the following command to the Storm settings to test the parser:
spout.firstPollOffsetStrategy": "LATEST"The command allows the Kafka topic to be written to continuously during testing so when the parser is restarted, the topology will not be flooded with events.
Increase the Parser Parallelism and Parser Num
Tasks values in increments based on the number of workers.
For example, in the previous example, the parameters could be incremented by 3.
As you increase the Parser Parallelism and Parser
Num Tasks values, check two Storm statistics: Parser Capacity and the
number of tuples acked in a 10-minute window.
For a given estimated throughput, the capacity should be no greater than ~0.800. This will allow for ~20% overhead if the number of incoming events spike above the estimated average. If the capacity is above this level, Parallelism and Parser Num Tasks values should be incremented and the topology restarted.The number of acked tuples should be approximately equal to (Desired Throughput ×600) assuming the topology has been active for at least 11 - 12 minutes. If the number of acked tuples and the capacity of the topology are both low, there may not be enough Kafka partitions.If the Storm UI is showing a capacity of ~0.800 or less, monitor the Kafka consumer to ensure that there is no significant lag or buildup of messages for the parser. The following command shows an example of how you can monitor the Kafka consumer via the command line on a Kafka node:
cd /usr/hdp/current/kafka-broker/bin/ watch -n 2 ./kafka-run-class.sh kafka.tools.ConsumerOffsetChecker --zookeeper master01:2181 --topic asa --group asa_parser