Tuning Guide

Parser Tuning Example

This example uses the Bro sensor. The parser and PCAP topologies are built with a builder utility, whereas the enrichment and indexing topologies use Flux.

The following example of parser tuning starts with a single partition for the inbound Kafka topics and eventually increases to 48 partitions.
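
Growing a topic from 1 to 48 partitions can be scripted. The sketch below only prints the command rather than running it. It assumes the inbound topic is named `bro` after the sensor, that `KAFKA_BIN` (a hypothetical path) points at your Kafka scripts, and that your Kafka release still accepts `--zookeeper` for `kafka-topics.sh`; newer releases expect `--bootstrap-server` instead.

```shell
#!/usr/bin/env bash
# Sketch: build (not run) the command that would raise a topic's partition count.
# KAFKA_BIN is a hypothetical install path; adjust it for your cluster.
KAFKA_BIN="${KAFKA_BIN:-/usr/hdp/current/kafka-broker/bin}"

# alter_partitions_cmd <topic> <partitions> <zookeeper>
# Prints the kafka-topics.sh invocation so it can be reviewed before running.
alter_partitions_cmd() {
    local topic=$1 partitions=$2 zookeeper=$3
    echo "$KAFKA_BIN/kafka-topics.sh --zookeeper $zookeeper --alter --topic $topic --partitions $partitions"
}

# Assumed topic name "bro"; 48 is the final partition count from the example.
alter_partitions_cmd bro 48 "${ZOOKEEPER:-zk1:2181}"
```

Kafka only allows the partition count of an existing topic to be increased, so it is worth reviewing the printed command before executing it.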

  1. In the storm-bro.config file, set the topology.max.spout.pending value to 2000.

      {
          ...
          "topology.max.spout.pending" : 2000
          ...
      }
      

    The default is null, which results in no limit.

  2. In the spout-bro.config file, set the following options to their default values:

      {
          ...
      
          "spout.pollTimeoutMs" : 200,
          "spout.maxUncommittedOffsets" : 10000000,
          "spout.offsetCommitPeriodMs" : 30000
      }
    
    

    Because these are the default values, you can omit these settings.

  3. Run the Bro parser topology with the following options:

     $METRON_HOME/bin/start_parser_topology.sh \
        -e ~metron/.storm/storm-bro.config \
        -esc ~/.storm/spout-bro.config \
        -k $BROKERLIST \
        -ksp SASL_PLAINTEXT \
        -nw 1 \
        -ot enrichments \
        -pnt 24 \
        -pp 24 \
        -s bro \
        -snt 24 \
        -sp 24 \
        -z $ZOOKEEPER
     

    This example does not match the parallelism one-to-one with the number of Kafka partitions (24 spout executors for the eventual 48 partitions), though you could do so if necessary. Notice that the example needs only one worker.
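
The relationship between partition count and spout parallelism can be checked with a little arithmetic. The following is a minimal sketch, assuming partitions are spread evenly across the spout executors; the function name `partitions_per_executor` is illustrative and not part of Metron.

```shell
#!/usr/bin/env bash
# Sketch: estimate how many Kafka partitions each spout executor would read.
# Assumes an even spread of partitions across executors.

# partitions_per_executor <partitions> <spout_parallelism>
# Prints the ceiling of partitions / parallelism.
partitions_per_executor() {
    local partitions=$1 parallelism=$2
    if [ "$parallelism" -le 0 ]; then
        echo "parallelism must be positive" >&2
        return 1
    fi
    echo $(( (partitions + parallelism - 1) / parallelism ))
}

# Values from the example: 48 partitions, -sp 24.
partitions_per_executor 48 24   # prints 2
```

With the starting point of a single partition, the same calculation gives 1, which suggests that most of the 24 spout executors would have nothing to read until the partition count is raised.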

The options used above are described in the following excerpt from the usage documentation. The full reference is in the Parsers Readme.

usage: start_parser_topology.sh
 -e,--extra_topology_options <JSON_FILE>               Extra options in the form
                                                       of a JSON file with a map
                                                       for content.
 -esc,--extra_kafka_spout_config <JSON_FILE>           Extra spout config options
                                                       in the form of a JSON file
                                                       with a map for content.
                                                       Possible keys are:
                                                       retryDelayMaxMs,retryDelay
                                                       Multiplier,retryInitialDel
                                                       ayMs,stateUpdateIntervalMs
                                                       ,bufferSizeBytes,fetchMaxW
                                                       ait,fetchSizeBytes,maxOffs
                                                       etBehind,metricsTimeBucket
                                                       SizeInSecs,socketTimeoutMs
 -k,--kafka <BROKER_URL>                               Kafka Broker URL
 -ksp,--kafka_security_protocol <SECURITY_PROTOCOL>    Kafka Security Protocol
 -nw,--num_workers <NUM_WORKERS>                       Number of Workers
 -ot,--output_topic <KAFKA_TOPIC>                      Output Kafka Topic
 -pnt,--parser_num_tasks <NUM_TASKS>                   Parser Num Tasks
 -pp,--parser_p <PARALLELISM_HINT>                     Parser Parallelism Hint
 -s,--sensor <SENSOR_TYPE>                             Sensor Type
 -snt,--spout_num_tasks <NUM_TASKS>                    Spout Num Tasks
 -sp,--spout_p <SPOUT_PARALLELISM_HINT>                Spout Parallelism Hint
 -z,--zk <ZK_QUORUM>                                   ZooKeeper Quorum URL
                                                       (zk1:2181,zk2:2181,...
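
Both `-e` and `-esc` take a JSON file whose content is a map. As a rough sketch, the two files from the steps above could be generated as follows. `CONFIG_DIR` is a hypothetical scratch location (the example command reads the files from a `.storm` directory), the files contain only the keys shown in this guide, and the JSON check is optional and depends on `python3` being installed.

```shell
#!/usr/bin/env bash
# Sketch: write the two JSON map files passed to -e and -esc.
# CONFIG_DIR is a hypothetical location; defaults to a fresh temp directory.
CONFIG_DIR="${CONFIG_DIR:-$(mktemp -d)}"

# Storm topology options (passed with -e).
cat > "$CONFIG_DIR/storm-bro.config" <<'EOF'
{
    "topology.max.spout.pending" : 2000
}
EOF

# Kafka spout options (passed with -esc); the default values quoted in this guide.
cat > "$CONFIG_DIR/spout-bro.config" <<'EOF'
{
    "spout.pollTimeoutMs" : 200,
    "spout.maxUncommittedOffsets" : 10000000,
    "spout.offsetCommitPeriodMs" : 30000
}
EOF

# Optional sanity check: both files should parse as JSON. Skipped without python3.
if command -v python3 >/dev/null 2>&1; then
    for f in storm-bro.config spout-bro.config; do
        python3 -m json.tool "$CONFIG_DIR/$f" >/dev/null || echo "invalid JSON: $f" >&2
    done
fi
echo "wrote configs to $CONFIG_DIR"
```

Generating the files from a script keeps the values reviewable in one place, which can help when the same tuning is repeated for other sensors.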