Tuning Guide

Parser Tuning

We'll be using the Bro sensor in this example.

Note

The parsers and PCAP use a builder utility, as opposed to enrichments and indexing, which use Flux.

We started with a single partition for the inbound Kafka topics and eventually worked our way up to 48 partitions. We set topology.max.spout.pending to the value shown below. The default is null, which results in no limit.

storm-bro.config

  {
      ...
      "topology.max.spout.pending" : 2000
      ...
  }
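
Storm applies topology.max.spout.pending per spout task, so the topology-wide ceiling on un-acked tuples grows with the number of spout tasks. The following is a minimal sketch of that arithmetic, assuming the 24 spout tasks used in the launch command later in this section. The function name max_in_flight_tuples is illustrative and is not part of Metron or Storm.

```python
from typing import Optional


def max_in_flight_tuples(max_spout_pending: Optional[int],
                         spout_num_tasks: int) -> Optional[int]:
    """Upper bound on un-acked tuples summed over all spout tasks.

    Returns None when max_spout_pending is None (the default), meaning
    the spout is not throttled by this setting.
    """
    if spout_num_tasks < 1:
        raise ValueError("spout_num_tasks must be at least 1")
    if max_spout_pending is None:
        return None
    if max_spout_pending < 1:
        raise ValueError("max_spout_pending must be positive or None")
    return max_spout_pending * spout_num_tasks


# Values from this guide: 2000 pending per task, 24 spout tasks (-snt 24).
print(max_in_flight_tuples(2000, 24))  # 48000
print(max_in_flight_tuples(None, 24))  # None, i.e. no limit
```

Raising the pending value or the spout task count raises this ceiling, which is worth keeping in mind when sizing worker memory.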
  

We used the following spout settings. Because these are the defaults, this file can be omitted entirely.

spout-bro.config

  {
      ...
  
      "spout.pollTimeoutMs" : 200,
      "spout.maxUncommittedOffsets" : 10000000,
      "spout.offsetCommitPeriodMs" : 30000
  }
  

We ran our Bro parser topology with the following options. We did not need to match our parallelism to the number of Kafka partitions in this case, though you could do so if necessary. Notice that we only needed 1 worker.

 /usr/metron/0.4.0/bin/start_parser_topology.sh -k $BROKERLIST -z $ZOOKEEPER -s bro -ksp SASL_PLAINTEXT \
     -ot enrichments \
     -e ~metron/.storm/storm-bro.config \
     -esc ~/.storm/spout-bro.config \
     -sp 24 \
     -snt 24 \
     -nw 1 \
     -pnt 24 \
     -pp 24
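
With 48 partitions and 24 spout tasks, each spout task reads from about two partitions. The sketch below illustrates this with a simple round-robin split. This is an assumption made for illustration only: the actual assignment is decided by the Kafka spout and may differ, and assign_partitions is a hypothetical helper, not a Metron or Storm API.

```python
from typing import Dict, List


def assign_partitions(num_partitions: int,
                      num_spout_tasks: int) -> Dict[int, List[int]]:
    """Spread partition ids over spout task ids in round-robin order."""
    if num_partitions < 0:
        raise ValueError("num_partitions must not be negative")
    if num_spout_tasks < 1:
        raise ValueError("num_spout_tasks must be at least 1")
    # Start every task with an empty list so idle tasks remain visible.
    assignment: Dict[int, List[int]] = {task: [] for task in range(num_spout_tasks)}
    for partition in range(num_partitions):
        assignment[partition % num_spout_tasks].append(partition)
    return assignment


# Values from this guide: 48 partitions, 24 spout tasks (-snt 24).
assignment = assign_partitions(48, 24)
print(assignment[0])                                   # [0, 24]
print(max(len(parts) for parts in assignment.values()))  # 2
```

If the spout task count exceeded the partition count, the extra tasks would end up with empty lists in this model, which corresponds to spout tasks that have nothing to read.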
 

From the usage docs, here are the options we've used. The full reference can be found in the Parsers Readme.

  -e,--extra_topology_options (JSON_FILE)       Extra options in the form
                                                of a JSON file with a map
                                                for content.
  -esc,--extra_kafka_spout_config (JSON_FILE)   Extra spout config options
                                                in the form of a JSON file
                                                with a map for content.
                                                Possible keys are:
                                                retryDelayMaxMs,
                                                retryDelayMultiplier,
                                                retryInitialDelayMs,
                                                stateUpdateIntervalMs,
                                                bufferSizeBytes,fetchMaxWait,
                                                fetchSizeBytes,maxOffsetBehind,
                                                metricsTimeBucketSizeInSecs,
                                                socketTimeoutMs
  -sp,--spout_p (SPOUT_PARALLELISM_HINT)        Spout Parallelism Hint
  -snt,--spout_num_tasks (NUM_TASKS)            Spout Num Tasks
  -nw,--num_workers (NUM_WORKERS)               Number of Workers
  -pnt,--parser_num_tasks (NUM_TASKS)           Parser Num Tasks
  -pp,--parser_p (PARALLELISM_HINT)             Parser Parallelism Hint