Hortonworks Cybersecurity Platform
Also available as:
PDF

Parser Tuning Example

We'll be using the Bro sensor in this parser tuning example.

We started with a single partition for the inbound Kafka topics and eventually worked our way up to 48 partitions. And we're using the following pending value, as shown below. The default is 'null' which would result in no limit.

storm-bro.config

  {
      ...
      "topology.max.spout.pending" : 2000
      ...
  }
  

And the following default spout settings. Again, this can be omitted entirely since we are using the defaults.

spout-bro.config

  {
      ...
  
      "spout.pollTimeoutMs" : 200,
      "spout.maxUncommittedOffsets" : 10000000,
      "spout.offsetCommitPeriodMs" : 30000
  }
  

And we ran our Bro parser topology with the following options. We did not need to fully match the number of Kafka partitions with our parallelism in this case, though you could certainly do so if necessary. Notice that we only needed 1 worker.

 /usr/metron/0.4.0/bin/start_parser_topology.sh -k $BROKERLIST -z $ZOOKEEPER -s bro -ksp SASL_PLAINTEXT
     -ot enrichments
     -e ~metron/.storm/storm-bro.config \
     -esc ~/.storm/spout-bro.config \
     -sp 24 \
     -snt 24 \
     -nw 1 \
     -pnt 24 \
     -pp 24 \
 

From the usage docs, here are the options we've used. The full reference can be found here Parsers Readme.

-e,--extra_topology_options (JSON_FILE)        Extra options in the form of a JSON file with a map
                                               for content.
-esc,--extra_kafka_spout_config (JSON_FILE)    Extra spout config options in the form of a JSON file
                                               with a map for content.
                                               Possible keys are:
                                               retryDelayMaxMs,retryDelay
                                               Multiplier,retryInitialDelayMs,stateUpdateIntervalMs,
                                               bufferSizeBytes,fetchMaxWait,fetchSizeBytes,maxOffset
                                               Behind,metricsTimeBucketSizeInSecs,socketTimeoutMs
-sp,--spout_p (SPOUT_PARALLELISM_HINT)         Spout Parallelism Hint
-snt,--spout_num_tasks (NUM_TASKS)             Spout Num Tasks
-nw,--num_workers (NUM_WORKERS)                Number of Workers
-pnt,--parser_num_tasks (NUM_TASKS)            Parser Num Tasks
-pp,--parser_p (PARALLELISM_HINT)              Parser Parallelism Hint