Streaming Enrichment Information
Streaming enrichment information is useful when you need enrichment information in real time. Our example steps through how to associate IP addresses with user names for the Squid information. This type of information is most useful in real time as opposed to waiting for a bulk load of the enrichment information.
Streaming intelligence feeds are incorporated slightly differently from bulk-loaded feeds. The enrichment information resides in its own parser topology instead of an extraction configuration file. The parser file defines the input structure and how that data can be used in enrichment. Streaming information goes to HBase rather than to Kafka, so you need to configure the writer by defining both the writerClassName and the Simple HBase Enrichment Writer (shew) parameters.
The following steps illustrate how to associate IP addresses from Squid with user names.
Define a parser topology in
$METRON_HOME/config/zookeeper/parsers/user.json
to handle the streaming data:
touch $METRON_HOME/config/zookeeper/parsers/user.json
Populate the file with the parser topology definition. For example:
{
  "parserClassName" : "org.apache.metron.parsers.csv.CSVParser",
  "writerClassName" : "org.apache.metron.enrichment.writer.SimpleHbaseEnrichmentWriter",
  "sensorTopic" : "user",
  "parserConfig" : {
    "shew.table" : "enrichment",
    "shew.cf" : "t",
    "shew.keyColumns" : "ip",
    "shew.enrichmentType" : "user",
    "columns" : {
      "user" : 0,
      "ip" : 1
    }
  }
}
where
- parserClassName
The class name of the parser. This example uses the CSV parser.
- writerClassName
The writer destination. For streaming parsers, the destination is SimpleHbaseEnrichmentWriter.
- sensorTopic
Name of the sensor topic.
- shew.table
The simple HBase enrichment writer (shew) table to which we want to write.
- shew.cf
The simple HBase enrichment writer (shew) column family.
- shew.keyColumns
The column that the simple HBase enrichment writer (shew) uses as the key. In our example, the key is the IP address.
- shew.enrichmentType
The simple HBase enrichment writer (shew) enrichment type.
- columns
The CSV parser column layout, which maps each field name to its position in the input line. In our example, the user name is in position 0 and the IP address is in position 1.
This file fully defines the input structure and how that data can be used in enrichment.
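To make the mapping concrete, the following Python sketch mimics, under simplifying assumptions, what the configuration above describes: a CSV line is split by position according to columns, the shew.keyColumns field becomes the lookup key, and the remaining fields become the enrichment value. The helper to_enrichment_record, the dictionary layout, the ":" key separator, and the sample line are hypothetical and are not part of Metron; the real writer serializes keys and values in its own format.

```python
import csv
import io

# Subset of the parserConfig shown above.
PARSER_CONFIG = {
    "shew.table": "enrichment",
    "shew.cf": "t",
    "shew.keyColumns": "ip",
    "shew.enrichmentType": "user",
    "columns": {"user": 0, "ip": 1},
}


def to_enrichment_record(line, config=PARSER_CONFIG):
    """Hypothetical helper: turn one CSV line into a key/value record."""
    fields = next(csv.reader(io.StringIO(line)))
    # Name each field by its configured position.
    named = {name: fields[pos].strip() for name, pos in config["columns"].items()}
    key_columns = [c.strip() for c in config["shew.keyColumns"].split(",")]
    return {
        "table": config["shew.table"],
        "column_family": config["shew.cf"],
        "type": config["shew.enrichmentType"],
        # The key column(s) identify the row; here, the IP address.
        "indicator": ":".join(named[c] for c in key_columns),
        # Assumption of this sketch: non-key fields are stored as the value.
        "value": {k: v for k, v in named.items() if k not in key_columns},
    }


record = to_enrichment_record("jdoe,192.168.138.158")
print(record["type"], record["indicator"], record["value"])
```

In this sketch, the line "jdoe,192.168.138.158" yields a record of type user keyed by the IP address, which is the shape the enrichment lookup later relies on.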
Create a Kafka topic:
/usr/hdp/current/kafka-broker/bin/kafka-topics.sh --create --zookeeper $ZOOKEEPER_HOST:2181 --replication-factor 1 --partitions 1 --topic user
When you create the Kafka topic, consider how much data will be flowing into this topic.
Push the configuration file to ZooKeeper:
$METRON_HOME/bin/zk_load_configs.sh -m PUSH -z $ZOOKEEPER_HOST:2181 -i $METRON_HOME/config/zookeeper
Start the user parser topology by running the following:
$METRON_HOME/bin/start_parser_topology.sh -s user -z $ZOOKEEPER_HOST:2181 -k $KAFKA_HOST:6667
The parser topology listens for data streaming in and pushes the data to HBase. Now you have data flowing into the HBase table, but you need to ensure that the enrichment topology can be used to enrich the data flowing past.
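Before sending data to the user topic, it can help to check that each line matches the column order the parser expects (user name first, IP address second). The Python sketch below formats and validates such lines. The helper name format_user_line and the sample values are hypothetical, and how the lines reach Kafka is left open; one option is to pipe them into Kafka's console producer.

```python
import ipaddress


def format_user_line(user, ip):
    """Hypothetical helper: build one 'user,ip' line for the user topic."""
    user = user.strip()
    if not user or "," in user:
        raise ValueError(f"invalid user name: {user!r}")
    # ipaddress.ip_address raises ValueError for malformed addresses.
    normalized_ip = ipaddress.ip_address(ip.strip())
    return f"{user},{normalized_ip}"


# Made-up sample mappings, for illustration only.
samples = [("jdoe", "192.168.138.158"), ("asmith", "10.0.2.15")]
lines = [format_user_line(user, ip) for user, ip in samples]
print("\n".join(lines))
```

Rejecting malformed lines before they are published keeps bad keys out of the HBase table, where they would otherwise never match an ip_src_addr value.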
Edit the new data source enrichment configuration at
$METRON_HOME/config/zookeeper/enrichments/squid
to associate the ip_src_addr with the user name for more user enrichment:
{
  "enrichment" : {
    "fieldMap" : {
      "hbaseEnrichment" : [ "ip_src_addr" ]
    },
    "fieldToTypeMap" : {
      "ip_src_addr" : [ "user" ]
    },
    "config" : { }
  },
  "threatIntel" : {
    "fieldMap" : { },
    "fieldToTypeMap" : { },
    "config" : { },
    "triageConfig" : {
      "riskLevelRules" : { },
      "aggregator" : "MAX",
      "aggregationConfig" : { }
    }
  },
  "configuration" : { }
}
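As a rough illustration of what this configuration asks the enrichment topology to do, the Python sketch below looks up each field listed under hbaseEnrichment in an in-memory dictionary that stands in for the HBase table. The function enrich, the ENRICHMENT_TABLE stand-in, the sample event, and the naming of the added fields (enrichments.hbaseEnrichment.<field>.<type>.<column>) are assumptions made for this sketch, not a description of Metron's internals.

```python
ENRICHMENT_CONFIG = {
    "fieldMap": {"hbaseEnrichment": ["ip_src_addr"]},
    "fieldToTypeMap": {"ip_src_addr": ["user"]},
}

# Stand-in for the HBase enrichment table: (type, indicator) -> value.
ENRICHMENT_TABLE = {
    ("user", "192.168.138.158"): {"user": "jdoe"},
}


def enrich(message, config=ENRICHMENT_CONFIG, table=ENRICHMENT_TABLE):
    """Hypothetical sketch: return a copy of message with lookup results added."""
    enriched = dict(message)
    for field in config["fieldMap"].get("hbaseEnrichment", []):
        indicator = message.get(field)
        if indicator is None:
            continue  # The message does not carry this field.
        for enrichment_type in config["fieldToTypeMap"].get(field, []):
            value = table.get((enrichment_type, indicator), {})
            for column, column_value in value.items():
                # Assumed naming scheme for the added fields.
                name = f"enrichments.hbaseEnrichment.{field}.{enrichment_type}.{column}"
                enriched[name] = column_value
    return enriched


event = {"ip_src_addr": "192.168.138.158", "url": "http://example.com/"}
print(enrich(event))
```

The fieldMap entry selects which message fields are looked up, and fieldToTypeMap selects which enrichment type (here user, matching shew.enrichmentType in the parser configuration) is consulted for each field. A message whose IP address has no entry passes through unchanged.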
Push the new data source enrichment configuration to ZooKeeper:
$METRON_HOME/bin/zk_load_configs.sh -m PUSH -z $ZOOKEEPER_HOST:2181 -i $METRON_HOME/config/zookeeper