Run Book

Running the Enrichment Loader

After the enrichment source and enrichment configuration are defined, you must run the loader to move the data from the enrichment source to the HCP enrichment store (HBase) and store the enrichment configuration in ZooKeeper.


The Extractor configuration has one special parameter that is considered only by this loader:


This parameter specifies how the loader treats the input data. The two implementations are BY_LINE and org.apache.metron.dataloads.extractor.inputformat.WholeFileFormat.

The default is BY_LINE, which suits a set of CSV files in which each line is a unit of information that can be imported. However, if you are importing a set of STIX documents, each whole document must be passed to the Extractor as a single input.
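The difference between the two modes can be illustrated by counting the units of input that each one would produce for the same directory. The following is a minimal sketch only: the two helper functions are hypothetical, are not part of HCP, and assume that every file ends with a newline.

```shell
#!/bin/sh
# Illustration only: these helpers are hypothetical and not part of HCP.

# BY_LINE: every line of every file under the directory is one unit of input.
# Assumes each file ends with a newline, so wc -l counts every line.
count_units_by_line() {
    find "$1" -type f -exec cat {} + | wc -l | tr -d ' '
}

# Whole-file handling: every file under the directory is one unit of input.
count_units_whole_file() {
    find "$1" -type f | wc -l | tr -d ' '
}
```

For a directory of CSV files, the first count is the number of records the Extractor would see with BY_LINE. For a directory of STIX documents, the second count is the number of documents it would see with the whole-file format.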

  1. Use the loader to move the enrichment source to the enrichment store (HBase) and to store the enrichment configuration in ZooKeeper.

    Perform the following from the location containing your extractor and enrichment configuration files and your enrichment source. In our example, this information is located at $METRON_HOME/config.

    $METRON_HOME/bin/ -n enrichment_config.json -i whois_ref.csv -t enrichment -c t -e 

    The parameters for the utility are as follows:

    Short Code  Long Code            Required  Description
    -h                               No        Generate the help screen/set of options
    -e          --extractor_config   Yes       JSON document describing the extractor for this input data source
    -t          --hbase_table        Yes       The HBase table to import into
    -c          --hbase_cf           Yes       The HBase table column family to import into
    -i          --input              Yes       The input data location on local disk. If this is a file, then that file will be loaded. If this is a directory, then the files will be loaded recursively under that directory.
    -l          --log4j              No        The log4j properties file to load
    -n          --enrichment_config  No        The JSON document describing the enrichments to configure. Unlike other loaders, this is run first if specified.

    HCP loads the enrichment data into Apache HBase and establishes a ZooKeeper mapping. The data is extracted using the extractor and configuration defined in the extractor_config.json file and populated into an HBase table called enrichment.
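Because the loader requires an extractor configuration and an input location, it can help to confirm that these files exist before you run it. The following is a minimal sketch under stated assumptions: check_loader_inputs is a hypothetical helper, not part of HCP, and it checks only that the paths exist, not that their contents are valid.

```shell
#!/bin/sh
# Hypothetical pre-flight helper; not part of HCP.
# Usage: check_loader_inputs EXTRACTOR_CONFIG INPUT [ENRICHMENT_CONFIG]
check_loader_inputs() {
    extractor_config=$1
    input=$2
    enrichment_config=${3:-}

    # The extractor configuration (-e) is required and must be a file.
    if [ ! -f "$extractor_config" ]; then
        echo "missing extractor config: $extractor_config" >&2
        return 1
    fi

    # The input (-i) may be a single file or a directory.
    if [ ! -f "$input" ] && [ ! -d "$input" ]; then
        echo "missing input: $input" >&2
        return 1
    fi

    # The enrichment configuration (-n) is optional; check it only if given.
    if [ -n "$enrichment_config" ] && [ ! -f "$enrichment_config" ]; then
        echo "missing enrichment config: $enrichment_config" >&2
        return 1
    fi

    return 0
}
```

With the file names from this example, a call might look like `check_loader_inputs extractor_config.json whois_ref.csv enrichment_config.json`, run from the directory that contains those files.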

  2. Verify that the enrichment data was properly ingested into HBase:

    hbase shell
    scan 'enrichment'
  3. Verify that the ZooKeeper enrichment tag was properly populated:

  4. Generate some data by using the Squid client to execute requests.

    1. Use ssh to access the host for Squid.

    2. Start Squid and navigate to /var/log/squid:

      sudo service squid start
      sudo su - 
      cd /var/log/squid
    3. Generate some data by entering the following: