Run Book

Configuring an Extractor Configuration File

The extractor configuration file tells the bulk loader how to parse the enrichment source when it loads the enrichment store into HBase.

Complete the following steps to configure the extractor configuration file:

  1. Log in as root to the host on which Metron is installed.

    sudo -s
    cd $METRON_HOME
  2. Determine the schema of the enrichment source.

    The schema of our mock enrichment source is domain|owner|registeredCountry|registeredTimestamp.
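
    The columns mapping in the next step refers to each field by its zero-based position, so it can help to print one record with the index next to each field. The following is a minimal sketch, assuming a pipe-delimited source as in the schema above; the sample record and the list_columns helper are made up for illustration.

```shell
# Hypothetical sample record that follows the schema
# domain|owner|registeredCountry|registeredTimestamp (values are made up).
sample='example.com|Example Owner|US|1483228800'

# Print each field with its zero-based position, which is the number
# used in the "columns" mapping of the extractor configuration.
list_columns() {
  printf '%s\n' "$1" | awk -F'|' '{ for (i = 1; i <= NF; i++) printf "%d\t%s\n", i - 1, $i }'
}

list_columns "$sample"
```

    To check a real source, you could pass its first line instead, for example with head -n 1 on the source file.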

  3. Create an extractor configuration file called extractor_config.json at $METRON_HOME/config and populate it with the enrichment source schema.

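    For example, the following sample maps five comma-separated columns. Adjust the column names, positions, and separator to match your own enrichment source: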

    {
      "config" : {
        "columns" : {
          "domain" : 0,
          "owner" : 1,
          "home_country" : 2,
          "registrar" : 3,
          "domain_created_timestamp" : 4
        },
        "indicator_column" : "domain",
        "type" : "whois",
        "separator" : ","
      },
      "extractor" : "CSV"
    }
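
    A stray comma or an unbalanced brace makes the file unparsable, so it may be worth checking the syntax before you run the loader. The following is a minimal sketch that uses Python's built-in json.tool module, assuming python3 is available on the host. The temporary file is a stand-in for $METRON_HOME/config/extractor_config.json, and check_json is a helper name chosen for illustration.

```shell
# Write a small configuration to a temporary file (a stand-in for
# $METRON_HOME/config/extractor_config.json).
cfg="$(mktemp)"
cat > "$cfg" <<'EOF'
{
  "config" : {
    "columns" : { "domain" : 0, "owner" : 1 },
    "indicator_column" : "domain",
    "type" : "whois",
    "separator" : ","
  },
  "extractor" : "CSV"
}
EOF

# json.tool exits with a non-zero status and reports the error
# position when the file is not valid JSON.
check_json() {
  python3 -m json.tool "$1" > /dev/null
}

if check_json "$cfg"; then
  echo "valid JSON"
else
  echo "invalid JSON"
fi
```

    This only checks JSON syntax. It does not verify that the property names or column positions make sense to the loader.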
    
  4. You can transform and filter the enrichment data as it is loaded into HBase by using Stellar extractor properties in the extractor configuration file. HCP supports the following Stellar extractor properties:

    value_transform

      Description: Transforms fields defined in the columns mapping with Stellar transformations. New keys introduced in the transform are added to the key metadata.

      Example:

        "value_transform" : {
           "domain" : "DOMAIN_REMOVE_TLD(domain)"
        }

    value_filter

      Description: Allows additional filtering with Stellar predicates based on results from the value transformations. In the following example, records whose domain property is empty after removing the TLD are omitted.

      Example:

        "value_filter" : "LENGTH(domain) > 0"

    indicator_transform

      Description: Transforms the indicator column independent of the value transformations. You can refer to the original indicator value by using indicator as the variable name, as shown in the following example. In addition, if you prefer to piggyback your transformations, you can refer to the variable domain, which allows your indicator transforms to inherit transformations done to this value during the value transformations.

      Example:

        "indicator_transform" : {
           "indicator" : "DOMAIN_REMOVE_TLD(indicator)"
        }

    indicator_filter

      Description: Allows additional filtering with Stellar predicates based on results from the value transformations. In the following example, records whose indicator value is empty after removing the TLD are omitted.

      Example:

        "indicator_filter" : "LENGTH(indicator) > 0"

    If you include all of the supported Stellar extractor properties in the extractor configuration file, it will look similar to the following:

    {
      "config" : {
        "zk_quorum" : "$ZOOKEEPER_HOST:2181",
        "columns" : {
          "rank" : 0,
          "domain" : 1
        },
        "value_transform" : {
          "domain" : "DOMAIN_REMOVE_TLD(domain)"
        },
        "value_filter" : "LENGTH(domain) > 0",
        "indicator_column" : "domain",
        "indicator_transform" : {
          "indicator" : "DOMAIN_REMOVE_TLD(indicator)"
        },
        "indicator_filter" : "LENGTH(indicator) > 0",
        "type" : "top_domains",
        "separator" : ","
      },
      "extractor" : "CSV"
    }

    Running a file import with this extractor configuration on input data whose two rows hold the rank and domain for Google and Yahoo results in the following two extracted data records:

    Indicator   Type          Value
    google      top_domains   { "rank" : "1", "domain" : "google" }
    yahoo       top_domains   { "rank" : "2", "domain" : "yahoo" }
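
    To make the order of the four Stellar properties easier to follow, the following sketch imitates them with awk on two input rows. It is not how the loader works internally: the input rows are assumed, extract_top_domains is a hypothetical helper, and the sub() call is only a rough stand-in for DOMAIN_REMOVE_TLD that drops the last dot-separated label.

```shell
# Imitate the transform and filter steps on rank,domain rows.
# The sub() calls only approximate DOMAIN_REMOVE_TLD by removing
# the last dot-separated label.
extract_top_domains() {
  awk -F',' '{
    rank = $1; domain = $2; indicator = $2
    sub(/\.[^.]*$/, "", domain)        # value_transform on the domain field
    if (length(domain) == 0) next      # value_filter: skip empty domains
    sub(/\.[^.]*$/, "", indicator)     # indicator_transform on the raw indicator
    if (length(indicator) == 0) next   # indicator_filter: skip empty indicators
    printf "%s\ttop_domains\t{ \"rank\" : \"%s\", \"domain\" : \"%s\" }\n", indicator, rank, domain
  }'
}

# Assumed input rows; the real dataset is not shown in this section.
printf '1,google.com\n2,yahoo.com\n' | extract_top_domains
```

    Each output line holds the indicator, the enrichment type, and the value, separated by tabs. A row whose domain is empty after the transform is dropped by the filters.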
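  5. Remove any invisible non-ASCII characters that might have been introduced if you copied and pasted the configuration. The following command reads the pasted content from extractor_config_temp.json and writes the cleaned result to extractor_config.json: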

    iconv -c -f utf-8 -t ascii extractor_config_temp.json -o extractor_config.json
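
    To see whether a file contains such characters before and after the cleanup, you can count the affected lines with grep. The following is a minimal sketch, assuming GNU grep and an iconv that accepts the -o option. The temporary files and the count_non_ascii_lines helper are placeholders for illustration.

```shell
# Build a sample file containing a UTF-8 non-breaking space (bytes C2 A0),
# a typical invisible character picked up when pasting from a web page.
tmp_in="$(mktemp)"
tmp_out="$(mktemp)"
printf '{ "extractor" :\302\240"CSV" }\n' > "$tmp_in"

# Count lines containing any byte that is not printable ASCII or whitespace.
# grep -c exits non-zero when the count is 0, so that status is ignored.
count_non_ascii_lines() {
  LC_ALL=C grep -c '[^[:print:][:space:]]' "$1" || true
}

echo "before: $(count_non_ascii_lines "$tmp_in")"

# iconv may exit with a non-zero status when -c discards characters,
# so that status is not treated as fatal here.
iconv -c -f utf-8 -t ascii "$tmp_in" -o "$tmp_out" || true

echo "after: $(count_non_ascii_lines "$tmp_out")"
```

    A count greater than zero before the conversion and a count of zero afterward suggests that the cleanup removed the stray characters.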
Note

The extractor_config.json file is not stored anywhere by the loader. This file is used once by the bulk loader to parse the enrichment dataset. If you would like to keep a copy of this file, be sure to save a copy to another location.