Hortonworks Cybersecurity Platform
Also available as:
PDF
loading table of contents...

Configure an Extractor Configuration File

You use the extractor configuration file to bulk load the enrichment store into HBase.

  1. On the host on which Metron is installed, log in as root.
  2. Determine the schema of the enrichment source.
  3. Create an extractor configuration file called extractor_config_temp.json and populate it with the enrichment source schema:
    {
     "config" : {
        "columns" : {
            "domain" : 0
            ,"owner" : 1
            ,"home_country" : 2
            ,"registrar": 3
            ,"domain_created_timestamp": 4
        }
        ,"indicator_column" : "domain"
        ,"type" : "whois"
        ,"separator" : ","
      }
      ,"extractor" : "CSV"
    }
    
  4. Transform and filter the enrichment data as it is loaded into HBase by using Stellar extractor properties in the extractor configuration file.
    HCP supports the following Stellar extractor properties:
    value_transform

    Transforms fields defined in the columns mapping with Stellar transformations. New keys introduced in the transform are added to the key metadata:

    "value_transform" : {
       "domain" : "DOMAIN_REMOVE_TLD(domain)"
    value_filter

    Allows additional filtering with Stellar predicates based on results from the value transformations. In the following example, records whose domain property is empty after removing the TLD are omitted:

    "value_filter" : "LENGTH(domain) > 0",
      "indicator_column" : "domain",
    indicator_transform

    Transforms the indicator column independent of the value transformations. You can refer to the original indicator value by using indicator as the variable name, as shown in the following example:

    "indicator_transform" : {
       "indicator" : "DOMAIN_REMOVE_TLD(indicator)"

    In addition, if you prefer to piggyback your transformations, you can refer to the variable domain, which allows your indicator transforms to inherit transformations to this value.

    indicator_filter

    Allows additional filtering with Stellar predicates based on results from the value transformations. In the following example, records with empty indicator values after removing the TLD are omitted:

    "indicator_filter" : "LENGTH(indicator) > 0",
      "type" : "top_domains",
    state_init
    Allows a state object to be initialized. This is a string, so a single expression is created. The output of this expression will be available as the state variable. Use this property with the flatfile_summarizer.sh rather than the loader.
    state_update
    Allows a state object to be updated. This is a map, so you can have temporary variables here. Note that you can reference the state variable from this. Use this property with the flatfile_summarizer.sh rather than the loader.
    state_merge
    Allows a list of states to be merged. This is a string, so a single expression. There is a special field called states available, which is a list of the states (one per thread). Use this property with the flatfile_summarizer.sh rather than the loader.
    Including all of the supported Stellar extractor properties in the extractor configuration file, looks similar to the following:
    {
       "config" : {
         "zk_quorum" : "$ZOOKEEPER_HOST:2181",
         "columns" : {
            "rank" : 0,
            "domain" : 1
         },
         "value_transform" : {
            "domain" : "DOMAIN_REMOVE_TLD(domain)"
         },
         "value_filter" : "LENGTH(domain) > 0",
         "indicator_column" : "domain",
         "indicator_transform" : {
            "indicator" : "DOMAIN_REMOVE_TLD(indicator)"
         },
         "indicator_filter" : "LENGTH(indicator) > 0",
         "type" : "top_domains",
         "separator" : ","
       },
       "extractor" : "CSV"
     }
    If you run a file import with this data and extractor configuration, you get the following two extracted data records:
    Indicator Type Value
    google top_domains { "rank" : "1", "domain" : "google" }
    yahoo top_domains { "rank" : "2", "domain" : "yahoo" }
  5. To access properties that reside in the global configuration file, provide a ZooKeeper quorum by using the zk_quorum property.
    If the global configuration looks like "global_property" : "metron-ftw", enter the following to expand the value_transform:
    "value_transform" : {
        "domain" : "DOMAIN_REMOVE_TLD(domain)",
         "a-new-prop" : "global_property"
     },
    The resulting value data looks like the following:
    Indicator Type Value
    google top_domains { "rank" : "1", "domain" : "google", "a-new-prop" : "metron-ftw" }
    yahoo top_domains { "rank" : "2", "domain" : "yahoo", "a-new-prop" : "metron-ftw" }
  6. Remove any non-ASCII invisible characters included when you cut and pasted the value_transform information:
    iconv -c -f utf-8 -t ascii extractor_config_temp.json -o extractor_config.json
    The extractor_config.json file is not stored by the loader. If you want to reuse it, you must store it yourself.