Hortonworks Cybersecurity Platform

Configure an Extractor Configuration File

You use the extractor configuration file to bulk load the enrichment store into HBase.

  1. Log in as root to the host on which Metron is installed.
    sudo -s
    cd $METRON_HOME
  2. Determine the schema of the enrichment source.
    The schema of our mock enrichment source is domain|owner|registeredCountry|registeredTimestamp.
  3. Create an extractor configuration file called extractor_config_temp.json at $METRON_HOME/config and populate it with the threat intel source schema.
    HCP supports a subset of STIX messages for importation:
    STIX Type   Specific Type   Enrichment Type Name
    Address     IPV_4_ADDR      address:IPV_4_ADDR
    Address     IPV_6_ADDR      address:IPV_6_ADDR
    Address     E_MAIL          address:E_MAIL
    Address     MAC             address:MAC
    Domain      FQDN            domain:FQDN
    Hostname    (none)          hostname
    The following example configures the STIX extractor to load from a series of STIX files but import only the IPv4 addresses from the set of all possible addresses. If no categories are specified for import, all categories are imported. Only the address and domain types support filtering, through the stix_address_categories and stix_domain_categories configuration parameters.
      {
        "config" : {
          "stix_address_categories" : "IPV_4_ADDR"
        },
        "extractor" : "STIX"
      }
  4. Remove any invisible non-ASCII characters that might have been included if you copied and pasted the configuration:
    iconv -c -f utf-8 -t ascii extractor_config_temp.json -o extractor_config.json
  5. OPTIONAL: You can also transform and filter threat intel data using Stellar as it is loaded into HBase. This feature is available to all extractor types.
    The following example loads a CSV list of top domains as an enrichment, and uses Stellar expressions to transform and filter both the value metadata and the indicator column:
      {
        "config" : {
          "zk_quorum" : "node1:2181",
          "columns" : {
             "rank" : 0,
             "domain" : 1
          },
          "value_transform" : {
             "domain" : "DOMAIN_REMOVE_TLD(domain)"
          },
          "value_filter" : "LENGTH(domain) > 0",
          "indicator_column" : "domain",
          "indicator_transform" : {
             "indicator" : "DOMAIN_REMOVE_TLD(indicator)"
          },
          "indicator_filter" : "LENGTH(indicator) > 0",
          "type" : "top_domains",
          "separator" : ","
        },
        "extractor" : "CSV"
      }
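
To make steps 2 through 4 concrete, the following Python sketch derives a CSV extractor configuration from the mock schema, removes non-ASCII characters, and confirms that the result still parses as JSON. It is an illustration only, not part of HCP. The helper names, the "whois" enrichment type, and the "," separator are assumptions made for this example; the configuration keys are the ones used in the CSV example above.

```python
import json

# Schema from step 2; fields are listed in column order.
SCHEMA = "domain|owner|registeredCountry|registeredTimestamp"


def build_csv_extractor_config(schema, indicator_column, enrichment_type, separator=","):
    """Map each schema field to its zero-based column index (hypothetical helper)."""
    columns = {name: index for index, name in enumerate(schema.split("|"))}
    if indicator_column not in columns:
        raise ValueError(f"indicator column {indicator_column!r} is not in the schema")
    return {
        "config": {
            "columns": columns,
            "indicator_column": indicator_column,
            "type": enrichment_type,   # "whois" below is an assumed type name
            "separator": separator,    # assumed data separator
        },
        "extractor": "CSV",
    }


def strip_non_ascii(text):
    """Drop characters outside ASCII, similar in spirit to `iconv -c -t ascii`."""
    return text.encode("ascii", errors="ignore").decode("ascii")


def load_clean_config(raw_text):
    """Strip non-ASCII characters, then confirm the result parses as JSON."""
    return json.loads(strip_non_ascii(raw_text))


config = build_csv_extractor_config(SCHEMA, "domain", "whois")

# Simulate a copy-and-paste artifact: a zero-width space after the opening brace.
pasted = json.dumps(config, indent=2).replace("{", "{\u200b", 1)

cleaned = load_clean_config(pasted)
print(json.dumps(cleaned, indent=2))
```

Parsing the cleaned text before loading is a cheap way to catch a missing brace or a stray invisible character early, instead of discovering it when the bulk load fails.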