SmartSense Administration Guide

Configure Data Anonymization Rules

Anonymization rules define regular expressions that identify sensitive data to anonymize, such as IP addresses and domain names. Each rule is written in JSON and specifies what to match and what to replace it with.

Note

Anonymization rule formats vary between SmartSense versions. Consult the documentation that matches your SmartSense version.

  1. To define regular expression-based rules, refer to the following sample:

      {
        "name":"ip_address",
        "path":null,
        "pattern": "[ :\\/]?[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}[ :\\/]?",
        "extract": "[ :\\/]?([0-9\\.]+)[ :\\/]?",
        "shared": true
      }

    Key reference:

    • name - The rule name.

    • path - An optional regular expression matching the paths of the files to which this rule applies. The default, null, applies the rule to all files.

    • pattern - A regular expression that defines the pattern to match within the file.

    • extract - An optional regular expression that extracts the data from the matched pattern. Each piece of data to extract is marked as a regular expression group.

    • shared - Flag that indicates which key (shared or private) is used for masking. If the shared key is used, the Hortonworks support team can unmask the data if needed for diagnostic purposes, for example, hostnames and IP addresses when resolving issues on specific hosts or in communication between hosts. Note that unmasked data is not stored in Hortonworks repositories; it is discarded as soon as the analysis finishes.

    • value - An optional constant replacement value. The value should not be matchable by the pattern specified above. For example, if the pattern is '.*dfs.datanode.*', the value should not contain 'dfs.datanode'. If a value is specified, the shared flag is ignored.
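
    The interaction between pattern, extract, and value can be illustrated with a short Python sketch. This is not the SmartSense implementation: the apply_rule and mask functions, the demo key, and the HMAC-based masking are assumptions made purely for illustration. The sketch finds each pattern match, uses the first group of extract to locate the sensitive part inside that match, and replaces only that part.

```python
import hashlib
import hmac
import json
import re

# The sample rule from this section, parsed from JSON so the escaping is identical.
IP_RULE = json.loads(r'''
{
  "name": "ip_address",
  "path": null,
  "pattern": "[ :\\/]?[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}[ :\\/]?",
  "extract": "[ :\\/]?([0-9\\.]+)[ :\\/]?",
  "shared": true
}
''')


def mask(value, key):
    """Hypothetical masking function; the real product's masking is not shown here."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:12]


def apply_rule(rule, text, key=b"demo-key"):
    """Replace the sensitive part of every pattern match in text (illustrative only)."""
    pattern = re.compile(rule["pattern"])
    extract = re.compile(rule["extract"]) if rule.get("extract") else None
    constant = rule.get("value")  # when set, a constant replaces the data and no key is used

    def replacement_for(data):
        return constant if constant is not None else mask(data, key)

    def replace_match(match):
        matched = match.group(0)
        if extract is None:
            # Without extract, the whole match is treated as the sensitive data.
            return replacement_for(matched)
        inner = extract.search(matched)
        if inner is None:
            return matched
        # Only the first regular expression group is replaced; delimiters are kept.
        start, end = inner.span(1)
        return matched[:start] + replacement_for(inner.group(1)) + matched[end:]

    return pattern.sub(replace_match, text)


line = "connect to 10.0.0.12:8020 failed"
print(apply_rule(IP_RULE, line))                      # address replaced by a masked token
print(apply_rule(dict(IP_RULE, value="[IP]"), line))  # address replaced by a constant
```

    In this sketch the delimiters around the address (the leading space and the trailing colon) are part of the pattern match but outside the extract group, so they survive the replacement.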

  2. To use property-based rules, use the following example:

      {
        "name":"delete_oozie_jdbc_password",
        "path":"oozie-site.xml",
        "property": "oozie.service.JPAService.jdbc.password",
        "operation":"DELETE",
        "shared": false
      }

    Key reference:

    • name - The rule name.

    • path - A regular expression path of files on which to apply this rule.

    • property - The name of a specific property within the matching files.

    • operation - Either DELETE or REPLACE. The default is REPLACE. If DELETE is specified, the property is removed from the configuration file. If REPLACE is specified, the property value is replaced by either a constant value or a masked value.

    • value - An optional value for the REPLACE operation. If no value is specified, the data is masked using the private or shared key.

    • enabled - Flag that enables or disables the rule. The default is true.

    • excludes - A set of path patterns to be excluded by the rule. For example: "excludes": ["oozie-site.xml", "core-site.xml"]

    • shared - Flag that controls whether Hortonworks can reverse the anonymized data. If shared is true, the anonymized data is reversible by Hortonworks; if false, the data cannot be reversed.

    Note

    Rules configured with shared = false cannot be unmasked by Hortonworks, which in some cases may become a roadblock for support case analysis.
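
    The effect of a property-based rule can be sketched in Python as follows. This is a minimal illustration, not the SmartSense implementation. It assumes a Hadoop-style XML configuration file (property elements with name and value children), treats path and excludes as regular expressions searched against the file path, and uses a hypothetical placeholder in place of real key-based masking. The function names rule_applies and apply_property_rule are invented for this example.

```python
import re
import xml.etree.ElementTree as ET

# The sample rule from this section, written as a Python dictionary.
OOZIE_RULE = {
    "name": "delete_oozie_jdbc_password",
    "path": "oozie-site.xml",
    "property": "oozie.service.JPAService.jdbc.password",
    "operation": "DELETE",
    "shared": False,
}

SAMPLE_XML = """<configuration>
  <property>
    <name>oozie.service.JPAService.jdbc.password</name>
    <value>secret</value>
  </property>
  <property>
    <name>oozie.service.JPAService.jdbc.username</name>
    <value>oozie</value>
  </property>
</configuration>"""


def rule_applies(rule, file_path):
    """Decide whether a rule covers a file (matching semantics are an assumption)."""
    if not rule.get("enabled", True):
        return False
    if any(re.search(p, file_path) for p in rule.get("excludes", [])):
        return False
    path = rule.get("path")
    return path is None or re.search(path, file_path) is not None


def apply_property_rule(rule, xml_text, mask=lambda value: "MASKED"):
    """Delete or replace the named property; mask is a hypothetical placeholder."""
    root = ET.fromstring(xml_text)
    for prop in list(root.findall("property")):
        if prop.findtext("name") != rule["property"]:
            continue
        if rule.get("operation", "REPLACE") == "DELETE":
            root.remove(prop)  # the property disappears from the file
            continue
        value_el = prop.find("value")
        if value_el is not None:
            constant = rule.get("value")
            # A constant value wins; otherwise the existing value is masked.
            value_el.text = constant if constant is not None else mask(value_el.text or "")
    return ET.tostring(root, encoding="unicode")


if rule_applies(OOZIE_RULE, "/etc/oozie/conf/oozie-site.xml"):
    print(apply_property_rule(OOZIE_RULE, SAMPLE_XML))
```

    Changing "operation" to "REPLACE" and adding a "value" key in the dictionary above would keep the property in place and substitute the constant instead of removing the element.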