User Guide

Pattern-Based Anonymization Rules

A pattern-based rule anonymizes any text that matches one of its regular expressions (patterns). Optionally, an extract pattern narrows each match down to the part that should actually be anonymized.

Required and Optional Fields

  • name

  • rule_id (should be set to PATTERN)

  • patterns

  • extract (optional)

  • include_files (optional)

  • exclude_files (optional)

  • action (optional, default value is ANONYMIZE)

  • replace_value (optional, applicable only when action=REPLACE)

  • shared (optional, default value is true)

  • enabled (optional, default value is true)

For more information on each field, refer to Fields Used for Defining Anonymization Rules.
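As a rough illustration of how the required fields and documented defaults fit together, the Python sketch below fills in the default values for action, shared, and enabled and rejects a few obviously invalid combinations. The function name normalize_rule and its checks are hypothetical, not part of the product; the case-insensitive rule_id check is an assumption based on the examples on this page using "Pattern" while the field list says PATTERN.

```python
import json

# Defaults as listed in the field list above (hypothetical helper, not the product's code).
DEFAULTS = {"action": "ANONYMIZE", "shared": True, "enabled": True}
REQUIRED = ("name", "rule_id", "patterns")


def normalize_rule(rule: dict) -> dict:
    """Return a copy of a pattern-based rule with documented defaults filled in."""
    missing = [field for field in REQUIRED if field not in rule]
    if missing:
        raise ValueError(f"missing required field(s): {', '.join(missing)}")
    # Assumption: rule_id is compared case-insensitively ("Pattern" vs "PATTERN").
    if rule["rule_id"].upper() != "PATTERN":
        raise ValueError("rule_id must be PATTERN for a pattern-based rule")
    # replace_value is only meaningful when action is REPLACE.
    if "replace_value" in rule and rule.get("action", DEFAULTS["action"]) != "REPLACE":
        raise ValueError("replace_value applies only when action is REPLACE")
    # Explicit values in the rule override the defaults.
    return {**DEFAULTS, **rule}


email_rule = {"name": "EMAIL", "rule_id": "Pattern", "patterns": ["@"], "shared": False}
print(json.dumps(normalize_rule(email_rule), indent=2))
```

In this sketch an explicit "shared": false survives normalization, while action and enabled pick up their documented defaults.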

Rule Definition Example (without extract)

    {
      "name": "EMAIL",
      "rule_id": "Pattern",
      "patterns": ["(?<![a-z0-9._%+-])[a-z0-9._%+-]+@[a-z0-9.-]+\\.[a-z]{2,6}(?![a-z0-9._%+-])$?"],
      "shared": false
    }
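To check what the EMAIL pattern selects before deploying the rule, it can be tried out locally. The sketch below runs it with Python's re module against the Subversion line of the version.txt sample file. Two adjustments are assumptions of this sketch: the JSON escape \\. is written as \. in a Python raw string, and the trailing $? is dropped, because Python's re rejects a quantifier on $ and an optional anchor does not change what the pattern matches. The regex engine used by the anonymizer itself may differ.

```python
import re

# EMAIL pattern from the rule above, with JSON escaping removed and the trailing "$?" dropped.
EMAIL = r"(?<![a-z0-9._%+-])[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,6}(?![a-z0-9._%+-])"

line = "Subversion git@github.com:hortonworks/hadoop.git -r cb6e514b14fb60e9995e5ad9543315cd404b4e59"

# findall returns whole matches because the pattern has no capturing groups.
matches = re.findall(EMAIL, line)
print(matches)  # ['git@github.com']
```

The lookbehind and lookahead keep the match from starting or ending in the middle of a longer token, which is why only git@github.com is selected and the following :hortonworks is left alone.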

The content of the input file version.txt is:

    Hadoop 2.7.3.2.5.0.0-1245
    Subversion git@github.com:hortonworks/hadoop.git -r cb6e514b14fb60e9995e5ad9543315cd404b4e59
    Compiled by jenkins on 2016-08-26T00:55Z

The content of the output file version.txt, with the email address anonymized, is:

    Hadoop 2.7.3.2.5.0.0-1245
    Subversion ‡qpe@unqfay.mjp‡:hortonworks/hadoop.git -r cb6e514b14fb60e9995e5ad9543315cd404b4e59
    Compiled by jenkins on 2016-08-26T00:55Z
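In the sample output, the anonymized value keeps its length and punctuation (the @ and the dot survive), its letters and digits are replaced by lowercase letters, and the result is wrapped in ‡ markers. The Python sketch below reproduces only that observable shape. The function anonymize is hypothetical: the actual substitution scheme (for example, whether it is random or deterministic per value) is not described on this page.

```python
import random
import string


def anonymize(value: str, rng: random.Random) -> str:
    """Hypothetical illustration of the output shape seen above, not the product's algorithm.

    Each letter or digit becomes a random lowercase letter, other characters
    are kept as-is, and the result is wrapped in ‡ markers.
    """
    out = []
    for ch in value:
        out.append(rng.choice(string.ascii_lowercase) if ch.isalnum() else ch)
    return "‡" + "".join(out) + "‡"


print(anonymize("git@github.com", random.Random(42)))
```

Keeping separators and length intact means the anonymized file still looks structurally like the original (an address still reads as an address), which appears to be the intent of the sample output.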

Rule Definition Example (with extract)

    {
      "name": "KEYSTORE",
      "rule_id": "Pattern",
      "patterns": ["oozie.https.keystore.pass=([^\\s]*)", "OOZIE_HTTPS_KEYSTORE_PASS=([^\\s]*)"],
      "extract": "=([^\\s]*)",
      "include_files": ["java_process.txt", "pid.txt", "ambari-agent.log", "oozie-env.cmd"],
      "shared": false
    }
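Because rules are written in JSON, every backslash in a regular expression must be doubled: "[^\\s]*" in the rule file becomes the regex [^\s]* once the JSON is parsed. The Python sketch below loads the KEYSTORE rule (with include_files deduplicated) to show that, and adds a hypothetical helper rule_applies_to that decides whether a rule covers a given file. Matching include_files and exclude_files against the file's base name is an assumption based on the file names used in this example.

```python
import json
import os

KEYSTORE_RULE = r'''{
  "name": "KEYSTORE",
  "rule_id": "Pattern",
  "patterns": ["oozie.https.keystore.pass=([^\\s]*)", "OOZIE_HTTPS_KEYSTORE_PASS=([^\\s]*)"],
  "extract": "=([^\\s]*)",
  "include_files": ["java_process.txt", "pid.txt", "ambari-agent.log", "oozie-env.cmd"],
  "shared": false
}'''


def rule_applies_to(rule: dict, path: str) -> bool:
    """Hypothetical: compare the file's base name against include_files and exclude_files."""
    name = os.path.basename(path)
    include = rule.get("include_files")
    # With include_files set, only the listed files are considered.
    if include is not None and name not in include:
        return False
    return name not in rule.get("exclude_files", [])


rule = json.loads(KEYSTORE_RULE)
print(rule["extract"])  # =([^\s]*)
print(rule_applies_to(rule, "conf/oozie-env.cmd"))  # True
```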

The content of the input file oozie-env.cmd is:

    oozie.https.keystore.pass=abcde
    set OOZIE_HTTPS_KEYSTORE_PASS=12345

To anonymize the input file, the rule first applies the patterns it defines:

    "oozie.https.keystore.pass=([^\\s]*)", "OOZIE_HTTPS_KEYSTORE_PASS=([^\\s]*)"

The pattern oozie.https.keystore.pass=([^\\s]*) matches oozie.https.keystore.pass=abcde, and OOZIE_HTTPS_KEYSTORE_PASS=([^\\s]*) matches OOZIE_HTTPS_KEYSTORE_PASS=12345.

Next, the extract pattern "=([^\\s]*)" is applied to each match to identify abcde and 12345, the values to be anonymized.

The content of the output file oozie-env.cmd is:

    oozie.https.keystore.pass=‡vvdwa‡
    set OOZIE_HTTPS_KEYSTORE_PASS=‡zdowg‡

The values of oozie.https.keystore.pass and OOZIE_HTTPS_KEYSTORE_PASS have been anonymized.
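The two-stage lookup in this example (find each pattern match, then search the extract pattern inside that match and keep its first group) can be sketched in Python as follows. The function extract_spans is hypothetical and only mirrors the steps described on this page; without an extract pattern, it treats the whole match as the value, as in the EMAIL example.

```python
import re


def extract_spans(text, patterns, extract=None):
    """Hypothetical sketch: return sorted (start, end) spans a pattern rule would anonymize.

    Without extract, the whole pattern match is the value. With extract, only
    group 1 of the extract pattern, searched inside each match, is the value.
    """
    spans = set()
    for pattern in patterns:
        for m in re.finditer(pattern, text):
            if extract is None:
                spans.add(m.span())
                continue
            e = re.search(extract, m.group(0))
            if e:
                # Shift the group's offsets from match-relative to text-relative.
                spans.add((m.start() + e.start(1), m.start() + e.end(1)))
    return sorted(spans)


text = "oozie.https.keystore.pass=abcde\nset OOZIE_HTTPS_KEYSTORE_PASS=12345\n"
patterns = [r"oozie.https.keystore.pass=([^\s]*)", r"OOZIE_HTTPS_KEYSTORE_PASS=([^\s]*)"]
spans = extract_spans(text, patterns, extract=r"=([^\s]*)")
print([text[s:e] for s, e in spans])  # ['abcde', '12345']
```

Returning spans rather than strings keeps the positions needed to substitute the anonymized values back into the file while leaving the keys and the surrounding text untouched.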

For more examples, refer to Examples of Pattern-Based Anonymization Rules.