Installing HDF Services on an Existing HDP Cluster

Configure NiFi for Atlas Integration

You can integrate NiFi with Apache Atlas to take advantage of robust dataset and application lineage support. You do this by configuring the NiFi ReportLineageToAtlas Reporting Task once you have NiFi configured and running.

If NiFi is installed on an HDP cluster, you must be running HDP 2.6.4 or later. If NiFi is installed on an HDF cluster managed by a separate Ambari instance, you must be running HDP 2.6.1 or later and Apache Atlas 0.8.0 or later.
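The version prerequisites above can be expressed as a small check. This is a minimal sketch, not part of NiFi or Ambari: the function names `parse_version` and `meets_atlas_prerequisites` are hypothetical, and it assumes version strings are dotted numbers, optionally followed by a build suffix such as `-91`.

```python
def parse_version(text):
    """Turn a version string such as '2.6.4' or '2.6.4.0-91' into (2, 6, 4)."""
    # Drop any build suffix after '-' and keep the first three components.
    numeric = text.split("-")[0]
    return tuple(int(part) for part in numeric.split(".")[:3])


def meets_atlas_prerequisites(nifi_on_hdp, hdp_version, atlas_version=None):
    """Return True if the versions satisfy the prerequisites stated above."""
    hdp = parse_version(hdp_version)
    if nifi_on_hdp:
        # NiFi installed on the HDP cluster: HDP 2.6.4 or later.
        return hdp >= (2, 6, 4)
    # NiFi on an HDF cluster managed by a separate Ambari instance:
    # HDP 2.6.1 or later and Apache Atlas 0.8.0 or later.
    if atlas_version is None:
        return False
    return hdp >= (2, 6, 1) and parse_version(atlas_version) >= (0, 8, 0)
```

For example, `meets_atlas_prerequisites(False, "2.6.1", "0.8.0")` passes the check, while `meets_atlas_prerequisites(True, "2.6.3")` does not.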

  1. From the Global Menu located in NiFi’s upper right corner, select Controller Services and click the Reporting Tasks tab.
  2. Click the Add (+) icon to launch the Add Reporting Task dialog.
  3. Select ReportLineageToAtlas and click Add.
  4. Click the Edit icon to launch the Configure Reporting Task dialog. The following Properties are required:
    • Atlas URLs – a comma-separated list of Atlas Server URLs. Once you have started reporting, you cannot modify an existing Reporting Task to add a new Atlas Server. When you need to add a new Atlas Server, you must create a new reporting task.

    • Atlas Authentication Method – Specifies how to authenticate the Reporting Task to the Atlas Server. Basic authentication is the default.

    • NiFi URL for Atlas – Specifies the NiFi cluster URL.

    • NiFi Lineage Strategy – Specifies the level of granularity for your NiFi dataflow reporting to Atlas. Once you have started reporting, you should not switch between simple and complete lineage reporting strategies.

    • Provenance Record Start Position – Specifies where in the Provenance Events stream the Reporting Task should start.

    • Provenance Record Batch Size – Specifies how many records to send in a single batch.

    • Create Atlas Configuration File – If enabled, the atlas-application.properties file and the Atlas Configuration Directory are automatically created when the Reporting Task starts.

    • Kafka Security Protocol – Specifies the protocol used to communicate with Kafka brokers to send Atlas hook notification messages. This value should match Kafka's security.protocol property value.
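Before you enter the values in the Configure Reporting Task dialog, it can help to collect them in one place and check them. The following is a minimal sketch under stated assumptions: the property names are the display names listed above, every value in `EXAMPLE_PROPERTIES` (host names, ports, and option labels) is a hypothetical placeholder, and `check_properties` is an illustrative helper, not a NiFi API. Confirm the allowable values in the dialog itself.

```python
# Display names of the required properties, as listed above.
REQUIRED_PROPERTIES = [
    "Atlas URLs",
    "Atlas Authentication Method",
    "NiFi URL for Atlas",
    "NiFi Lineage Strategy",
    "Provenance Record Start Position",
    "Provenance Record Batch Size",
    "Create Atlas Configuration File",
    "Kafka Security Protocol",
]

# Values accepted by Kafka's security.protocol setting.
KAFKA_SECURITY_PROTOCOLS = {"PLAINTEXT", "SSL", "SASL_PLAINTEXT", "SASL_SSL"}

# Hypothetical placeholder values; replace them with your own.
EXAMPLE_PROPERTIES = {
    "Atlas URLs": "http://atlas1.example.com:21000, http://atlas2.example.com:21000",
    "Atlas Authentication Method": "Basic",
    "NiFi URL for Atlas": "http://nifi.example.com:8080/nifi",
    "NiFi Lineage Strategy": "Simple Path",
    "Provenance Record Start Position": "Beginning of stream",
    "Provenance Record Batch Size": "1000",
    "Create Atlas Configuration File": "true",
    "Kafka Security Protocol": "PLAINTEXT",
}


def split_atlas_urls(value):
    """Split the comma-separated Atlas URLs value into a clean list."""
    return [url.strip() for url in value.split(",") if url.strip()]


def check_properties(props, kafka_security_protocol):
    """Return a list of problems found; an empty list means none were found.

    kafka_security_protocol is the security.protocol value your Kafka
    brokers use, which the Reporting Task property should match.
    """
    problems = []
    for name in REQUIRED_PROPERTIES:
        if not str(props.get(name, "")).strip():
            problems.append(f"missing required property: {name}")
    protocol = props.get("Kafka Security Protocol", "")
    if protocol:
        if protocol not in KAFKA_SECURITY_PROTOCOLS:
            problems.append(f"unknown Kafka security protocol: {protocol}")
        elif protocol != kafka_security_protocol:
            problems.append(
                f"Kafka Security Protocol is {protocol}, "
                f"but Kafka uses {kafka_security_protocol}"
            )
    return problems


if __name__ == "__main__":
    print(split_atlas_urls(EXAMPLE_PROPERTIES["Atlas URLs"]))
    print(check_properties(EXAMPLE_PROPERTIES, "PLAINTEXT"))
```

Because you cannot add an Atlas Server to a Reporting Task after reporting has started, listing every Atlas URL up front in a check like this avoids having to create a new reporting task later.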

Once ReportLineageToAtlas is up and running, you can view dataset-level lineage graphs in the Atlas UI.

Note

By default, the Reporting Task runs on a 5-minute interval, so do not expect lineage graphs to appear immediately. You can change this interval in the Reporting Task property configuration.
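If you script your NiFi configuration instead of using the dialog, the interval can be set in the request body you send to NiFi. The sketch below only builds and prints such a body; it does not contact NiFi. It assumes the NiFi REST API represents a reporting task as an entity with `revision` and `component` fields, and that the run interval is carried in `component.schedulingPeriod`. Verify this against the REST API documentation for your NiFi version. The task ID, revision number, and helper name `build_schedule_update` are hypothetical.

```python
import json


def build_schedule_update(task_id, revision_version, scheduling_period):
    """Build a request body that changes only the reporting task's run interval.

    Assumed shape (verify for your NiFi version): a reporting task entity
    with a revision and a component carrying the schedulingPeriod field.
    """
    return {
        "revision": {"version": revision_version},
        "component": {"id": task_id, "schedulingPeriod": scheduling_period},
    }


if __name__ == "__main__":
    # Hypothetical task ID and revision; "1 min" shortens the 5-minute default.
    body = build_schedule_update("hypothetical-task-id", 3, "1 min")
    print(json.dumps(body, indent=2))
```

A shorter interval makes lineage appear sooner in the Atlas UI at the cost of more frequent reporting runs.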

More Information

For complete information, see the help included with the Reporting Task.