Adding Functionality to Apache NiFi

Cohesion and Reusability

For the sake of making a single, cohesive unit, developers are sometimes tempted to combine several functions into a single Processor. This is particularly common when a Processor expects input data in format X, converts the data into format Y, and then sends the newly formatted data to some external service.

Taking this approach of formatting the data for a particular endpoint and then sending the data to that endpoint within the same Processor has several drawbacks:

  • The Processor becomes very complex, as it has to perform the data translation task as well as the task of sending the data to the remote service.

  • If the Processor is unable to communicate with the remote service, it will route the data to a failure Relationship. When the data is retried, the Processor must perform the data translation again, and if the send fails again, the translation is repeated on every subsequent attempt.

  • If we have five different Processors that translate the incoming data into this new format before sending the data, we have a great deal of duplicated code. If the schema changes, for instance, many Processors must be updated.

  • The intermediate (translated) data is thrown away when the Processor finishes sending to the remote service, even though that intermediate format may well be useful to other Processors.

To avoid these issues and make Processors more reusable, a Processor should always stick to the principle of "do one thing and do it well." A Processor that both translates and sends should be broken into two separate Processors: one to convert the data from Format X to Format Y, and another to send the data to the remote resource.
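
The split can be illustrated with a minimal sketch in plain Java. It deliberately does not use the NiFi API: `Converter`, `Sender`, and `sendWithRetry` are hypothetical names invented for this example, the upper-casing stands in for a real X-to-Y translation, and the endpoint is simulated with a predicate. The point it tries to show is that once conversion and sending are separate steps, a failed send can be retried without translating the data again.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;
import java.util.function.Predicate;

public class CohesionSketch {

    /** Step 1 (hypothetical): only converts Format X to Format Y. */
    static final class Converter implements Function<String, String> {
        int invocations = 0; // counts how often the translation runs

        @Override
        public String apply(String formatX) {
            invocations++;
            return formatX.toUpperCase(); // stand-in for a real translation
        }
    }

    /** Step 2 (hypothetical): only sends already-converted data. */
    static final class Sender {
        private final Predicate<String> endpoint; // simulated remote service
        final List<String> delivered = new ArrayList<>();

        Sender(Predicate<String> endpoint) {
            this.endpoint = endpoint;
        }

        /** Returns true if the simulated endpoint accepted the payload. */
        boolean send(String formatY) {
            if (endpoint.test(formatY)) {
                delivered.add(formatY);
                return true;
            }
            return false;
        }
    }

    /** Retries the send step only; returns the attempt that succeeded, or -1. */
    static int sendWithRetry(Sender sender, String formatY, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (sender.send(formatY)) {
                return attempt;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        Converter converter = new Converter();
        int[] calls = {0};
        // Simulated endpoint that rejects the first two calls, then accepts.
        Sender flaky = new Sender(payload -> ++calls[0] >= 3);

        String formatY = converter.apply("record-1"); // converted exactly once
        int attempts = sendWithRetry(flaky, formatY, 5);

        System.out.println("attempts=" + attempts
                + ", conversions=" + converter.invocations);
    }
}
```

In this sketch the send succeeds on the third attempt while the conversion count stays at one. The converted value is also an ordinary return value, so any other consumer of Format Y could reuse it, which mirrors the reuse argument made above for separate Processors.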