Penalization vs. Yielding
When an issue occurs during processing, the framework exposes two mechanisms that let Processor developers avoid performing unnecessary work: "penalization" and "yielding." The two concepts are easy to confuse for developers new to the NiFi API.
A developer penalizes a FlowFile by calling the penalize(FlowFile) method of ProcessSession. This makes the FlowFile itself inaccessible to downstream Processors for a period of time. The DataFlow Manager (DFM) controls that period through the "Penalty Duration" setting in the Processor Configuration dialog; the default value is 30 seconds. Typically, a Processor penalizes a FlowFile when it determines that the data cannot be processed for environmental reasons that are expected to resolve on their own. A good example is the PutSFTP Processor, which penalizes a FlowFile if a file with the same filename already exists on the SFTP server. In this case, the Processor penalizes the FlowFile and routes it to failure. A DFM can then route failure back to the same PutSFTP Processor. This way, if a file exists with the same filename, the Processor will not attempt to send that file again for 30 seconds (or whatever period the DFM has configured). In the meantime, the Processor can continue to process other FlowFiles.
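The penalize-then-route-to-failure pattern described above can be sketched as follows. This is a minimal illustration that runs without NiFi on the classpath: FlowFile, Session, and RecordingSession are hypothetical stand-ins that mirror only the calls named in the text, relationships are modeled as plain strings, and the set of remote filenames is an assumption used to simulate the SFTP server. It is not the actual PutSFTP implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class PenalizeSketch {

    // Stand-in for NiFi's FlowFile: only a filename is modeled here.
    static final class FlowFile {
        final String filename;
        FlowFile(String filename) { this.filename = filename; }
    }

    // Stand-in for the two ProcessSession calls used below (not the real interface).
    interface Session {
        FlowFile penalize(FlowFile flowFile);
        void transfer(FlowFile flowFile, String relationship);
    }

    // Records each call so the sketch can be run and inspected without NiFi.
    static final class RecordingSession implements Session {
        final List<String> log = new ArrayList<>();

        @Override
        public FlowFile penalize(FlowFile flowFile) {
            log.add("penalize:" + flowFile.filename);
            return flowFile;
        }

        @Override
        public void transfer(FlowFile flowFile, String relationship) {
            log.add("transfer:" + flowFile.filename + "->" + relationship);
        }
    }

    // A filename collision is treated as an environmental problem:
    // penalize the FlowFile, route it to failure, and move on.
    static void send(Session session, FlowFile flowFile, Set<String> remoteFiles) {
        if (remoteFiles.contains(flowFile.filename)) {
            flowFile = session.penalize(flowFile);
            session.transfer(flowFile, "failure");
            return;
        }
        remoteFiles.add(flowFile.filename); // simulate a successful upload
        session.transfer(flowFile, "success");
    }

    public static void main(String[] args) {
        RecordingSession session = new RecordingSession();
        Set<String> remoteFiles = new HashSet<>(Arrays.asList("a.txt"));
        send(session, new FlowFile("a.txt"), remoteFiles); // collides, so it is penalized
        send(session, new FlowFile("b.txt"), remoteFiles); // no collision, so it succeeds
        System.out.println(session.log);
    }
}
```

Note that only the colliding FlowFile is penalized; the second FlowFile is processed immediately, which is the property that distinguishes penalization from yielding.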
On the other hand, yielding lets a Processor developer tell the framework that the Processor will not be able to perform any useful function for some period of time. This commonly happens with a Processor that communicates with a remote resource. If the Processor cannot connect to the remote resource, or if the remote resource is expected to provide data but reports that it has none, the Processor should call yield on the ProcessContext object and then return. By doing this, the Processor tells the framework not to waste resources triggering it, because there is nothing it can do; those resources are better spent letting other Processors run.
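The two yield conditions described above (connection failure, or a remote resource that reports no data) can be sketched as follows. As a simplification so the example runs without NiFi, Context is a hypothetical stand-in exposing only a yield method, RemoteSource is an invented interface whose poll method returns null when no data is available, and onTrigger returns the fetched item rather than creating a FlowFile.

```java
import java.io.IOException;

final class YieldSketch {

    // Stand-in for the single ProcessContext call used below (not the real interface).
    interface Context {
        void yield();
    }

    // Hypothetical remote resource: poll() returns null when it has no data
    // and throws IOException when the connection fails.
    interface RemoteSource {
        String poll() throws IOException;
    }

    // Counts yields so the sketch can be run and inspected without NiFi.
    static final class CountingContext implements Context {
        int yields = 0;

        @Override
        public void yield() {
            yields++;
        }
    }

    // Returns the fetched item, or null after yielding when no useful work is possible.
    static String onTrigger(Context context, RemoteSource source) {
        final String data;
        try {
            data = source.poll();
        } catch (IOException e) {
            context.yield(); // cannot connect: ask not to be triggered for a while
            return null;
        }
        if (data == null) {
            context.yield(); // connected, but the resource has nothing to offer
            return null;
        }
        return data;
    }

    public static void main(String[] args) {
        CountingContext context = new CountingContext();
        System.out.println(onTrigger(context, () -> "payload"));
        System.out.println(onTrigger(context, () -> null));
        System.out.println("yields=" + context.yields);
    }
}
```

In both yield branches the method returns immediately after the call, matching the "call yield and then return" guidance in the text. Unlike penalization, the yield applies to the Processor as a whole rather than to a single FlowFile.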