
Persistent Provenance Repository Properties



The location of the Provenance Repository. The default value is ./provenance_repository.

NOTE: Multiple provenance repositories can be specified by using the prefix with unique suffixes and separate paths as values.

For example, to provide two additional locations to act as part of the provenance repository, a user could also specify additional properties with keys of:

This provides three total locations, including
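The prefix-based lookup described in the note can be sketched in Python. The key prefix `example.repository.directory.`, the suffixes, and the paths below are hypothetical placeholders, not the actual property names. The sketch only shows how keys that share one prefix map unique suffixes to separate paths.

```python
# HYPOTHETICAL prefix used for illustration only.
PREFIX = "example.repository.directory."


def repository_directories(props: dict[str, str], prefix: str = PREFIX) -> dict[str, str]:
    """Return {suffix: path} for every property whose key starts with prefix."""
    dirs: dict[str, str] = {}
    for key, value in props.items():
        if key.startswith(prefix):
            suffix = key[len(prefix):]
            if not suffix:
                raise ValueError(f"property {key!r} needs a unique suffix")
            dirs[suffix] = value.strip()
    # Each location is meant to be a separate path.
    if len(set(dirs.values())) != len(dirs):
        raise ValueError("repository locations must use separate paths")
    return dirs
```

Keys without the prefix are ignored, so the lookup can run over the full set of properties.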

The maximum amount of time to keep data provenance information. The default value is 24 hours.

The maximum amount of data provenance information to store at a time. The default is 1 GB.
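The following minimal Python sketch shows how the two retention limits (age and total size) interact. It rests on stated assumptions: the repository is modeled as a list of files with a creation time and a size, the oldest files are removed first, and 1 GB is taken as 2^30 bytes. `StoredFile` and `files_to_expire` are hypothetical names, and the actual cleanup logic is not described here.

```python
from dataclasses import dataclass


@dataclass
class StoredFile:
    created_at: float  # seconds since epoch
    size_bytes: int


def files_to_expire(files, now, max_age_s=24 * 3600, max_total_bytes=1024 ** 3):
    """Return the files to delete so that both the age and size limits hold."""
    ordered = sorted(files, key=lambda f: f.created_at)  # oldest first
    # Anything older than the time limit is expired outright.
    expired = [f for f in ordered if now - f.created_at > max_age_s]
    kept = [f for f in ordered if now - f.created_at <= max_age_s]
    # Then drop the oldest remaining files until the total fits the size limit.
    total = sum(f.size_bytes for f in kept)
    while kept and total > max_total_bytes:
        oldest = kept.pop(0)
        total -= oldest.size_bytes
        expired.append(oldest)
    return expired
```

Whichever limit is stricter at a given moment determines how much history survives.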


The amount of time to wait before rolling over the latest data provenance information so that it is available in the User Interface. The default value is 30 seconds.


The amount of information to roll over at a time. The default value is 100 MB.
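The two rollover settings act together. The sketch below assumes that whichever limit is reached first (elapsed time or bytes written) triggers a rollover. That reading, the hypothetical `RolloverPolicy` class, and treating 100 MB as 100 * 1024 * 1024 bytes are all assumptions for illustration.

```python
import time


class RolloverPolicy:
    """Decide when the active file should be rolled over (illustrative only)."""

    def __init__(self, max_time_s=30.0, max_bytes=100 * 1024 * 1024, clock=time.monotonic):
        self.max_time_s = max_time_s
        self.max_bytes = max_bytes
        self.clock = clock  # injectable so the policy can be tested
        self.opened_at = clock()
        self.bytes_written = 0

    def record(self, n_bytes: int) -> None:
        self.bytes_written += n_bytes

    def should_roll_over(self) -> bool:
        elapsed = self.clock() - self.opened_at
        return elapsed >= self.max_time_s or self.bytes_written >= self.max_bytes

    def reset(self) -> None:
        """Start a fresh file after a rollover."""
        self.opened_at = self.clock()
        self.bytes_written = 0
```

A busy flow tends to hit the size limit first, and a quiet flow tends to hit the time limit first.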


The number of threads to use for Provenance Repository queries. The default value is 2.


The number of threads to use for indexing Provenance events so that they are searchable. The default value is 1. For flows that operate on a very high number of FlowFiles, the indexing of Provenance events could become a bottleneck. If this is the case, a bulletin will appear, indicating that "The rate of the dataflow is exceeding the provenance recording rate. Slowing down flow to accommodate." If this happens, increasing the value of this property may increase the rate at which the Provenance Repository is able to process these records, resulting in better overall throughput.
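The slowdown described in the bulletin resembles back-pressure from a bounded queue. The hypothetical sketch below models it with Python's `queue.Queue` and worker threads. The producer blocks on `put()` whenever the indexers fall behind, and adding indexing threads lets the queue drain faster. It illustrates the idea only and does not describe NiFi's actual implementation.

```python
import queue
import threading


def run_pipeline(events, index_threads=1, queue_capacity=100):
    """Feed events through a bounded queue drained by index_threads workers."""
    pending = queue.Queue(maxsize=queue_capacity)
    indexed = []
    lock = threading.Lock()

    def indexer():
        while True:
            event = pending.get()
            if event is None:  # sentinel: no more events for this worker
                return
            with lock:
                indexed.append(event)  # stand-in for real indexing work

    workers = [threading.Thread(target=indexer) for _ in range(index_threads)]
    for w in workers:
        w.start()
    for event in events:
        pending.put(event)  # blocks while the queue is full: the "slow down"
    for _ in workers:
        pending.put(None)  # one sentinel per worker
    for w in workers:
        w.join()
    return indexed
```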


Indicates whether to compress the provenance information when rolling it over. The default value is true.
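Here is a minimal sketch of compress-on-rollover. It uses gzip from the Python standard library as a stand-in, because the compression format is not stated here. `roll_over` and `read_rolled_over` are hypothetical helpers.

```python
import gzip


def roll_over(records: list[bytes], compress: bool = True) -> bytes:
    """Concatenate records into one rolled-over blob, optionally gzip-compressed."""
    payload = b"".join(records)
    return gzip.compress(payload) if compress else payload


def read_rolled_over(blob: bytes, compressed: bool = True) -> bytes:
    """Recover the original bytes from a rolled-over blob."""
    return gzip.decompress(blob) if compressed else blob
```

Provenance records are often repetitive text, so compression typically saves disk space at the cost of some CPU during rollover and reads.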


If set to true, any change to the repository will be synchronized to the disk, meaning that NiFi will ask the operating system not to cache the information. This is very expensive and can significantly reduce NiFi performance. However, if it is false, data could be lost if there is a sudden power loss or the operating system crashes. The default value is false.
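The trade-off can be sketched with Python's `os.fsync`, which asks the operating system to flush its cached writes for a file to the storage device before returning. The `append_record` helper and its `always_sync` flag are hypothetical and only mirror the behavior described above. Calling `fsync` on every write is what makes the synchronous mode expensive.

```python
import os


def append_record(path: str, data: bytes, always_sync: bool = False) -> None:
    """Append data to a file, optionally forcing it to stable storage."""
    with open(path, "ab") as f:
        f.write(data)
        f.flush()  # move Python's user-space buffer into the OS
        if always_sync:
            # Block until the OS has written its cached pages for this file
            # to the device; survives power loss, but costs a disk round trip.
            os.fsync(f.fileno())
```

Without the `fsync` call, the data may sit in the operating system's page cache for a while and can be lost if the machine loses power before it is written out.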


The number of journal files that should be used to serialize Provenance Event data. Increasing this value will allow more tasks to simultaneously update the repository but will result in more expensive merging of the journal files later. This value should ideally be equal to the number of threads that are expected to update the repository simultaneously, but 16 tends to work well in most environments. The default value is 16.
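The cost of more journals shows up at merge time. The sketch below rests on stated assumptions. Events carry an increasing ID and are spread round-robin across `journal_count` journals, so each journal is already ordered. The journals are later combined with a k-way merge via `heapq.merge`. The helpers are hypothetical, and the round-robin assignment in particular is an assumption rather than a description of NiFi's internals.

```python
import heapq


def write_to_journals(events, journal_count=16):
    """Distribute (event_id, payload) pairs round-robin across journals."""
    journals = [[] for _ in range(journal_count)]
    for i, event in enumerate(events):
        journals[i % journal_count].append(event)
    return journals


def merge_journals(journals):
    """Merge journals (each ordered by event ID) into one ordered list.

    A k-way merge costs O(n log k), so more journals make merging dearer.
    """
    return list(heapq.merge(*journals, key=lambda e: e[0]))
```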


This is a comma-separated list of the fields that should be indexed and made searchable. Fields that are not indexed will not be searchable. Valid fields are: EventType, FlowFileUUID, Filename, TransitURI, ProcessorID, AlternateIdentifierURI, Relationship, Details. The default value is: EventType, FlowFileUUID, Filename, ProcessorID.
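The following sketch shows how a comma-separated field list could be parsed and checked against the valid fields listed above. The function name and the case-sensitive comparison are assumptions.

```python
VALID_FIELDS = {
    "EventType", "FlowFileUUID", "Filename", "TransitURI",
    "ProcessorID", "AlternateIdentifierURI", "Relationship", "Details",
}
DEFAULT_FIELDS = "EventType, FlowFileUUID, Filename, ProcessorID"


def parse_indexed_fields(value: str = DEFAULT_FIELDS) -> list[str]:
    """Split a comma-separated field list and reject unknown field names."""
    fields = [f.strip() for f in value.split(",") if f.strip()]
    unknown = [f for f in fields if f not in VALID_FIELDS]
    if unknown:
        raise ValueError(f"not indexable: {unknown}")
    return fields
```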


This is a comma-separated list of FlowFile Attributes that should be indexed and made searchable. It is blank by default. Good examples to consider are filename, uuid, and mime.type, as well as any custom attributes that are valuable for your use case.
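Indexing attributes amounts to picking the configured names out of each FlowFile's attribute map. In this hypothetical sketch, `customer.id` stands in for a custom attribute. Attributes that a FlowFile does not carry are simply skipped, which is an assumption about the behavior.

```python
def attributes_to_index(flowfile_attributes: dict[str, str], setting: str = "") -> dict[str, str]:
    """Return the attributes named in a comma-separated setting (blank = none)."""
    wanted = [a.strip() for a in setting.split(",") if a.strip()]
    return {name: flowfile_attributes[name] for name in wanted if name in flowfile_attributes}
```

With the default blank setting, nothing beyond the indexed fields becomes searchable.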


Larger values for the shard size result in more Java heap usage when searching the Provenance Repository but should provide better performance. The default value is 500 MB.
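Counting shards is one way to reason about the shard-size trade-off. This rests on two assumptions: that a new shard is started each time the current one reaches the configured size, and that 500 MB means 500 * 1024 * 1024 bytes. Under those assumptions, larger shards mean fewer of them for the same index volume, but each one is larger when it is searched. `shard_count` is a hypothetical helper.

```python
import math


def shard_count(index_bytes: int, shard_size_bytes: int = 500 * 1024 * 1024) -> int:
    """Shards needed if a new shard starts each time the current one fills up."""
    return max(1, math.ceil(index_bytes / shard_size_bytes))
```

For example, a 2 GiB index needs 5 shards at the default size.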


Indicates the maximum length that a FlowFile attribute can be when retrieving a Provenance Event from the repository. If the length of any attribute exceeds this value, it will be truncated when the event is retrieved. The default is 65536.
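Here is a sketch of retrieval-time truncation. It assumes length is counted in characters, because the text above does not say whether it means characters or bytes. The hypothetical `truncate_attributes` helper returns a truncated copy and leaves the stored attributes untouched.

```python
def truncate_attributes(attributes: dict[str, str], max_length: int = 65536) -> dict[str, str]:
    """Return a copy with every attribute value cut to at most max_length characters."""
    return {name: value[:max_length] for name, value in attributes.items()}
```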