Embedded ZooKeeper Server
As mentioned above, the default State Provider for cluster-wide state is the ZooKeeperStateProvider. At the time of this writing, this is the only State Provider that exists for handling cluster-wide state. This means that NiFi depends on ZooKeeper in order to behave as a cluster. However, there are many environments in which NiFi is deployed where no existing ZooKeeper ensemble is being maintained. To spare administrators the burden of also maintaining a separate ZooKeeper instance, NiFi provides the option of starting an embedded ZooKeeper server.
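nifi.state.management.embedded.zookeeper.start: Specifies whether or not this instance of NiFi should run an embedded ZooKeeper server.
nifi.state.management.embedded.zookeeper.properties: Properties file that provides the ZooKeeper properties to use if nifi.state.management.embedded.zookeeper.start is set to true.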
To start the embedded ZooKeeper server, set the nifi.state.management.embedded.zookeeper.start property in nifi.properties to true on those nodes that should run it. Generally, it is advisable to run ZooKeeper on either 3 or 5 nodes. Running on fewer than 3 nodes provides less durability in the face of failure. Running on more than 5 nodes generally produces more network traffic than is necessary. Additionally, running ZooKeeper on 4 nodes provides no more benefit than running on 3 nodes, because ZooKeeper requires a majority of nodes to be active in order to function. However, it is up to the administrator to determine the number of nodes most appropriate to the particular deployment of NiFi.
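The majority requirement can be checked with a little arithmetic. The following is only an illustrative sketch, not part of NiFi or ZooKeeper: the function names quorum_size and tolerated_failures are made up for this example, and the majority is taken to be floor(n/2) + 1.

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names, not NiFi or ZooKeeper commands).

# Majority (quorum) needed for an ensemble of n servers: floor(n/2) + 1.
quorum_size() {
    echo $(( $1 / 2 + 1 ))
}

# Number of servers that can fail while a majority is still active.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

# Compare ensemble sizes 1 through 5.
for n in 1 2 3 4 5; do
    echo "servers=$n quorum=$(quorum_size "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

Under this calculation, ensembles of 3 and 4 servers both tolerate a single failure, while 5 servers tolerate two.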
If the nifi.state.management.embedded.zookeeper.start property is set to true, the nifi.state.management.embedded.zookeeper.properties property in nifi.properties also becomes relevant. This specifies the ZooKeeper properties file to use. At a minimum, this properties file needs to be populated with the list of ZooKeeper servers. The servers are specified as properties in the form of
server.n. Each of these servers is configured as <hostname>:<quorum port>[:<leader election port>]. For example,
myhost:2888:3888. This list of nodes should be the same nodes in the NiFi cluster that have the
nifi.state.management.embedded.zookeeper.start property set to
true. Also note that because ZooKeeper will be listening on these ports, the firewall may need to be configured to open these ports for incoming traffic, at least between nodes in the cluster. Additionally, the port to listen on for client connections must be opened in the firewall. The default value for this is
2181 but can be configured via the clientPort property in the zookeeper.properties file.
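As a rough sketch of what the server list might look like, the helper below prints one server.n line per hostname. The function name zk_server_lines and the hostnames are placeholders invented for this example; ports 2888 and 3888 are simply taken from the myhost:2888:3888 example above and should be adjusted to the actual deployment.

```shell
#!/bin/sh
# Hypothetical helper: print one server.N line per hostname, numbering from 1.
# Ports 2888 (quorum) and 3888 (leader election) follow the example in the text.
zk_server_lines() {
    n=1
    for host in "$@"; do
        echo "server.$n=$host:2888:3888"
        n=$(( n + 1 ))
    done
}

# Placeholder hostnames; replace with the nodes that run the embedded ZooKeeper.
zk_server_lines nifi1.example.com nifi2.example.com nifi3.example.com
```

The printed lines could then be placed in the zookeeper.properties file on each node, so that every node lists the same set of servers.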
When using an embedded ZooKeeper, the ./conf/zookeeper.properties file has a property named
dataDir. By default, this value is set to
./state/zookeeper. If more than one NiFi node is running an embedded ZooKeeper, it is important to tell each server which one it is. This is accomplished by creating a file named myid and placing it in ZooKeeper's data directory. The contents of this file should be the index of the server as specified by the server.<number> property. So for one of the ZooKeeper servers, we will accomplish this by performing the following commands:
    cd $NIFI_HOME
    mkdir state
    mkdir state/zookeeper
    echo 1 > state/zookeeper/myid
For the next NiFi Node that will run ZooKeeper, we can accomplish this by performing the following commands:
    cd $NIFI_HOME
    mkdir state
    mkdir state/zookeeper
    echo 2 > state/zookeeper/myid
And so on.
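The same steps can be wrapped in a small function so they are repeatable on each node. This is a minimal sketch under stated assumptions: write_myid is a hypothetical name, and the data directory is assumed to be the default ./state/zookeeper relative to the NiFi home directory. If dataDir has been changed, the path needs to change with it.

```shell
#!/bin/sh
# Hypothetical helper: write_myid NIFI_HOME ID
# Creates the default data directory (state/zookeeper under NIFI_HOME)
# and writes ID into the myid file there.
write_myid() {
    nifi_home=$1
    id=$2
    # Reject empty or non-numeric ids before touching the filesystem.
    case $id in
        ''|*[!0-9]*) echo "id must be numeric" >&2; return 1 ;;
    esac
    # mkdir -p creates missing parents and succeeds if the directory exists.
    mkdir -p "$nifi_home/state/zookeeper" || return 1
    echo "$id" > "$nifi_home/state/zookeeper/myid"
}
```

On the node listed as server.1 this would be called as write_myid "$NIFI_HOME" 1, on the node listed as server.2 as write_myid "$NIFI_HOME" 2, and so on.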
For more information on the properties used to administer ZooKeeper, see the ZooKeeper Admin Guide: https://zookeeper.apache.org/doc/current/zookeeperAdmin.html