
How to build an Open Source Apache NiFi Cluster with Apache ZooKeeper Failover Controller

– Ansari Faheem Ahmed

A cluster needs a minimum of three Apache ZooKeeper nodes and three Apache NiFi nodes. You can add more NiFi nodes if required. ZooKeeper provides the failover: if the primary NiFi node goes down, another node in the NiFi cluster is elected as the primary through ZooKeeper.

Deploying and setting up the Apache ZooKeeper cluster:

1) Create three nodes and download the ZooKeeper tar-ball (version 3.4.13 is used here) on each of them:
a. wget https://archive.apache.org/dist/zookeeper/zookeeper-3.4.13/zookeeper-3.4.13.tar.gz
b. mv zookeeper-3.4.13.tar.gz /opt/
c. cd /opt/ && tar -xvzf zookeeper-3.4.13.tar.gz

2) Set the following configuration in the ZooKeeper conf directory.
a. Go to /opt/zookeeper-3.4.13/conf/
b. cp zoo_sample.cfg zoo.cfg
c. Edit zoo.cfg (vim zoo.cfg) on all three ZooKeeper nodes so that it contains:
tickTime=2000
dataDir=/var/lib/zookeeper
clientPort=2181
initLimit=5
syncLimit=2
server.1=zookeeper_IP_address1:2888:3888
server.2=zookeeper_IP_address2:2888:3888
server.3=zookeeper_IP_address3:2888:3888
Note: If an internal firewall is running, open ports 2181, 2888 and 3888 over TCP on all three ZooKeeper nodes:
firewall-cmd --permanent --add-port=2181/tcp
firewall-cmd --permanent --add-port=2888/tcp
firewall-cmd --permanent --add-port=3888/tcp
firewall-cmd --reload
firewall-cmd --list-all
d. Create the zookeeper directory under /var/lib/: mkdir -p /var/lib/zookeeper
e. Create the java.env file on all three nodes under /opt/zookeeper-3.4.13/conf/ (zkEnv.sh reads java.env from the conf directory): vim java.env, then add the line export JVMFLAGS="-Xmx4096m -Xms512m"
f. Create the myid file on all three ZooKeeper nodes under /var/lib/zookeeper/ and assign each node its ZooKeeper id (1, 2 or 3), matching the server.N entries in zoo.cfg.
g. ZooKeeper node 1: echo 1 > /var/lib/zookeeper/myid
h. ZooKeeper node 2: echo 2 > /var/lib/zookeeper/myid
i. ZooKeeper node 3: echo 3 > /var/lib/zookeeper/myid

j. Set 777 permissions on the myid file on all three ZooKeeper nodes:
chmod 777 /var/lib/zookeeper/myid

3) Set the ZooKeeper home directory using the commands below:
a. export ZOOKEEPER_HOME=/opt/zookeeper-3.4.13
b. exec bash

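4) Install Java and set JAVA_HOME:
a. yum install java-1.8.0-openjdk
b. export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.191.b12-1.el7_6.x86_64/jre
c. exec bash
Note: JAVA_HOME must point to the Java installation directory, not to the bin/java executable. The exact directory name depends on the installed package version.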
5) Apply the above configuration and firewall ports (if an internal firewall is running) on all three ZooKeeper nodes.
6) When starting the ZooKeeper cluster for the first time, use the commands below:
a. nohup java -cp /opt/zookeeper-3.4.13/zookeeper-3.4.13.jar:/opt/zookeeper-3.4.13/lib/slf4j-api-1.7.25.jar:/opt/zookeeper-3.4.13/lib/slf4j-log4j12-1.7.25.jar:/opt/zookeeper-3.4.13/lib/log4j-1.2.17.jar:conf org.apache.zookeeper.server.quorum.QuorumPeerMain /opt/zookeeper-3.4.13/conf/zoo.cfg &
b. cd /opt/zookeeper-3.4.13/bin/
c. ./zkServer.sh restart (on all three nodes)
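
The per-node ZooKeeper steps above (zoo.cfg, myid, and checking that a node has joined the cluster) could be scripted roughly as follows. This is a minimal sketch, not part of ZooKeeper itself: the function names write_zoo_cfg, write_myid and zk_mode are hypothetical helpers, and the role check assumes that the output of ZooKeeper's srvr four-letter command contains a "Mode: leader" or "Mode: follower" line.

```shell
#!/usr/bin/env bash
# Sketch only: write_zoo_cfg, write_myid and zk_mode are illustrative helper
# names, not tools shipped with ZooKeeper.

# Write zoo.cfg with the settings used in this article.
# usage: write_zoo_cfg CONF_DIR DATA_DIR IP1 IP2 IP3
write_zoo_cfg() {
  local conf_dir="$1" data_dir="$2" id=1 ip
  shift 2
  mkdir -p "$conf_dir"
  {
    echo "tickTime=2000"
    echo "dataDir=$data_dir"
    echo "clientPort=2181"
    echo "initLimit=5"
    echo "syncLimit=2"
    # One server.N line per node; N must match that node's myid.
    for ip in "$@"; do
      echo "server.$id=$ip:2888:3888"
      id=$((id + 1))
    done
  } > "$conf_dir/zoo.cfg"
}

# Write this node's id (1, 2 or 3) to DATA_DIR/myid.
# usage: write_myid DATA_DIR ID
write_myid() {
  local data_dir="$1" id="$2"
  mkdir -p "$data_dir"
  echo "$id" > "$data_dir/myid"
}

# Read "srvr" output on stdin and print the node's role (leader or follower).
zk_mode() {
  awk -F': ' '$1 == "Mode" { print $2 }'
}
```

On node 2, for example, this might be called as write_zoo_cfg /opt/zookeeper-3.4.13/conf /var/lib/zookeeper followed by the three ZooKeeper IP addresses, then write_myid /var/lib/zookeeper 2. Once all nodes are started, echo srvr | nc zookeeper_IP_address1 2181 | zk_mode should print the role of that node, assuming nc is installed and port 2181 is reachable; ./zkServer.sh status reports the same mode locally.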

Once the ZooKeeper cluster is working, configure Apache NiFi:
1) First, download the Apache NiFi tar-ball on all NiFi nodes using the command below:
wget https://archive.apache.org/dist/nifi/1.7.1/nifi-1.7.1-bin.tar.gz
2) Move the tar-ball to /opt/: mv nifi-1.7.1-bin.tar.gz /opt/
3) Extract NiFi: cd /opt/ && tar -xvzf nifi-1.7.1-bin.tar.gz
4) Go to the NiFi configuration directory: cd /opt/nifi-1.7.1/conf/
5) Set the required properties in the following configuration files on each NiFi node:
nifi.properties, state-management.xml, bootstrap.conf, authorizers.xml and zookeeper.properties.

6) In nifi.properties on node 1, set the properties below using the NiFi node 1 address, then repeat the same configuration on the other two NiFi nodes with their own addresses.
# Site to Site properties
nifi.remote.input.host=<NiFi node 1 address>

# web properties #
nifi.web.http.host=<NiFi node 1 address>
nifi.web.http.port=8080
Note: the port number is the same on all NiFi nodes.
nifi.web.jetty.threads=400

# cluster node properties (only configure for cluster nodes) #
nifi.cluster.is.node=true
nifi.cluster.node.address=<NiFi node 1 address>
nifi.cluster.node.protocol.port=12000
nifi.cluster.node.protocol.threads=20
nifi.cluster.node.connection.timeout=60 sec
nifi.cluster.node.read.timeout=60 sec
nifi.cluster.flow.election.max.wait.time=30 sec

# zookeeper properties, used for cluster management #
nifi.zookeeper.connect.string=<zookeeper1 IP>:2181,<zookeeper2 IP>:2181,<zookeeper3 IP>:2181
nifi.zookeeper.connect.timeout=60 secs
nifi.zookeeper.session.timeout=60 secs
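
Because only the node address differs between the three NiFi nodes, the per-node edits to nifi.properties can be applied by a small script instead of by hand. The sketch below is one possible approach, not a NiFi tool: zk_connect_string, set_prop and configure_nifi_node are hypothetical helper names, and only a subset of the properties listed above is set.

```shell
#!/usr/bin/env bash
# Sketch only: these helpers are illustrative and not shipped with NiFi.

# Join host addresses into "h1:2181,h2:2181,..." with no spaces.
zk_connect_string() {
  local out="" host
  for host in "$@"; do
    out="${out:+$out,}$host:2181"
  done
  echo "$out"
}

# Replace KEY's value in a properties FILE, or append KEY=VALUE if KEY is absent.
# Note: awk -v interprets backslash escapes, so avoid backslashes in VALUE.
set_prop() {
  local file="$1" key="$2" value="$3" tmp
  tmp="$(mktemp)"
  awk -v k="$key" -v v="$value" '
    index($0, k "=") == 1 { print k "=" v; found = 1; next }
    { print }
    END { if (!found) print k "=" v }
  ' "$file" > "$tmp" && cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Apply the node-specific and cluster properties from this article.
# usage: configure_nifi_node FILE NODE_ADDRESS ZK_CONNECT_STRING
configure_nifi_node() {
  local file="$1" node_addr="$2" zk="$3"
  set_prop "$file" nifi.remote.input.host "$node_addr"
  set_prop "$file" nifi.web.http.host "$node_addr"
  set_prop "$file" nifi.web.http.port 8080
  set_prop "$file" nifi.cluster.is.node true
  set_prop "$file" nifi.cluster.node.address "$node_addr"
  set_prop "$file" nifi.cluster.node.protocol.port 12000
  set_prop "$file" nifi.zookeeper.connect.string "$zk"
}
```

On each node this could be run as configure_nifi_node /opt/nifi-1.7.1/conf/nifi.properties followed by that node's own address and the output of zk_connect_string for the three ZooKeeper addresses. Keeping a backup copy of nifi.properties before running it is advisable.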

state-management.xml:

<cluster-provider>
<id>zk-provider</id>
<class>org.apache.nifi.controller.state.providers.zookeeper.ZooKeeperStateProvider</class>
<property name="Connect String">zookeeperIP1:2181,zookeeperIP2:2181,zookeeperIP3:2181</property>
<property name="Root Node">/nifi</property>
<property name="Session Timeout">10 seconds</property>
<property name="Access Control">Open</property>
</cluster-provider>
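
In this setup the Connect String in state-management.xml and nifi.zookeeper.connect.string in nifi.properties both point at the same three ZooKeeper nodes, so a quick consistency check can catch a typo in one of them. The following is a rough sketch under two assumptions: the Connect String property sits on a single line, as in the snippet above, and the function names are hypothetical helpers rather than anything provided by NiFi.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are illustrative, not part of NiFi.

# Print nifi.zookeeper.connect.string from a nifi.properties file, spaces removed.
zk_string_from_props() {
  sed -n 's/^nifi\.zookeeper\.connect\.string=//p' "$1" | tr -d ' '
}

# Print the "Connect String" value from state-management.xml, spaces removed.
# Assumes the whole <property> element is on one line.
zk_string_from_state_xml() {
  sed -n 's:.*<property name="Connect String">\(.*\)</property>.*:\1:p' "$1" | tr -d ' '
}

# Return 0 if both files name the same ZooKeeper ensemble, 1 otherwise.
# usage: check_zk_strings NIFI_PROPERTIES STATE_MANAGEMENT_XML
check_zk_strings() {
  local from_props from_xml
  from_props="$(zk_string_from_props "$1")"
  from_xml="$(zk_string_from_state_xml "$2")"
  if [ -n "$from_props" ] && [ "$from_props" = "$from_xml" ]; then
    echo "OK: both files use $from_props"
  else
    echo "MISMATCH: nifi.properties='$from_props' state-management.xml='$from_xml'" >&2
    return 1
  fi
}
```

It could be run from /opt/nifi-1.7.1/conf/ as check_zk_strings nifi.properties state-management.xml. The comparison is a plain string match, so the hosts must also appear in the same order in both files.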

Enable the tag below, located at the bottom of authorizers.xml:
<authorizer>
<identifier>file-provider</identifier>
<class>org.apache.nifi.authorization.FileAuthorizer</class>
<property name="Authorizations File">./conf/authorizations.xml</property>
<property name="Users File">./conf/users.xml</property>
<property name="Initial Admin Identity"></property>
<property name="Legacy Authorized Users File"></property>
<property name="Node Identity 1"></property>
</authorizer>

bootstrap.conf:
In this file, set the Java heap size according to the workload. A 16 GB Java heap is generally ample for large jobs in Apache NiFi.
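
The heap size is controlled by the java.arg lines in bootstrap.conf that carry the -Xms and -Xmx flags. A minimal sketch for changing them is shown below. It assumes those two lines already exist in the file (the stock bootstrap.conf ships with small defaults, but check your own copy), and set_nifi_heap is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: set_nifi_heap is an illustrative helper, not part of NiFi.

# Rewrite the java.arg.N lines that hold -Xms / -Xmx, whatever N is.
# A backup of the original file is kept as FILE.bak.
# usage: set_nifi_heap BOOTSTRAP_CONF XMS XMX   (for example 16g 16g)
set_nifi_heap() {
  local file="$1" xms="$2" xmx="$3"
  sed -i.bak \
    -e "s/^\(java\.arg\.[0-9]*\)=-Xms.*/\1=-Xms${xms}/" \
    -e "s/^\(java\.arg\.[0-9]*\)=-Xmx.*/\1=-Xmx${xmx}/" \
    "$file"
}
```

For the 16 GB heap mentioned above, this could be run as set_nifi_heap /opt/nifi-1.7.1/conf/bootstrap.conf 16g 16g on each NiFi node, followed by a NiFi restart so the new JVM settings take effect. The host needs enough physical memory to back the chosen heap.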

