Kafka Simplified: Your Fast Track to Remote Server Configuration
Deploying Apache Kafka on remote servers can be a breeze or a real challenge depending on your approach. This tutorial showcases how to leverage Ansible to make the process easy and fast. Additionally, we’ll delve into creating topics and producing and consuming messages on the newly deployed Kafka setup. While I assume a foundational understanding of Kafka, I’ll also provide a brief overview of the key terms for those who need it.
Apache Kafka is a distributed event store and stream-processing platform. It is an open-source system developed by the Apache Software Foundation and written in Java and Scala.
Requirements:
- Root access to the remote servers.
- An Ansible controller machine.
- Basic knowledge of Ansible (refer to my previous blog if you need a refresher).
- Ports 9092 (Kafka), 2181/2888/3888 (ZooKeeper), 8001 (Zoonavigator), and 22 (SSH) open between the servers.
Let’s Start 🤩:
Step 1: Install the collection for Kafka deployment from Ansible Galaxy.
ansible-galaxy collection install amitgujar.kafkaninja
Step 2: Create an inventory file in the given format. Increase the machine count if you have more than 3 remote servers.
[machine1]
35.154.183.177
[machine2]
13.233.93.60
[machine3]
13.232.209.6
[all:vars]
ansible_user=ubuntu
ansible_python_interpreter=/usr/bin/python3
[machine1:vars]
value=1
[machine2:vars]
value=2
[machine3:vars]
value=3
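Before running any playbooks, it’s a good idea to confirm that the controller can reach every host over SSH. A quick sanity check, assuming the file above is saved as inventory:
ansible -i inventory all -m ping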
Step 3: Create a file named zookeeper.yml
- name: Zookeeper Install
  hosts: all
  become: true
  vars_prompt:
    - name: username
      prompt: "Enter the username"
      private: false
    - name: servers
      prompt: "Enter private ipv4s as comma separated values"
      private: false
    - name: server_count
      prompt: "How many servers do you want to configure"
      private: false
    - name: zookeeper.properties
      prompt: "Specify the location for zookeeper properties (optional)"
      private: false
  roles:
    - amitgujar.kafkaninja.zookeeper
Let’s clarify our objectives. This playbook deploys ZooKeeper on the remote servers. While a single server suffices, the role is fully capable of establishing a ZooKeeper quorum as well.
ZooKeeper plays a critical role in a Kafka cluster by providing distributed coordination and synchronization services. It maintains the cluster’s metadata, manages leader elections, and enables consumers to track their consumption progress. It also stores offsets, ensures fault tolerance, and assists in broker discovery and dynamic configuration updates. Therefore, it is crucial to set up ZooKeeper correctly before performing the Kafka cluster setup.
Let’s discuss the variable values:
- username: the server username; it’s recommended to keep the same username on all servers. If you are using EC2, enter ubuntu.
- servers: the private IPv4 addresses of all 3 servers, separated by commas.
- server_count: the total number of machines you want to configure.
- zookeeper.properties: leave this blank if you don’t want to add any custom values to the ZooKeeper servers. The default config can be found in my GitHub repo; the link is provided at the end of this blog.
Step 4: Execute the playbook
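Assuming the inventory from Step 2 is saved as inventory and the playbook as zookeeper.yml, the run looks like this (add --ask-become-pass if your user needs a sudo password):
ansible-playbook -i inventory zookeeper.yml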
Upon successful execution, you should get this output.
This setup also lets you install Zoonavigator, a GUI client for interacting with the ZooKeeper cluster.
Step 5: Add these values as the connection string (zookeeper1:2181,zookeeper2:2181,zookeeper3:2181). Each zookeeper entry represents a single instance of the server.
You should get this window upon successful connection.
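If the zookeeper1/zookeeper2/zookeeper3 hostnames don’t already resolve on the machine you’re connecting from, a minimal /etc/hosts sketch would look like this (the IPs are placeholders for your servers’ private IPv4 addresses; the role may already handle name resolution on the remote servers themselves):
10.0.0.11 zookeeper1 kafka1
10.0.0.12 zookeeper2 kafka2
10.0.0.13 zookeeper3 kafka3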
Time to install Kafka 🙂:
Step 6: Create a new playbook.
- name: Kafka Install
  hosts: all
  become: true
  vars_prompt:
    - name: username
      prompt: "Enter the username"
      private: false
    - name: server_properties
      prompt: "Specify the custom server.properties file (optional)"
      private: false
    - name: partition_size
      prompt: "Enter the partition size to create (e.g. 2GB)"
      private: false
  roles:
    - amitgujar.kafkaninja.kafka
Let’s discuss the variable values:
- username: explained above 😅
- server_properties: customize the given file according to your requirements, but keep broker.id, the listeners, and the ZooKeeper config in the given format; Ansible will do the rest of the work for you.
- partition_size: the size of the external disk that you have attached to the remote server.
# Keep broker id as 1 here, the ansible role will do all the magic
broker.id=1
# Keep the listener value as defined here, ansible role will do all the magic
advertised.listeners=PLAINTEXT://kafka1:9092
delete.topic.enable=true
log.dirs=/data/kafka
num.partitions=8
# keep this 3 only if you have 3 servers.
default.replication.factor=3
min.insync.replicas=2
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
############################# Zookeeper #############################
# root directory for all kafka znodes.
zookeeper.connect=zookeeper1:2181,zookeeper2:2181,zookeeper3:2181/kafka
zookeeper.connection.timeout.ms=6000
auto.create.topics.enable=true
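With the properties file ready, run the Kafka playbook the same way as the ZooKeeper one (assuming you saved the playbook from Step 6 as kafka.yml):
ansible-playbook -i inventory kafka.yml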
Step 7: Run the following checks from each server; if the connections succeed on all three, you have configured Kafka correctly.
nc -vz kafka1 9092
nc -vz kafka2 9092
nc -vz kafka3 9092
Step 8: In Zoonavigator, you should see a new znode named kafka created automatically (from the /kafka chroot in zookeeper.connect).
And that’s it, that’s all you need for the overall setup. Isn’t that great 😎? In only 8 steps and about 10 minutes, your production-grade Kafka cluster is ready.
If you need more brokers, add a group to the inventory file and execute only the Kafka playbook (see the sketch below). A ZooKeeper quorum of 3 servers is enough for most workloads, but if you need more, I suggest going no higher than 5.
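For example, a fourth broker could be added to the inventory like this (the IP is a placeholder for your server; value, which the role appears to use for the broker and ZooKeeper IDs, continues the same numbering scheme):
[machine4]
10.0.0.14
[machine4:vars]
value=4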
Keywords:
Before we move forward with the so-called ACTION 😅, let’s look at the basic terminology.
- Server: the remote server on which Kafka is deployed.
- Events: Messages or records.
- Producer: Client application that writes events to the Kafka cluster.
- Consumer: Client application that reads events written by the producer.
- Broker: Single node/server in Kafka cluster.
- Topic: A category/collection of events.
- Offset: the unique, sequential ID of a message within a partition.
- Partitions: ordered subsets of a topic’s messages, spread across brokers.
And Action…….
Now that we have the cluster ready, let’s perform some actions on it. We will start with topic creation. The Kafka CLI scripts live in the bin folder of the install directory (assumed here to be ~/kafka):
cd ~/kafka/bin
./kafka-topics.sh --create --topic logs --partitions 3 --bootstrap-server kafka1:9092
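Because server.properties sets default.replication.factor=3, this topic is automatically replicated across all three brokers; if you prefer to be explicit, the equivalent command with the flag spelled out is:
./kafka-topics.sh --create --topic logs --partitions 3 --replication-factor 3 --bootstrap-server kafka1:9092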
Describe Topic:
./kafka-topics.sh --describe --topic logs --bootstrap-server kafka1:9092
Write events:
This command will open the producer, which allows you to write events.
./kafka-console-producer.sh --topic logs --bootstrap-server kafka1:9092
Consume events:
This command will open the consumer and display the live events that the producer is writing.
./kafka-console-consumer.sh --topic logs --bootstrap-server kafka1:9092 --from-beginning
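If you want multiple consumers to share a topic’s partitions instead of each reading everything, you can also attach the console consumer to a consumer group (log-readers is just an example name):
./kafka-console-consumer.sh --topic logs --group log-readers --bootstrap-server kafka1:9092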
Note: I have used a producer on server1 and a consumer on server2 to illustrate that the Kafka cluster works end to end. Executing both actions on separate servers is not a requirement.
Just a quick reminder that we use the hostname of broker1 (kafka1) on server2, and for security reasons these servers communicate over their private IPv4 addresses. In our case, the hostnames are kafka1/zookeeper1, kafka2/zookeeper2, and so on, where the number denotes the index of the remote server.
Remember Zoonavigator?
Browse the kafka znode and you will see the newly created topic there.
Conclusion 😎:
The “kafkaNinja” Ansible collection simplifies the process, allowing for quick deployment and robust configuration of Apache Kafka in just a few minutes. By following the steps outlined in this guide, you can establish a production-grade Kafka setup that is both resilient and scalable. Additionally, we also explored operations such as topic creation, message read/write on the cluster, and the use of ZooNavigator, which provides a nice dashboard for our Kafka cluster. I hope this guide has given you the insights and confidence to optimize your Kafka deployments on remote servers. Happy streaming!
Here is the GitHub link for those who want to contribute/explore behind the scenes.