Working Group: Debunking Apache Kafka and its open ecosystem (closed)

Note: This workshop is now closed.

Learning a new solution or building an architecture for a specific use case is never easy, especially when you are working on it alone. That is why this year we will continue with the working group format that we introduced in 2020. So far, the working groups have covered:

  • Spark Structured Streaming + NLP (completed)
  • Building live dashboards with Druid + Superset (completed)
  • Understanding Decision Trees (running until December)
  • Stream processing with Apache Flink (close to completion this year)

Most of these projects had one component in common, Apache Kafka, so we decided to continue our working groups with “Debunking Apache Kafka and its open ecosystem”. Besides Apache Kafka itself, its ecosystem includes other open components such as Kafka Connect, ksqlDB and Schema Registry, so this working group will aim to build an Apache Kafka cluster that contains all or most of the open components of the Kafka ecosystem. We will start from a simple 3-node Apache Kafka cluster, understand what adding and removing nodes involves, how topic data is partitioned, and so on, and then we will add new components such as Kafka Connect, ksqlDB and Schema Registry. We will build an entire solution that can store and process streaming events.

The working group is planned for May to July 2021 and will bring together a team of 5-6 participants who will define the scope, select open data for testing the solution, install the needed components, and implement the needed flow.

Details of the working group are listed below. If you are interested in participating in this group, please register using the form at the bottom of the page.

What this working group involves:

A predefined topic: how to build a solution based on Apache Kafka and the open components of its ecosystem (Kafka Connect, ksqlDB, Schema Registry, etc.).

A group of 5-6 participants and one predefined driver per group. Besides being part of the group, the driver's role is to organize the group and provide the cloud infrastructure needed for installing the studied solution;

5 online meetings, one every 2 weeks (thus a 10-week time window for each working group), held over Google Hangouts or Zoom. The meetings will take place on weekdays (Monday to Friday), between 6 PM and 9 PM;

Active participation and contribution from each participant; for example, each participant will have to present to the rest of the group in at least 2 of the meetings;

Some study at home between the sessions;

The fee for participating in these working groups is 100 Euro/participant and covers the cost of the cloud infrastructure and of other tools and logistics for the group meetings.

Driver of Apache Kafka & Ecosystem working group: Valentina Crisan