Open course 2021: Understanding Apache Kafka

Timeline & Duration: November 15 – 17, 9:30 – 17:30, online sessions. No local installation is required – an online setup will be available for the exercises/hands-on sessions for the duration of the course.

Main trainer: Valentina Crisan

Location: Online (Zoom)

Price: 450 EUR (early bird: 400 EUR if payment is made by November 1st)

Prerequisites: knowledge of distributed systems and basic SQL syntax (for the KSQL part of the course)

Apache Kafka is one of the most widely used distributed data messaging hubs in big data architectures, with companies like Netflix, Apple and Microsoft processing billions of messages through Kafka-based clusters. Its simple architecture and the way it writes and reads data make it very fast at persisting billions of events and making them available. What makes Kafka even more interesting is the set of services that can be deployed alongside your Kafka cluster – Kafka Connect, KSQL, Schema Registry – services that turn Kafka from a distributed data bus into a distributed stream processing solution and make it easy to work with flexible schema formats such as Apache Avro. In this course we will start from the Apache Kafka concepts and work our way through its architecture, adding components like Connect, Schema Registry and KSQL. While working with these ecosystem components we will also lay the groundwork for understanding what it means to process live streams of data versus batches – we will work with persistent queries and understand what state means and how it is handled in distributed systems such as Kafka and KSQL.

This course is designed to ease participants into understanding Kafka and the other components of the Kafka ecosystem. During the 3 days of the course we will discuss the concepts, and participants will also get to work directly with a 3-node vanilla Apache Kafka cluster: connect the cluster to MySQL databases as sources and sinks through Kafka Connect, use Schema Registry together with Kafka Connect to convert the format of the events stored in Kafka, and process the Kafka events through KSQL.
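
For orientation, below is a minimal sketch of the kind of producer/consumer interaction the hands-on sessions build on, written with the plain Java Kafka clients. The broker address, topic name, consumer group id and message contents are placeholders, not the actual course setup.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;

public class QuickStart {
    public static void main(String[] args) {
        // Producer: write one key/value message to the "orders" topic
        // (broker address and topic name are placeholders).
        Properties prodProps = new Properties();
        prodProps.put("bootstrap.servers", "localhost:9092");
        prodProps.put("key.serializer", StringSerializer.class.getName());
        prodProps.put("value.serializer", StringSerializer.class.getName());
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(prodProps)) {
            producer.send(new ProducerRecord<>("orders", "order-1", "{\"amount\": 42}"));
        }

        // Consumer: subscribe to the same topic as part of a consumer group
        // and poll once for new records.
        Properties consProps = new Properties();
        consProps.put("bootstrap.servers", "localhost:9092");
        consProps.put("group.id", "course-demo");
        consProps.put("auto.offset.reset", "earliest");
        consProps.put("key.deserializer", StringDeserializer.class.getName());
        consProps.put("value.deserializer", StringDeserializer.class.getName());
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consProps)) {
            consumer.subscribe(List.of("orders"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> r : records) {
                System.out.printf("partition=%d offset=%d key=%s value=%s%n",
                        r.partition(), r.offset(), r.key(), r.value());
            }
        }
    }
}
```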

So, if you are a solutions architect, product manager or just someone who would like to understand how Apache Kafka could fit into your path/solution, this course will answer both theoretical and practical questions. The trainer will help you understand the components of the Kafka ecosystem, the underlying concepts and the way these components can be used in real-life examples.

Main topics for the course:

  • Intro to Apache Kafka concepts
    • Why Apache Kafka & use cases
      • Where Kafka is positioned in a big data architecture
      • Ingestion layer overview
    • Apache Kafka Fundamentals
      • Topics 
        • Keys
        • Value
        • Partition
        • Offset
        • Timestamp & Timestamp Type
      • Producers: writing messages to Kafka
        • Overview, sending a message, configuring producers
        • Partitioning data
      • Consumers: reading from Kafka
        • Overview, subscribing to topics, offsets, configuring consumers
        • Consumer groups
          • Consumer group coordinator, group leader
          • What happens when a consumer fails
      • Brokers 
        • Replication of data
        • Write / Read Path
        • Consistency vs Availability in Kafka
    • Kafka Architecture
      • Role of Zookeeper in Kafka clusters (depending on Kafka version)
      • Data retention options
        • Compacted topics
    • Hands-on: Kafka as a messaging bus, using the Kafka console tools for topics, producers and consumers
    • Kafka Connect 
      • Role of Kafka Connect
      • Architecture of a cluster with Kafka Connect
      • Connect Kafka to MySQL using Kafka Connect
    • Hands-on: Setting up source and sink connectors for MySQL (a connector-registration sketch follows this list)
  • Streams processing, KSQL and Kafka Schema Registry:  
    • Stream Processing
      • Stream processing architecture concepts
        • Streams vs tables
        • Persistent queries
        • State handling in distributed systems
      • Kafka Streams architecture overview
    • KSQL overview
      • KSQL concepts
      • Hands-on using KSQL (a KSQL sketch follows this list):
        • Create streams and tables
        • Operations with streams and tables: join stream & table
        • Handling different formats of data: JSON nested format, Avro using Schema Registry 
        • Window aggregations
    • Schema Registry overview 
    • Hands-on: building connectors that also transform the event format, using Schema Registry
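
As a preview of the Kafka Connect hands-on session referenced above, here is a minimal sketch of how a MySQL source connector might be registered against the Kafka Connect REST API from Java. The connector class and configuration keys assume the Confluent JDBC source connector; host names, credentials, database, table and topic prefix are invented placeholders and will differ in the course environment.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RegisterMySqlSource {
    public static void main(String[] args) throws Exception {
        // Connector definition posted to the Connect REST API.
        // Connector class and config keys assume the Confluent JDBC source
        // connector; all host names, credentials and table/topic names below
        // are placeholders.
        String connector = """
            {
              "name": "mysql-orders-source",
              "config": {
                "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
                "connection.url": "jdbc:mysql://mysql:3306/shop",
                "connection.user": "kafka",
                "connection.password": "kafka-secret",
                "table.whitelist": "orders",
                "mode": "incrementing",
                "incrementing.column.name": "id",
                "topic.prefix": "mysql-"
              }
            }
            """;

        // POST the definition to a Connect worker's REST endpoint.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8083/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(connector))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```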
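And as a preview of the KSQL hands-on session referenced above, here is a minimal sketch of creating a stream over an existing topic and a persistent query that materializes a windowed aggregation as a table. In the course the statements are typed into the KSQL CLI; here they are sent through the ksqlDB Java client purely for illustration, the exact syntax varies between KSQL/ksqlDB versions, and the topic, stream, table and column names are invented.

```java
import io.confluent.ksql.api.client.Client;
import io.confluent.ksql.api.client.ClientOptions;

public class KsqlSketch {
    public static void main(String[] args) throws Exception {
        // Connect to the KSQL/ksqlDB server REST endpoint (host/port are placeholders).
        ClientOptions options = ClientOptions.create().setHost("localhost").setPort(8088);
        Client client = Client.create(options);

        // Register an existing Kafka topic as a stream with a declared schema.
        client.executeStatement(
            "CREATE STREAM orders_stream (customer VARCHAR KEY, amount DOUBLE) " +
            "WITH (KAFKA_TOPIC='orders', VALUE_FORMAT='JSON');").get();

        // Persistent query: a windowed aggregation materialized as a table,
        // continuously updated as new events arrive on the stream.
        client.executeStatement(
            "CREATE TABLE orders_per_minute AS " +
            "SELECT customer, COUNT(*) AS order_count " +
            "FROM orders_stream WINDOW TUMBLING (SIZE 1 MINUTE) " +
            "GROUP BY customer;").get();

        client.close();
    }
}
```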

If you are interested in participating, please complete and submit the form below.