Debunking Apache Kafka – open course

Open Course: Understanding Apache Kafka  

Timeline & Duration: November 20th – December 15th, four 5-hour online sessions over 4 weeks (one session per week, on November 20 & 27 and December 4 & 15; exact hours to be decided). An online setup will be available for exercises/hands-on sessions for the duration of the course, and guidance will also be provided for the local installations needed to complete the course project.

Main trainer: Valentina Crisan

Location: Online (Zoom)

Price: 350 EUR (early bird 300 EUR if payment is made by November 15th)

Pre-requisites: knowledge of distributed systems; SQL syntax for the KSQL part of the course

Apache Kafka is one of the most widely used distributed data hubs in big data architectures, with companies like Netflix, Apple and Microsoft processing billions of messages per day through Kafka-based clusters. Its simple architecture and the way it writes and reads data make it very fast at persisting billions of events and making them available to consumers. But what makes Kafka even more interesting is the set of services that can be deployed alongside a Kafka cluster – Kafka Connect, KSQL, Schema Registry – services that transform Kafka from a distributed data bus into a distributed stream processing solution and make it easy to work with flexible schema formats like Apache Avro. In this course we will start from Kafka's core concepts, work through its architecture, and then build on top of it by adding components like Connect, Schema Registry and KSQL. While working with these ecosystem components we will also lay the groundwork for understanding what it means to process live streams of data versus batch processing – we will work with persistent queries and understand what state means and how it is handled in distributed systems such as Kafka and KSQL.
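
To give a first taste of what "persistent queries" and "state" mean in practice, here is a minimal KSQL sketch (the pageviews topic and its fields are hypothetical, and newer ksqlDB releases additionally require EMIT CHANGES): unlike a batch query, the second statement never finishes – it keeps running and maintains the per-minute counts as state, updating them as every new event arrives.

    -- Register an existing Kafka topic as a stream (hypothetical topic and fields)
    CREATE STREAM pageviews (user_id VARCHAR, page VARCHAR)
      WITH (KAFKA_TOPIC='pageviews', VALUE_FORMAT='JSON');

    -- A persistent query: it runs continuously, keeping windowed counts as state
    CREATE TABLE pageviews_per_minute AS
      SELECT page, COUNT(*) AS views
      FROM pageviews
      WINDOW TUMBLING (SIZE 1 MINUTE)
      GROUP BY page;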

This course is designed to ease participants into understanding Kafka and the other components of the Kafka ecosystem. During 4 online sessions (over 4 weeks) we will discuss concepts but also do hands-on work – participants will receive a problem to solve at the beginning of the course and, building on the class theory and exercises, will solve it by the end of the 4 sessions. The course is structured as four 5-hour sessions – covering theory plus some hands-on exercises – but some exercises and reading/studying will be needed between the sessions (materials will be provided online).

So, if you are a solutions architect, a product manager or just someone who would like to understand how Apache Kafka could fit into your path/solution, this course will answer both theoretical and practical questions. The trainer is experienced in both training and real-life projects and will help you understand the components of the Kafka ecosystem, the underlying concepts and the way these components can be used in real-life scenarios.

Main topics for the course:

  • Sessions 1 – 2 (Intro to Apache Kafka concepts)
    • Why Apache Kafka & use cases
      • where in the Big Data architecture is Kafka positioned
      • ingestion layer overview
    • Apache Kafka Fundamentals
      • Topics 
        • Key
        • Value
        • Partition
        • Offset
        • Timestamp & Timestamp Type
      • Producers: writing messages to Kafka
        • Overview, sending a message, configuring producers
        • Partitioning data
      • Consumers: reading from Kafka
        • Overview, subscribing to topics, offsets, configuring consumers
        • Consumer groups
          • Consumer group coordinator, group leader
          • What happens when a consumer fails
      • Brokers 
        • Replication of data
        • Write / Read Path
        • Consistency vs Availability in Kafka
    • Kafka Architecture
      • Role of Zookeeper in Kafka clusters (depending on Kafka version)
      • Data retention options
        • Compacted topics
    • Kafka Connect 
      • Role of Connect in Kafka
      • Architecture of a cluster with Kafka Connect
      • Demo: connecting Kafka to MySQL using Kafka Connect (a sample connector configuration is sketched after this outline)
    • Hands-on: Kafka as a messaging bus, using the Kafka console tools for topics, producers and consumers (sample commands are sketched after this outline)
  • Sessions 3 – 4 (Stream processing, KSQL and Kafka Schema Registry):
    • Stream Processing
      • Stream processing architecture concepts
        • Streams vs tables
        • Persistent queries
        • State handling in distributed systems
      • Kafka Streams architecture overview
    • KSQL overview
      • KSQL concepts
      • Hands-on with KSQL (a short example is sketched after this outline):
        • Create streams and tables
        • Operations with streams and tables: joining a stream with a table
      • Handling different formats of data: JSON nested format, Avro using Schema Registry 
      • Window aggregations
      • Handling of late data
    • Schema Registry overview 
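
For the Kafka Connect demo in sessions 1 – 2, a source connector is typically configured through Connect's REST API (port 8083 by default). A minimal sketch, assuming the Confluent JDBC source connector and a MySQL driver are installed; the database, credentials, table and topic prefix below are all hypothetical:

    curl -X POST http://localhost:8083/connectors \
      -H "Content-Type: application/json" \
      -d '{
        "name": "mysql-source-demo",
        "config": {
          "connector.class": "io.confluent.connect.jdbc.JdbcSourceConnector",
          "connection.url": "jdbc:mysql://localhost:3306/demo",
          "connection.user": "demo",
          "connection.password": "demo",
          "table.whitelist": "users",
          "mode": "incrementing",
          "incrementing.column.name": "id",
          "topic.prefix": "mysql-"
        }
      }'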
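
For the messaging-bus hands-on, these are the kind of console commands we will use (topic and group names are placeholders; note that older Kafka releases use --zookeeper for kafka-topics and --broker-list for the console producer instead of --bootstrap-server):

    # Create a topic (partition/replication values are just examples)
    bin/kafka-topics.sh --create --topic demo-events \
      --bootstrap-server localhost:9092 --partitions 3 --replication-factor 1

    # Produce messages from the console (one message per line typed in)
    bin/kafka-console-producer.sh --topic demo-events \
      --bootstrap-server localhost:9092

    # Consume the messages in another terminal, as part of a consumer group
    bin/kafka-console-consumer.sh --topic demo-events \
      --bootstrap-server localhost:9092 --group demo-group --from-beginning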
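
And for the KSQL hands-on in sessions 3 – 4, a minimal sketch of creating a stream and a table and joining them (topic names, fields and the KEY setting are illustrative, and exact syntax differs slightly between classic KSQL and later ksqlDB releases):

    -- Register a stream over an orders topic and a table over a users topic
    CREATE STREAM orders (user_id VARCHAR, amount DOUBLE)
      WITH (KAFKA_TOPIC='orders', VALUE_FORMAT='JSON');

    CREATE TABLE users (user_id VARCHAR, region VARCHAR)
      WITH (KAFKA_TOPIC='users', VALUE_FORMAT='JSON', KEY='user_id');

    -- Enrich each order event with the user's region from the table
    CREATE STREAM orders_enriched AS
      SELECT o.user_id, o.amount, u.region
      FROM orders o
      LEFT JOIN users u ON o.user_id = u.user_id;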

If you are interested in participating, please complete and submit the form below.