Workshop: Spark Structured Streaming vs Kafka Streams
Number of places: 20
Stream processing can be handled at the application level or at the cluster level (with a stream processing framework). Kafka Streams and Spark Structured Streaming are two existing solutions in these areas: the former takes a microservices approach by exposing an API, while the latter extends Spark's well-known processing capabilities to structured stream processing.
This workshop discusses the major differences between the Kafka and Spark approaches to stream processing: the architecture, the functionality, the limitations of both solutions, their possible use cases, and some implementation details.
The workshop assumes that you are already familiar with Kafka as a messaging bus, with the basic concepts of stream processing, and with the Spark architecture. We will not go into detail about the general capabilities of these solutions; we will focus only on the Streams DSL API/KSQL Server for Kafka and on Spark Structured Streaming.
The workshop has two parts: Spark Structured Streaming theory and hands-on exercises (using Zeppelin notebooks), followed by a comparison with Kafka Streams.
- Stream processing basic concepts
- Spark Structured Streaming hands-on (using Apache Zeppelin with Scala and Spark SQL)
  - Triggers (when to check for new data)
  - Output modes: update, append, complete
  - State store
  - Out-of-order data / late data
  - Batch vs streams (use batch for deriving the schema for the stream)
- Kafka Streams short recap through KSQL
- Important aspects for both solutions:
  - Event-driven vs micro-batching
  - State stores
  - Out-of-order data
  - Application scalability
We will use Scala and SQL syntax for the hands-on exercises, KSQL for Kafka Streams, and Apache Zeppelin for Spark Structured Streaming.
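To give an idea of how the Spark topics above fit together, here is a minimal sketch in Scala. It is an illustration only, not the actual workshop material. It assumes a local Spark installation and uses the built-in `rate` test source instead of Kafka. The application name, window size, watermark delay and trigger interval are arbitrary values chosen for the example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, window}
import org.apache.spark.sql.streaming.Trigger

// In a Zeppelin notebook a session named `spark` usually exists already;
// getOrCreate() reuses it. The app name and master are assumptions.
val spark = SparkSession.builder
  .appName("structured-streaming-sketch")
  .master("local[*]")
  .getOrCreate()

// The built-in "rate" source emits rows with `timestamp` and `value` columns.
val events = spark.readStream
  .format("rate")
  .option("rowsPerSecond", 5)
  .load()

// Late data: accept events up to 10 seconds behind the latest event time seen.
// The windowed count is kept in the state store between micro-batches.
val counts = events
  .withWatermark("timestamp", "10 seconds")
  .groupBy(window(col("timestamp"), "5 seconds"))
  .count()

val query = counts.writeStream
  .outputMode("update")                         // only rows changed in this micro-batch
  .format("console")
  .trigger(Trigger.ProcessingTime("2 seconds")) // check for new data every 2 seconds
  .start()

// Let the query run for about 20 seconds, then stop it.
query.awaitTermination(20000)
query.stop()
```

Changing `outputMode` to `"complete"` would print the whole result table on every trigger instead of only the updated windows.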
The price for the workshop is 150 RON (including VAT).
1. Complete the registration form if you want to be notified when this workshop is scheduled: