Intro to Spark Structured Streaming using Scala and Apache Kafka

Workshop date & duration: February 1st, 2020, 9:30 – 14:00, 30 min break included
Trainers: Valentina Crisan, Maria Catana
Location:  eSolutions Academy, Budişteanu Office Building, strada General Constantin Budişteanu Nr. 28C, etaj 1, Sector 1, Bucureşti
Price: 150 RON (including VAT)
Number of places: 10 (no places left)
Languages: Scala & SQL

Description:

Spark 2.0 introduced Structured Streaming, which models a stream as an unbounded (infinite) table. This is a big architectural change compared with the micro-batch DStream model that existed before Spark 2.0. The workshop introduces how Spark can read, process and analyze streams of data: we will use stream data from Apache Kafka, and Scala and SQL to read, process and analyze it. We will also discuss stateless vs. stateful queries and how Spark handles out-of-order data in aggregation queries.

It would be good, but not mandatory, if you already know the Apache Kafka concepts. The workshop will not go through Apache Kafka architecture and concepts; it will only recap the main points you need to know when reading Kafka data into Spark (what a topic, partition, offset, broker, producer and consumer are).
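
To give an idea of how these Kafka concepts surface in Spark, here is a minimal sketch of reading a topic as a streaming DataFrame. It is an illustration only, not the workshop material: the broker address `localhost:9092` and the topic name `events` are assumptions, and the `spark-sql-kafka-0-10` connector is assumed to be on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.Trigger

object KafkaReadSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("kafka-read-sketch")
      .master("local[*]") // local mode, for experimentation only
      .getOrCreate()

    // Each Kafka record becomes one row of the unbounded input table.
    val raw = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker
      .option("subscribe", "events")                       // assumed topic
      .option("startingOffsets", "earliest")
      .load()

    // key and value arrive as binary; topic, partition and offset
    // identify where each record came from.
    val events = raw.selectExpr(
      "CAST(key AS STRING) AS key",
      "CAST(value AS STRING) AS value",
      "topic", "partition", "offset", "timestamp")

    // Print each micro-batch to the console every 10 seconds.
    val query = events.writeStream
      .format("console")
      .outputMode("append")
      .trigger(Trigger.ProcessingTime("10 seconds"))
      .start()

    query.awaitTermination()
  }
}
```

Replacing `readStream` with `read` (and dropping the `writeStream` part) would load the same topic once as a regular batch DataFrame, which is a convenient way to inspect the data before running a streaming query.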

Prior basic knowledge of Spark is recommended, as we will not discuss Spark architecture and concepts in the workshop.

 

Agenda:
  • Stream processing architecture concepts
      • Existing solutions and main differences
      • Event bus/message bus (Apache Kafka) – main concepts (offset handling)
      • Event processing vs window processing architectures
      • Stateful vs. stateless queries in stream processing
  • Spark structured streaming
    • Architecture
      • Input table => queries => result table => output modes
      • Trigger interval vs. continuous processing (please note that continuous processing is still experimental; we will mainly work with trigger intervals)
      • writeStream options:
        • File
        • Kafka
        • Table 
        • Console
      • How state is handled
      • Watermarking
    • Exercises:
      • How to read data from a Kafka stream in Spark 
        • Using spark.read
        • Using spark.readStream
        • Run a query
      • How to write streaming data from Spark 
        • Write streaming data to a table and run SQL queries on it
        • Write streaming data to a file
        • Write streaming data back into Kafka
      • Aggregation queries
      • Handling out of order data
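
As a rough illustration of the last two exercise topics, the sketch below combines a windowed aggregation written in SQL with a watermark for out-of-order data. It is not the workshop's own code. The broker `localhost:9092`, the topic `events`, the 5-minute window and the 10-minute watermark are all assumptions, and the Kafka record timestamp stands in for event time.

```scala
import org.apache.spark.sql.SparkSession

object LateDataSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("late-data-sketch")
      .master("local[*]") // local mode, for experimentation only
      .getOrCreate()

    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // assumed broker
      .option("subscribe", "events")                       // assumed topic
      .load()
      .selectExpr("CAST(key AS STRING) AS key", "timestamp")
      // Keep aggregation state for data up to 10 minutes behind the latest
      // timestamp seen; records older than that may be dropped.
      .withWatermark("timestamp", "10 minutes")

    // Expose the streaming DataFrame to SQL.
    events.createOrReplaceTempView("events")

    // Stateful query: count records per key in 5-minute event-time windows.
    val counts = spark.sql(
      """SELECT window(timestamp, '5 minutes') AS time_window, key, count(*) AS n
        |FROM events
        |GROUP BY window(timestamp, '5 minutes'), key""".stripMargin)

    // "update" mode prints only the windows whose counts changed in a batch.
    val query = counts.writeStream
      .format("console")
      .outputMode("update")
      .option("truncate", "false")
      .start()

    query.awaitTermination()
  }
}
```

A late record that still falls within the watermark updates the count of its original window; the watermark is what lets Spark eventually discard the state of old windows instead of keeping it forever.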

Registration is closed. Fill in the form below if you want to be notified when a new session is scheduled: