ETL with Apache Spark

Workshop date & duration: March 28th, 2020, 9:30 – 14:00, 30 min break included
Trainers: Valentina Crisan, Maria Catana
Location: eSolutions Academy, Budişteanu Office Building, strada General Constantin Budişteanu Nr. 28C, etaj 1, Sector 1, Bucureşti
Price: 150 RON (including VAT)
Number of places: 10 (no more places left)
Languages: Scala & SQL

Description:

One of the many uses of Apache Spark is transforming data from different formats and sources, both batch and streaming. This workshop is mainly hands-on and focuses on exactly that: how to read, write, transform, join and manage the schema of data in different formats, and how best to handle such data in Apache Spark. So, if you know a bit about Spark but have not had the chance to play much with its ETL capabilities, or even if you don’t know much but would like to find out more, this workshop might be of interest.

You can check out the agenda and register here.

Fast and Scalable: SpringBoot + SOLR

Workshop date & duration: March 21st, 2020, 9:30 – 14:00, 30 min break included
Trainer: Oana Brezai
Location:  eSolutions Academy, Budişteanu Office Building, strada General Constantin Budişteanu Nr. 28C, etaj 1, Sector 1, Bucureşti
Price: 150 RON (including VAT)
Number of places: 10
Languages: Java

Description:

You have probably heard of SOLR by now: it is the open source search platform that powers the search and navigation features of many of the world’s largest internet sites (e.g. AOL, Apple, Netflix). During this workshop, you will build (under my guidance) a basic web shop (catalog + search part) using SOLR and SpringBoot.

You can check out the agenda and register here.

Intro to Spark Structured Streaming using Scala and Apache Kafka

Workshop date & duration: February 1st, 2020, 9:30 – 14:00, 30 min break included
Trainers: Valentina Crisan, Maria Catana
Location: eSolutions Academy, Budişteanu Office Building, strada General Constantin Budişteanu Nr. 28C, etaj 1, Sector 1, Bucureşti
Price: 150 RON (including VAT)
Number of places: 10 (no more places left)
Languages: Scala & SQL

Description:

Structured Streaming was introduced in Spark 2.0, modeling the stream as an unbounded (infinite) table. This is a big architectural change compared with the DStream batch model that existed prior to Spark 2.0. The workshop will introduce you to how Spark can read, process and analyze streams of data: we will use stream data from Apache Kafka, with Scala and SQL for reading, processing and analyzing it. We will also discuss stateless vs. stateful queries and how Spark handles out-of-order data in aggregation queries.
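To make the idea of a stateful aggregation over out-of-order data a bit more concrete, here is a minimal sketch in plain Python. It is not Spark code and does not use Spark's API: the function name `windowed_counts`, its parameters and the sample events are all assumptions made for illustration, and the watermark rule is a simplified version of the general technique (Spark's own bookkeeping differs in detail).

```python
from collections import defaultdict


def windowed_counts(events, window_size=10, allowed_lateness=5):
    """Count events per tumbling event-time window (hypothetical helper).

    events: iterable of (event_time, key) tuples, given in ARRIVAL order,
            which may differ from event-time order.
    Returns (counts, dropped), where counts maps (window_start, key) -> count.
    """
    counts = defaultdict(int)  # state kept across events: this is what makes the query stateful
    max_event_time = None      # highest event time observed so far
    dropped = []               # events that arrived too late to be counted

    for event_time, key in events:
        if max_event_time is None or event_time > max_event_time:
            max_event_time = event_time
        # Simplified watermark: we no longer expect data older than this.
        watermark = max_event_time - allowed_lateness

        window_start = (event_time // window_size) * window_size
        window_end = window_start + window_size

        if window_end <= watermark:
            # The whole window lies behind the watermark: treat the event as too late.
            dropped.append((event_time, key))
            continue

        counts[(window_start, key)] += 1

    return dict(counts), dropped


# Made-up events in arrival order; 8 and 9 arrive out of order.
sample_events = [(1, "a"), (3, "a"), (12, "a"), (8, "a"), (27, "a"), (9, "a")]
counts, dropped = windowed_counts(sample_events)
print(counts)
print(dropped)
```

In this sketch the event with time 8 is still counted because its window has not yet fallen behind the watermark, while the event with time 9 arrives after the watermark has moved past its window and is dropped.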

You can check out the agenda and register here.

ML on Spark workshop

Workshop date & duration:  November 5th – Tuesday, 14:00 – 18:00
Trainer: Sorin Peste
Supporting students: Alexandru Petrescu, Laurentiu Partas
Location: TechHub Bucuresti
Price: Free (upon approval by the organizer & trainer)
Number of places: 20 (no more places left)
Languages: Python

Description:
We are coming back in November with a new workshop on Machine Learning, this time on how to build a model using Spark ML logistic regression and gradient boosting.
So, come join us for an afternoon in which we will explore Apache Spark’s Machine Learning capabilities. We’ll be looking at using Spark to build a Credit Scoring model which estimates the probability of default for current and existing customers.

You can check out the agenda and register here.

ML Intro using Decision Trees and Random Forest

Using Python, Jupyter and SKlearn

Course date & duration: July 13th, 9:30 – 14:00, 30 min break included
Trainer: Tudor Lapusan
Location: Impact Hub Bucuresti, Timpuri Noi area
Price: 200 RON (including VAT)
Number of places: 20 (no more places left)
Languages: Python

Getting started with Machine Learning can seem a pretty hefty task to some people: understanding the algorithms, learning a bit of programming, deciding which libraries to use, getting some data to learn on, etc. But in reality, if you set your expectations right and are willing to start small and learn step by step, learning the basics of ML is quite doable. After that, it’s up to you to take your knowledge into the real world and to apply and expand what you have learned.

This workshop aims to introduce you to the ML world and to teach you how to solve classification and regression problems using decision tree and random forest algorithms. We will go from theory to hands-on practice in just a couple of hours, aiming mostly to help you understand the main pipeline of an ML project, while of course learning a bit of ML:

    • Software and hardware requirements for a ML project
    • Common Python libraries for data analysis
    • Feature encoding and feature preprocessing
    • Exploratory Data Analysis (EDA)
    • Model validation
    • Model hyperparameter optimization
    • Tree based models for classification and regression:
      • Decision Tree
      • Random Forest
    • Repetitive model improvement
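As a small taste of the tree-based models listed above, the sketch below shows how a decision tree could pick a single split by minimizing Gini impurity. It is written in plain Python rather than with SKlearn so that it runs without extra dependencies; the helper names `gini` and `best_split` and the toy data are assumptions made for illustration, not material from the workshop.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Find the threshold for a 'value <= threshold' split with the lowest
    weighted Gini impurity. Returns (threshold, weighted_gini); threshold is
    None when no split improves on the unsplit node.
    """
    n = len(values)
    best = (None, gini(labels))
    # The largest value is skipped: it would put every sample on the left side.
    for threshold in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= threshold]
        right = [y for x, y in zip(values, labels) if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


# Toy data (made up): one numeric feature and a binary class label.
ages = [22, 25, 30, 35, 40, 50]
bought = ["no", "no", "no", "yes", "yes", "yes"]

threshold, score = best_split(ages, bought)
print(threshold, score)
```

A full decision tree repeats this search recursively on each side of the split, and a random forest trains many such trees on resampled data and combines their predictions.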

You can check out the agenda and register here.

Spark Structured Streaming vs Kafka Streams

Date: TBD
Trainers: Felix Crisan, Valentina Crisan, Maria Catana
Location: TBD
Number of places: 20
Price: 150 RON (including VAT)

Stream processing can be solved at application level or at cluster level (with a stream processing framework). Two of the existing solutions in these areas are Kafka Streams and Spark Structured Streaming: the former takes a microservices approach by exposing an API, while the latter extends the well-known Spark processing capabilities to structured stream processing.

This workshop aims to discuss the major differences between the Kafka and Spark approaches to stream processing: the architecture, the functionalities, the limitations of both solutions, the possible use cases for each, and some of the implementation details.

You can check out the agenda and register here.

Workshop Kafka Streams

Date: 18 May, 9:00 – 13:30
Trainers: Felix Crisan, Valentina Crisan
Location: Adobe Romania, Anchor Plaza, Bulevardul Timișoara 26Z, București 061331
Number of places: 20 (no more places left)
Price: 150 RON (including VAT)

Stream processing is one of the most active topics in big data architecture discussions nowadays, with many open source and proprietary solutions available on the market (Apache Spark Streaming, Apache Storm, Apache Flink, Google Dataflow, ...). Starting with release 0.10.0.0, Apache Kafka also introduced the capability to process the streams of data that flow through Kafka. Understanding what you can do with Kafka Streams and how it differs from other solutions on the market is therefore key to knowing what to choose for your particular use case.

This workshop aims to cover the most important parts of Kafka Streams: the concepts (streams, tables, handling state, interactive queries, ...), the practicalities (what you can do with it and what the difference is between the API and the KSQL server), and what building an application that uses Kafka Streams involves. We will focus on the stream processing part of Kafka, assuming that participants are already familiar with the basic concepts of Apache Kafka, the distributed messaging bus.

You can check out the agenda and register here.

Introduction to Neo4j

Description:

This workshop aims to cover various introductory topics in the area of graph databases, with a main focus on Neo4j. We will tackle subjects relating to data modelling, performance and scalability. We will then have a look at how this technology can be used to highlight valuable patterns within our data.

The workshop will be divided into three main parts: a presentation covering the theoretical aspects surrounding graph databases, a demo showcasing typical Neo4j usage and a hands-on lab activity.

Date: March 16th, 2019, 9:30 – 13:30
Trainer: Calin Constantinov
Location: eSolutions Academy, Budişteanu Office Building, strada General Constantin Budişteanu Nr. 28C, etaj 1, Sector 1, Bucureşti.
Number of places: 15 (no more places left)
Price: 150 RON (including VAT)

You can check out the agenda and register here.

Introduction to Apache Kafka

Apache Kafka has lately been positioning itself strongly as a platform ("Kafka as a Platform"), quite an evolution from the messaging bus built by LinkedIn in 2011. But what gives Apache Kafka such a strong position in the big data architecture landscape: highly distributed, (theoretically at least) infinite storage of data, streaming features and API, KSQL? In this workshop we will go through the main features of Apache Kafka and discuss its evolved position in a big data architecture, both through use cases and through a hands-on session in which we will store data through the producer API, retrieve data through the consumer API, see how data is partitioned and replicated, and process data stored in Kafka through Kafka Streams using KSQL. This workshop is entry level and addresses anyone interested in understanding how to get started with Apache Kafka and the role this solution can play in a big data architecture.

Date: October 20, 2018, 9:30-13:30
Trainers: Valentina Crisan, Felix Crisan
Location: eSolutions Academy, Budişteanu Office Building, strada General Constantin Budişteanu Nr. 28C, etaj 1, Sector 1, Bucureşti.
Number of places: 15 (no more places left)
Price: 150 RON (including VAT)

You can check out the agenda and register for a future session here.

Introduction to Apache Solr

This workshop addresses anyone interested in search solutions. It aims to be a light intro to search engines, and especially to Apache Solr. Apache Solr is one of the two main open source search engines existing today, and it is also the base for the search functionalities implemented in several big data platforms (e.g. DataStax, Cloudera). Thus, understanding Solr will help you not only in working with the Apache version, but will also give you a starting point in several platforms that use Solr as the base for their search functionalities.

Date: 30 June, 2018, 9:30-13:30
Trainer: Radu Gheorghe
Location: eSolutions Academy, Budişteanu Office Building, strada General Constantin Budişteanu Nr. 28C, etaj 1, Sector 1, Bucureşti.
Number of places: 15 (10 places left)
Price: 150 RON (including VAT)

You can check out the agenda and register for this session here.