Data ingest using Apache NiFi

Course date & duration: August 8th, 2020, 9:30 – 14:00, 30 min break included
Trainer: Lucian Neghina
Location: Online (using Zoom)
Price: 150 RON (including VAT)
Number of places: 10 (3 places left)
When you design a data solution, one of the earliest questions is where your data comes from and how you will make it available to the component(s) that process or store it. This matters even more with IoT data, which typically arrives from many different sources and is then processed and stored by several components of your solution. And since nowadays we work mainly with streams rather than static data, a tool that can design and run the flow of events from the source(s) to the processing/storage stage is extremely important. Apache NiFi was built to automate exactly this flow of data from one system to another. It is a data flow management system with a web UI that lets you build data flows in real time, and it supports flow-based programming.

You can check out the agenda and register here.

Understanding joins with Apache Spark

Workshop date & duration: June 20, 2020, 9:30 – 14:00, 30 min break included
Trainers: Valentina Crisan, Maria Catana
Location: Online
Price: 150 RON (including VAT)
Number of places: 10
Languages: Scala & SQL

DESCRIPTION:

For a (mainly) in-memory processing platform like Spark, getting the best performance is most of the time about:

  1. Optimizing the amount of data needed to perform a certain action
  2. Having a partitioning strategy that distributes the data optimally across the Spark cluster executors (this is often correlated with how the underlying storage distributes the data initially, but it also depends on how data is partitioned during the join itself, since the partitions are sorted before the actual join operation runs)
  3. And, if your action is a join, choosing the right strategy for your joins
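
To make points 2 and 3 more concrete, here is a toy, single-process sketch in plain Scala of what two common join strategies do conceptually. It uses ordinary collections as stand-ins for distributed partitions and no Spark API at all; the object and function names are hypothetical. A sort-merge join hash-partitions both sides by the join key (so equal keys land in the same partition index on both sides), sorts each partition and merges matching partitions; a broadcast hash join instead builds a hash map from the small side and streams the large side through it, avoiding the shuffle and sort. Spark's real operators (shown as SortMergeJoin and BroadcastHashJoin in `explain()` output) are considerably more involved.

```scala
// Toy illustration of two join strategies; plain Scala collections stand in for
// distributed partitions. Hypothetical names, not Spark's API or implementation.
object JoinSketch {

  // Assign each row to one of `numPartitions` buckets by key hash, as a shuffle does.
  // Math.floorMod keeps the bucket index non-negative for negative hash codes.
  def hashPartition[K, V](rows: Seq[(K, V)], numPartitions: Int): Vector[Seq[(K, V)]] = {
    val buckets = rows.groupBy { case (k, _) => Math.floorMod(k.hashCode, numPartitions) }
    Vector.tabulate(numPartitions)(i => buckets.getOrElse(i, Seq.empty))
  }

  // Merge two key-sorted sequences, emitting every matching (key, left, right) triple.
  def mergeSorted[K, A, B](left: IndexedSeq[(K, A)], right: IndexedSeq[(K, B)])(
      implicit ord: Ordering[K]): Seq[(K, A, B)] = {
    val out = scala.collection.mutable.ArrayBuffer.empty[(K, A, B)]
    var i = 0
    var j = 0
    while (i < left.length && j < right.length) {
      val c = ord.compare(left(i)._1, right(j)._1)
      if (c < 0) i += 1
      else if (c > 0) j += 1
      else {
        val key = left(i)._1
        // Find the end of the run of equal keys on each side, then emit the cross product.
        val iEnd = left.indexWhere(r => ord.compare(r._1, key) != 0, i) match {
          case -1 => left.length
          case n  => n
        }
        val jEnd = right.indexWhere(r => ord.compare(r._1, key) != 0, j) match {
          case -1 => right.length
          case n  => n
        }
        for (a <- i until iEnd; b <- j until jEnd) out += ((key, left(a)._2, right(b)._2))
        i = iEnd
        j = jEnd
      }
    }
    out.toSeq
  }

  // Sort-merge join: co-partition both sides by key, sort each partition, merge pairwise.
  // Because a key maps to the same partition index on both sides, partitions join independently.
  def sortMergeJoin[K: Ordering, A, B](left: Seq[(K, A)], right: Seq[(K, B)],
                                       numPartitions: Int): Seq[(K, A, B)] = {
    val lp = hashPartition(left, numPartitions)
    val rp = hashPartition(right, numPartitions)
    (0 until numPartitions).flatMap { p =>
      mergeSorted(lp(p).sortBy(_._1).toIndexedSeq, rp(p).sortBy(_._1).toIndexedSeq)
    }
  }

  // Broadcast hash join: build a lookup map from the small side once and probe it with
  // every row of the large side, so the large side is neither shuffled nor sorted.
  def broadcastHashJoin[K, A, B](large: Seq[(K, A)], small: Seq[(K, B)]): Seq[(K, A, B)] = {
    val lookup: Map[K, Seq[B]] =
      small.groupBy(_._1).map { case (k, rows) => k -> rows.map(_._2) }
    for {
      (k, a) <- large
      b <- lookup.getOrElse(k, Seq.empty)
    } yield (k, a, b)
  }
}
```

For example, joining `Seq((1, "o1"), (2, "o2"), (1, "o3"), (3, "o4"))` with `Seq((1, "Ana"), (2, "Dan"))` gives the same three matches with either function, only in a possibly different order. The trade-off is the one the workshop explores: the sort-merge variant pays for partitioning and sorting both sides, while the broadcast variant only works well when one side is small enough to copy everywhere (in Spark this is typically requested with `org.apache.spark.sql.functions.broadcast`).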

This workshop will mainly focus on two of the above-mentioned aspects, partitioning and join strategy, making them clearer through exercises and hands-on sessions.

You can check out the agenda and register here.

ETL with Apache Spark

Workshop date & duration: March 28th, 2020, 9:30 – 14:00, 30 min break included
Trainers: Valentina Crisan, Maria Catana
Location: eSolutions Academy, Budişteanu Office Building, strada General Constantin Budişteanu Nr. 28C, etaj 1, Sector 1, Bucureşti
Price: 150 RON (including VAT)
Number of places: 10 (no places left)
Languages: Scala & SQL

DESCRIPTION:

One of the many uses of Apache Spark is to transform data from different formats and sources, both batch and streaming. This mainly hands-on workshop focuses on exactly that: understanding how to read, write and transform data in different formats, manage schemas and join datasets, and how best to handle such data in Apache Spark. So, if you know a bit about Spark but haven't had much chance to play with its ETL capabilities, or even if you don't know much yet but would like to find out, this workshop might be of interest.

You can check out the agenda and register here.