Data ingest using Apache NiFi

Course date & duration: August 8th, 2020, 9:30 – 14:00, 30 min break included
Trainer: Lucian Neghina
Location: Online (using Zoom)
Price: 150 RON (including VAT)
Number of places: 10 (3 places left)
When you design a data solution, one of the earliest questions is where your data comes from and how you will make it available to the solution(s) that process and store it. This is especially true when dealing with IoT data, which arrives from many different sources and is then processed and stored by several components of your solution. Even more so nowadays, when we work mainly with streams rather than static data, a solution able to design and run the flow of events from the sources to the processing/storage stages is extremely important. Apache NiFi has been built to automate that data flow from one system to another. It is a data flow management system with a web UI that helps you build data flows in real time, and it supports flow-based programming.

You can check out the agenda and register here.

Understanding joins with Apache Spark

Workshop date & duration: June 20, 2020, 9:30 – 14:00, 30 min break included
Trainers: Valentina Crisan, Maria Catana
Location: Online
Price: 150 RON (including VAT)
Number of places: 10
Languages: Scala & SQL

DESCRIPTION:

For a (mainly) in-memory processing platform like Spark, getting the best performance is most of the time about:

  1. Optimizing the amount of data needed in order to perform a certain action   
  2. Having a partitioning strategy that optimally distributes the data to the Spark cluster executors (this often correlates with how the underlying storage distributes the initial data, but it is also related to how data is partitioned during the join itself, given that before running the actual join operation the partitions are first sorted)
  3. And, in case your action is a join, choosing the right strategy for your joins

This workshop will mainly focus on two of the above-mentioned steps: partitioning and join strategy, making these aspects clearer through exercises and hands-on sessions.
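
As a taste of the hands-on part, here is a minimal sketch (dataset paths and column names are made up for the example) of how partitioning and join strategy show up in Spark code: repartitioning on the join key before a sort-merge join versus hinting a broadcast hash join for a small table.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.broadcast

object JoinStrategiesSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("join-strategies-sketch")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical inputs; replace with your own datasets.
    val orders    = spark.read.parquet("/data/orders")
    val customers = spark.read.parquet("/data/customers")

    // Sort-merge join (Spark's default for two large tables): both sides are
    // shuffled and sorted on the join key, so how the data is partitioned matters.
    val merged = orders
      .repartition(200, orders("customer_id"))
      .join(customers, "customer_id")

    // Broadcast hash join: ships the small side to every executor and
    // avoids shuffling the large side altogether.
    val broadcasted = orders.join(broadcast(customers), "customer_id")

    merged.explain()       // look for SortMergeJoin in the physical plan
    broadcasted.explain()  // look for BroadcastHashJoin in the physical plan

    spark.stop()
  }
}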

You can check out the agenda and register here.

ETL with Apache Spark

Workshop date & duration: March 28th, 2020, 9:30 – 14:00, 30 min break included
Trainers: Valentina Crisan, Maria Catana
Location:  eSolutions Academy, Budişteanu Office Building, strada General Constantin Budişteanu Nr. 28C, etaj 1, Sector 1, Bucureşti
Price: 150 RON (including VAT)
Number of places: 10 (no more places left)
Languages: Scala & SQL

DESCRIPTION:

One of the many uses of Apache Spark is to transform data from different formats and sources, both batch and streaming. In this mainly hands-on workshop we will focus on just that: understanding how to read, write, transform, join and manage the schema of different data formats, and how best to handle such data with Apache Spark. So, if you know a bit about Spark but have not had the chance to play much with its ETL capabilities, or even if you don't know much yet but would like to find out, this workshop might be of interest.
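
To illustrate the kind of exercises in the session, below is a minimal ETL sketch (the input path, schema and column names are hypothetical): reading CSV with an explicit schema, filtering, and writing the result as partitioned Parquet.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types._

object EtlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("etl-sketch")
      .master("local[*]")
      .getOrCreate()

    // Explicit schema instead of inference: cheaper and safer for repeated runs.
    val schema = StructType(Seq(
      StructField("user_id", LongType,      nullable = false),
      StructField("event",   StringType,    nullable = true),
      StructField("ts",      TimestampType, nullable = true)
    ))

    // Hypothetical input file.
    val events = spark.read
      .option("header", "true")
      .schema(schema)
      .csv("/data/events.csv")

    // Simple transformation + format conversion: filter, then write as Parquet
    // partitioned by event type.
    events
      .filter(events("event").isNotNull)
      .write
      .partitionBy("event")
      .mode("overwrite")
      .parquet("/data/events_parquet")

    spark.stop()
  }
}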

You can check out the agenda and register here.

Open course Big Data, September 25-28, 2019

Open Course: Big Data Architecture and Technology Concepts
Course duration: 3.5 days, September 25-28 (Wednesday-Friday 9:00 – 17:00, Saturday 9:30-13:00)
Trainers: Valentina Crisan, Felix Crisan
Location: Bucharest, TBD (location will be communicated to participants)
Price: 450 EUR; 10% early bird discount (405 EUR) if registration is confirmed by the 2nd of September
Number of places: 10
Prerequisites: knowledge of distributed systems and the Hadoop ecosystem (HDFS, MapReduce), plus basic SQL.

Description:

There are a few concepts and solutions that solution architects should be aware of when evaluating or building a big data solution: what data partitioning means, how to model your data in order to get the best performance from your distributed system, what the best format for your data is, and what the best storage or the best way to analyze your data is. Solutions like HDFS, Hive, Cassandra, HBase, Spark, Kafka and YARN should be known – not necessarily because you will work specifically with them, but mainly because knowing their concepts will help you understand other similar solutions in the big data space. This course is designed to make sure the participants understand the usage and applicability of big data technologies like HDFS, Spark, Cassandra, HBase and Kafka, and which aspects to consider when starting to build a Big Data architecture.

Please see details for the course and registration here: https://bigdata.ro/open-course-big-data-september-25-28-2019/

Spark Structured Streaming vs Kafka Streams

Date: TBD
Trainers: Felix Crisan, Valentina Crisan, Maria Catana
Location: TBD
Number of places: 20
Price: 150 RON (including VAT)

Stream processing can be solved at the application level or at the cluster level (with a stream processing framework), and two of the existing solutions in these areas are Kafka Streams and Spark Structured Streaming: the former takes a microservices approach by exposing an API, while the latter extends the well-known Spark processing capabilities to structured streaming.

This workshop aims to discuss the major differences between the Kafka and Spark approaches to stream processing: starting from the architecture, the functionalities and the limitations of both solutions, then moving on to the possible use cases for each and some of the implementation details.
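
To give a flavour of the Spark side of the comparison, here is a minimal Structured Streaming sketch (broker address and topic name are hypothetical, and it assumes the spark-sql-kafka connector is on the classpath) that counts events per key from a Kafka topic; a Kafka Streams application would express the same logic as a topology running inside your own service.

import org.apache.spark.sql.SparkSession

object StructuredStreamingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("structured-streaming-sketch")
      .master("local[*]")
      .getOrCreate()

    // Read a Kafka topic as an unbounded DataFrame.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092")
      .option("subscribe", "events")
      .load()

    // Kafka delivers key/value as binary; cast the key to a string and count per key.
    val counts = events
      .selectExpr("CAST(key AS STRING) AS key")
      .groupBy("key")
      .count()

    // Continuously print the running counts to the console.
    val query = counts.writeStream
      .outputMode("complete")
      .format("console")
      .start()

    query.awaitTermination()
  }
}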

You can check out the agenda and register here.

Introduction to Apache Solr

This workshop addresses anyone interested in search solutions; its aim is to be a light introduction to search engines and especially Apache Solr. Apache Solr is one of the two main open source search engines today, and it is also the base for the search functionality implemented in several big data platforms (e.g. Datastax, Cloudera). Thus, understanding Solr will help you not only in working with the Apache version but also give you a starting point for the platforms that use Solr as the base for their search functionality.
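
As a small preview, a Solr core is typically queried over HTTP through its select endpoint; the sketch below (assuming a local Solr instance and a hypothetical "products" core) issues such a query from Scala using the JDK's built-in HTTP client (Java 11+).

import java.net.URI
import java.net.URLEncoder
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object SolrQueryExample {
  def main(args: Array[String]): Unit = {
    // Hypothetical core created beforehand, e.g. with `bin/solr create -c products`.
    val query = "name:laptop"
    val url = s"http://localhost:8983/solr/products/select?q=${URLEncoder.encode(query, "UTF-8")}&rows=5&wt=json"

    val client = HttpClient.newHttpClient()
    val request = HttpRequest.newBuilder(URI.create(url)).GET().build()
    val response = client.send(request, HttpResponse.BodyHandlers.ofString())

    // The JSON response holds the matching documents under "response.docs".
    println(response.body())
  }
}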

Date: 30 June, 2018, 9:30-13:30
Trainers: Radu Gheorghe
Location: eSolutions Academy, Budişteanu Office Building, strada General Constantin Budişteanu Nr. 28C, etaj 1, Sector 1, Bucureşti.
Number of places: 15 (10 places left)
Price: 150 RON (including VAT)

You can check out the agenda and register for this session here.

Big Data Architecture intro workshop

This workshop is addressed to anyone interested in Big Data and the overall architectural components required to build a data solution. We will use Apache Zeppelin for some data exploration, but otherwise the workshop will be mostly theoretical – allowing enough time to understand the possible components and their roles in a Big Data architecture. We will not go in depth into the individual components/solutions; the aim is to understand the overall role each possible component can play in architecting a big data solution.

The scope of this workshop is to familiarize the participants with the Big Data architecture components; its prerequisite is a general understanding of IT architectures.

Date: February 24th, 2018, 9:00 – 13:00
Trainers: Felix Crisan, Valentina Crisan
Location: eSolutions Academy, Budişteanu Office Building, strada General Constantin Budişteanu Nr. 28C, etaj 1, Sector 1, Bucureşti.
Number of places: 15 (no more places left)
Price: 150 RON (including VAT)

Check out the agenda and register for this session here.

Modeling your data for analytics with Apache Cassandra and Spark SQL

This session is intended for those looking to better understand how to model data for queries in Apache Cassandra and in Apache Cassandra + Spark SQL. The session will help you understand the concepts of secondary indexes and materialized views in Cassandra, and the way Spark SQL can be used in conjunction with Cassandra in order to run complex analytical queries. We assume you are familiar with Cassandra & Spark SQL (but this is not mandatory, since we will explain the basic concepts behind data modeling in Cassandra and Spark SQL). The whole workshop will be run in Cassandra Query Language and SQL, and we will use Zeppelin as the interface towards Cassandra + Spark SQL.
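
To illustrate the idea, here is a minimal sketch (keyspace, table and column names are made up, and it assumes the spark-cassandra-connector is available) of exposing a Cassandra table to Spark SQL so that aggregations that plain CQL cannot express become possible.

import org.apache.spark.sql.SparkSession

object CassandraSparkSqlSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("cassandra-spark-sql-sketch")
      .master("local[*]")
      .config("spark.cassandra.connection.host", "127.0.0.1")
      .getOrCreate()

    // Hypothetical keyspace/table; in plain CQL this table is only efficiently
    // queryable by its partition key.
    val users = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "shop", "table" -> "users"))
      .load()

    users.createOrReplaceTempView("users")

    // Spark SQL lifts that restriction: arbitrary filters and aggregations become
    // possible, at the cost of scanning the data on the Spark side.
    spark.sql(
      """SELECT country, count(*) AS user_count
        |FROM users
        |GROUP BY country
        |ORDER BY user_count DESC""".stripMargin
    ).show()

    spark.stop()
  }
}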

Date: 19 August, 9:00 – 13:30
Trainers: Felix Crisan, Valentina Crisan
Location: eSolutions Academy, Budişteanu Office Building, strada General Constantin Budişteanu Nr. 28C, etaj 1, Sector 1, Bucureşti.
Number of places: 15 (8 places left)
Price: 150 RON (including VAT)

Check out the agenda and register for future session here.

Modeling your data for analytics with Apache Cassandra and Spark SQL

This session is intended for those looking to better understand how to model data for queries in Apache Cassandra and in Apache Cassandra + Spark SQL. The session will help you understand the concepts of secondary indexes and materialized views in Cassandra, and the way Spark SQL can be used in conjunction with Cassandra in order to run complex analytical queries. We assume you are familiar with Cassandra & Spark SQL (but this is not mandatory, since we will explain the basic concepts behind data modeling in Cassandra and Spark SQL). The whole workshop will be run in Cassandra Query Language and SQL, and we will use Zeppelin as the interface towards Cassandra + Spark SQL.

Date: 10 June, 9:00 – 13:30 – this workshop will be rescheduled
Trainers: Felix Crisan, Valentina Crisan
Location: eSolutions Academy, Budişteanu Office Building, strada General Constantin Budişteanu Nr. 28C, etaj 1, Sector 1, Bucureşti.
Number of places: 15
Price: 150 RON (including VAT)

Check out the agenda and register for future session here.

Analytics with Cassandra and Spark SQL Workshop

We continue the Spark SQL and Cassandra series with more hands-on exercises on the integration between the two solutions, working on the open MovieLens data. This workshop addresses those who know the basics of Cassandra & CQL and have SQL knowledge. Spark is not mandatory, although it would be good to know its basic concepts (RDD, transformations, actions), since we will not cover them in the workshop but will mention them on several occasions. Without the basic Spark concepts you will still understand the aggregations that can be done at the Spark SQL level, but you will not fully understand how Spark SQL integrates into the whole Spark system.
In this workshop you will understand the optimal way of running queries in a solution composed of Apache Cassandra and Apache Spark.
Prerequisites: Cassandra Concepts knowledge, SQL knowledge
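
As a preview of the hands-on part, the sketch below (using the spark-cassandra-connector and a hypothetical keyspace/table layout for the MovieLens ratings) shows the kind of Spark SQL aggregation we will run on top of Cassandra data.

import org.apache.spark.sql.SparkSession

object MovieLensAggregationSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("movielens-aggregation-sketch")
      .master("local[*]")
      .config("spark.cassandra.connection.host", "127.0.0.1")
      .getOrCreate()

    // Hypothetical MovieLens ratings table loaded into Cassandra beforehand.
    val ratings = spark.read
      .format("org.apache.spark.sql.cassandra")
      .options(Map("keyspace" -> "movielens", "table" -> "ratings"))
      .load()

    ratings.createOrReplaceTempView("ratings")

    // Average rating and number of votes per movie - an aggregation CQL alone cannot express.
    spark.sql(
      """SELECT movie_id, avg(rating) AS avg_rating, count(*) AS votes
        |FROM ratings
        |GROUP BY movie_id
        |HAVING count(*) > 100
        |ORDER BY avg_rating DESC
        |LIMIT 20""".stripMargin
    ).show()

    spark.stop()
  }
}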

Trainers: Felix Crisan, Valentina Crisan
When: 22 April
Time: 9:30-14:00
Location: eSolutions Academy, Budişteanu Office Building, strada General Constantin Budişteanu Nr. 28C, etaj 1, Sector 1, Bucureşti.
Number of places: 15 (5 places left)
Price: 125 RON (including VAT)

Check out the agenda and register here.