Past Events

Introduction to Apache Solr

This workshop is for anyone interested in search solutions; it aims to be a light introduction to search engines, and especially to Apache Solr. Apache Solr is one of the two main open source search engines available today, and it is also the base for the search functionality implemented in several big data platforms (e.g. Datastax, Cloudera). Understanding Solr will therefore help you not only when working with the Apache version but will also give you a starting point on the several platforms that use Solr as the base for their search functionality.

Date: 30 June, 2018, 9:30-13:30
Trainer: Radu Gheorghe

Check out the agenda here.

Big Data Architecture intro

This workshop is addressed to anyone interested in Big Data and the overall architectural components required to build a data solution. We will use Apache Zeppelin for some data exploration, but otherwise the workshop will be mostly theoretical, allowing enough time to understand the possible components and their roles in a Big Data architecture. We will not go into depth on individual components or solutions; the aim is to understand the overall role each possible component plays in architecting a big data solution.

The goal of this workshop is to familiarize participants with the components of a Big Data architecture; a general understanding of IT architectures is a prerequisite.

Date: February 24th, 2018, 9:00 – 13:00
Trainers: Felix Crisan, Valentina Crisan

Check out the agenda here.

SQL on Hadoop hands on session, part 1: Hive and Impala intro

They say SQL is the English language of the Big Data world, since almost everybody understands its syntax. The aim of this workshop is to explain what kinds of SQL queries can be run on HDFS (the storage component of the Hadoop environment), both batch and interactive, and, out of the several solutions available, to run hands-on exercises on Apache Hive and Apache Impala and discuss the general performance that can be obtained. We will also discuss the different file formats that can be used to get the best performance out of Hive and Impala, as well as the types of operations and analytics that can be performed on HDFS data.

Date: November 11, 2017, 9:30 – 13:30
Trainers: Felix Crisan, Valentina Crisan

Check out the agenda here.

Modeling your data for analytics with Apache Cassandra and Spark SQL

This session is intended for those looking to better understand how to model data for queries in Apache Cassandra, alone and combined with Spark SQL. The session will help you understand the concepts of secondary indexes and materialized views in Cassandra, and how Spark SQL can be used together with Cassandra to run complex analytical queries. We assume you are familiar with Cassandra and Spark SQL, but this is not mandatory, since we will explain the basic concepts behind data modeling in Cassandra and Spark SQL. The whole workshop will be run in Cassandra Query Language and SQL, and we will use Zeppelin as the interface to Cassandra + Spark SQL.

Date: 19 August 2017, 9:00 – 13:30
Trainers: Felix Crisan, Valentina Crisan

Check out the agenda here.

Data analytics with BigQuery

BigQuery is generally seen as a “fast and fully-managed enterprise data warehouse for large-scale data analytics”. The workshop is designed to go through all the concepts of BigQuery and to give you a smooth start in using it. After this workshop you will be able to start a real project with BigQuery.
Prerequisites: Some SQL knowledge, a Google Account (an email address in gmail.com or in GSuite [formerly called Google Apps]).
When: 13 May 2017, 9:30 – 14:00
Trainer: Gabriel Preda 

Check out the agenda here.

Analytics with Cassandra and Spark SQL Workshop 2

We continue the Spark SQL and Cassandra series with more hands-on exercises on the integration between the two solutions, working on the open Movielens data. This workshop is for those who know the basics of Cassandra and CQL and have SQL knowledge. Knowing Spark is not mandatory, although it would be good to know its basic concepts (RDDs, transformations, actions), since we will not cover them in the workshop but will mention them on several occasions. Without the basic Spark concepts you will still understand the aggregations that can be done at the Spark SQL level, but you will not fully understand how Spark SQL fits into the whole Spark system.
In this workshop you will learn the optimal way of running queries on a solution composed of Apache Cassandra and Apache Spark.
Prerequisites: Cassandra Concepts knowledge, SQL knowledge

Trainers: Felix Crisan, Valentina Crisan
When: 22 April 2017

Check out the agenda here.

Analytics with Cassandra and Spark SQL Workshop 1

If you have learned about Apache Cassandra, you will have realized by now that Cassandra is a storage and pre-aggregation layer, so a computational layer is needed to complete the queries we would like to run on our data. In this workshop we will look at the analytics that can be done on top of Cassandra with Spark SQL. We will start with similar examples in CQL and Spark SQL and then move on to examples that can only be run with Spark SQL.

Trainers: Felix Crisan, Valentina Crisan
When: 18 March 2017
Time: 9:30-14:30

Check out the agenda here.

SQL & noSQL: Intro in Cassandra

In this 4-hour session we will learn about Cassandra concepts, its data model, and what analytics can be done with it. We will discuss several of the noSQL solutions out there and how Cassandra differs from them, and we will work with real data: importing data into Cassandra, learning CQL, and learning data modeling rules. We will use Docker for local installations of Cassandra.

Trainers: Felix Crisan, Valentina Crisan
When: 30 July 2016
Time: 9:30-13:30

Check out the agenda here.

Using Elasticsearch for Logs

This workshop gives an overview of what Elasticsearch can do and how you would use it for searching and analyzing logs and other time-series data (metrics, social media, etc.).

Trainer: Radu Gheorghe
When: 25 June 2016

Check out the agenda here.

Spark and Machine Learning workshops

1. Getting started with Spark: intro and hands on session

Spark is the new trend in big data technologies, offering an easy API and multiple environments to work with, such as batch, SQL, graph, machine learning and streaming processing. The workshop will start with an introduction to Spark and continue with many Spark examples, including the well-known Wordcount example.

Trainer: Tudor Lapusan
When: 12 March 2017

Check out the agenda here.

2. ML & Spark: MLlib intro and exercises, limited to 15 places

Description: Theoretical understanding of various ML algorithms: RandomForest, Clustering.

What you will learn: How to solve ML problems using Spark and MLlib (the ML library on top of Spark).

Trainer: Alex Sisu
When: 12 March 2016

Check out the agenda here.

Hadoop MapReduce and Spark training

To mark the completion of the Big Data Romanian Tour in Bucharest, we have planned a light intro and hands-on session on MapReduce and Spark. Together with the trainers, Tudor Lapusan and Andrei Avramescu, we are planning a dynamic and interactive session that goes from theory to practice.

More details here.