SQL on Hadoop hands-on session, part 1: Hive and Impala intro

They say SQL is the English of the Big Data world, since almost everybody understands its syntax. The aim of this workshop is to explain what kinds of SQL queries, batch and interactive, can be run on data stored in HDFS (the storage component of the Hadoop environment). Out of the several solutions available, we will focus on Apache Hive and Apache Impala, run hands-on exercises on both and discuss the performance that can be obtained. We will also discuss the different file formats that can be used to get the best performance out of Hive and Impala, as well as the types of operations/analytics that can be performed on HDFS data.

Date: November 11, 2017, 9:30 – 13:30
Trainers: Felix Crisan, Valentina Crisan
Location: eSolutions Academy, Budişteanu Office Building, strada General Constantin Budişteanu Nr. 28C, etaj 1, Sector 1, Bucureşti.
Number of places: 15 (3 places left)
Price: 150 RON (including VAT)

You can check out the agenda and register for this session here 

Modeling your data for analytics with Apache Cassandra and Spark SQL

This session is intended for those looking to better understand how to model data for queries in Apache Cassandra and in Apache Cassandra + Spark SQL. The session will help you understand the concepts of secondary indexes and materialized views in Cassandra, and how Spark SQL can be used together with Cassandra to run complex analytical queries. We assume you are familiar with Cassandra & Spark SQL (but it’s not mandatory, since we will explain the basic concepts behind data modeling in Cassandra and Spark SQL). The whole workshop will be run in Cassandra Query Language and SQL, and we will use Zeppelin as the interface to Cassandra + Spark SQL.

Date: 19 August, 9:00 – 13:30
Trainers: Felix Crisan, Valentina Crisan
Location: eSolutions Academy, Budişteanu Office Building, strada General Constantin Budişteanu Nr. 28C, etaj 1, Sector 1, Bucureşti.
Number of places: 15 (8 left)
Price: 150 RON (including VAT)

Check out the agenda and register for a future session here.

Modeling your data for analytics with Apache Cassandra and Spark SQL

This session is intended for those looking to better understand how to model data for queries in Apache Cassandra and in Apache Cassandra + Spark SQL. The session will help you understand the concepts of secondary indexes and materialized views in Cassandra, and how Spark SQL can be used together with Cassandra to run complex analytical queries. We assume you are familiar with Cassandra & Spark SQL (but it’s not mandatory, since we will explain the basic concepts behind data modeling in Cassandra and Spark SQL). The whole workshop will be run in Cassandra Query Language and SQL, and we will use Zeppelin as the interface to Cassandra + Spark SQL.

Date: 10 June, 9:00 – 13:30 – this workshop will be rescheduled
Trainers: Felix Crisan, Valentina Crisan
Location: eSolutions Academy, Budişteanu Office Building, strada General Constantin Budişteanu Nr. 28C, etaj 1, Sector 1, Bucureşti.
Number of places:  15
Price: 150 RON (including VAT)

Check out the agenda and register for a future session here.

Let’s talk BigQuery

Not all big data projects need a complex architecture and an engineering team in order to start making sense of the data. So what should you do if you need to do some good old analysis and just want to get started right away? Assume, for example, that you’re part of a small company starting up a project and you need to analyze lots of data without spending additional time planning and building an architecture, hiring an architect or engineer, or managing infrastructure: you just need to see through your data and make sense of it. This is where Google’s BigQuery comes into play (of course there are many other potential uses, but let’s stick with this one for the moment). It is called (a bit pretentiously, maybe) an Enterprise Cloud Data Warehouse solution, which in my opinion scares off many potential users upfront, but in fact BigQuery is helping many to at least quick-start their path in the Big Data world.

As part of the preparation for our next workshop, Data Analytics with BigQuery, we interviewed Gabriel Preda, trainer for the workshop but most importantly an enthusiastic user of the solution for the last couple of years, to give us a glimpse of what we should expect from this solution.

Why BigQuery, why did it make sense to you?

Usually in a startup each person wears more than one hat. You put on the sysadmin’s hat… you’re the sysadmin. Later you might need to wear the hat which says “innovation”… and start collecting GBs of data daily and, of course, processing them in a timely fashion. Being short on people, it was clear that we needed a SaaS solution.

In which use cases should we use BigQuery (analytical, data migration, cloud requirements)?

BigQuery is designed for OLAP (Online Analytical Processing) or BI. You should not use BigQuery for OLTP. The best use cases for BigQuery are ad hoc and trial-and-error interactive queries on large datasets for quick analysis and troubleshooting.

Can you list the best fit scenarios for it?

I have used it successfully for in-house analytics solutions. But I think it’s one of the best candidates on the market for data fishing because of its ability to perform ad hoc queries on large amounts of data…

Is it more feasible to be used in projects where the data has been already natively stored in the cloud (e.g. Google Cloud Storage)?

Data transfer towards BigQuery is free. You might have some costs in transforming the data, as there are some requirements on the data BigQuery can ingest. If you already have data in CSV or Avro (and soon Parquet), you can import it directly.

Which are the BigQuery alternatives/competitors?

I don’t know what to say about this… as it is quite a unique beast of a product!

Can you control where your data is, in case you have some requirements regarding location of your data?

You can choose between US and EU, but that is where it ends. There is some awesome news, though… there is an experimental extension to the BigQuery client that offers client-side encryption (homomorphic encryption) for a subset of query types. That is, you can encrypt your data, upload the encrypted data to BigQuery, run queries, fetch the results and decrypt them locally. It’s magic!

How do you visualize the results of the analysis or the correlations of the data in BigQuery?

In the worst-case scenario, when you can’t use the existing integrations, you can retrieve the results and use any visualization tool you are accustomed to. There are now a lot of integrations available, such as Tableau, Qlik, Talend, Informatica, SnapLogic, newcomers like Chartio, or even free & open source BI tools like Metabase. There is also a Google solution (for now in beta) called Data Studio, which covers more than BigQuery. I’ll do my best to add details about Data Studio during the workshop.

Interview by: Valentina Crisan – bigdata.ro

Data Analytics with BigQuery

BigQuery is generally seen as a “fast and fully-managed enterprise data warehouse for large-scale data analytics”. The workshop is designed to go through all the concepts of BigQuery and to provide a seamless start into using it. After this workshop you will be able to start a real project with BigQuery.

Date: 13 May, 9:30 – 14:00
Trainer: Gabriel Preda 
Location: eSolutions Academy, Budişteanu Office Building, strada General Constantin Budişteanu Nr. 28C, etaj 1, Sector 1, Bucureşti.
Number of places: 15, no more places left
Price: 150 RON (including VAT)

Check out the agenda and register here

Analytics with Cassandra and Spark SQL Workshop

We continue the Spark SQL and Cassandra series with more hands-on exercises on the integration between the two solutions, working on the open MovieLens data. This workshop is aimed at those who know the basics of Cassandra & CQL and have SQL knowledge. Knowledge of Spark is not mandatory, although it would be good to know its basic concepts (RDDs, transformations, actions), since we will not cover them in the workshop but will mention them on several occasions. Without the basic Spark concepts you will still understand the aggregations that can be done at Spark SQL level, but you will not fully understand how Spark SQL fits into the whole Spark system.
In this workshop you will understand the optimal way of making queries in a solution composed of Apache Cassandra and Apache Spark.
Prerequisites: Cassandra Concepts knowledge, SQL knowledge

Trainers: Felix Crisan, Valentina Crisan
When: 22 April
Time: 9:30-14:00
Location: eSolutions Academy, Budişteanu Office Building, strada General Constantin Budişteanu Nr. 28C, etaj 1, Sector 1, Bucureşti.
Number of places: 15 (5 places left)
Price: 125 RON (including VAT)

Check out the agenda and register here.

Analytics with Cassandra and Spark SQL Workshop

If you have learned about Apache Cassandra, you will have realized by now that Cassandra is a storage and pre-aggregation layer, so a computational layer is needed in order to complete the queries we would like to run on our data. In this workshop we will look at the analytics that can be done on top of Cassandra with Spark SQL. We will start with similar examples in CQL and Spark SQL and move on to examples that can only be run with Spark SQL.
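
To give a feel for the kind of query that calls for a computational layer, here is a minimal sketch of a join plus aggregation, which CQL does not offer (it has no joins). It is not workshop material: Python's built-in sqlite3 module is used purely as a stand-in SQL engine so the example runs without a cluster, and the table names, column names and the function average_rating_per_genre are hypothetical.

```python
import sqlite3


def average_rating_per_genre(movies, ratings):
    """Join ratings with movies and average the rating per genre.

    sqlite3 is only a stand-in for a SQL engine such as Spark SQL;
    the schema below is hypothetical and chosen for illustration.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE movies (movie_id INTEGER PRIMARY KEY, genre TEXT)")
    conn.execute("CREATE TABLE ratings (user_id INTEGER, movie_id INTEGER, rating REAL)")
    conn.executemany("INSERT INTO movies VALUES (?, ?)", movies)
    conn.executemany("INSERT INTO ratings VALUES (?, ?, ?)", ratings)

    # A JOIN combined with GROUP BY: the kind of analytical query
    # that is delegated to the SQL layer rather than to the storage layer.
    rows = conn.execute(
        """
        SELECT m.genre, AVG(r.rating)
        FROM ratings r
        JOIN movies m ON m.movie_id = r.movie_id
        GROUP BY m.genre
        ORDER BY m.genre
        """
    ).fetchall()
    conn.close()
    return rows
```

Called with a list of (movie_id, genre) tuples and a list of (user_id, movie_id, rating) tuples, the function returns one (genre, average rating) pair per genre, sorted by genre.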

Trainers: Felix Crisan, Valentina Crisan
When: 18 March
Time: 9:30-14:30
Location: eSolutions Academy, Budişteanu Office Building, strada General Constantin Budişteanu Nr. 28C, etaj 1, Sector 1, Bucureşti.
Number of places: 15 (no more places left)
Price: 125 RON (including VAT)

There are no more places left for this session, but you can check out the agenda and register here and we’ll keep you informed if places become available.

SQL & noSQL: Intro in Cassandra

In this 4-hour session we will learn about Cassandra concepts, its data model and what analytics can be done with it. We will discuss several noSQL solutions out there and how Cassandra differs from them, and we will work on real data: importing data into Cassandra, learning CQL and data modeling rules. We will use Docker for local installations of Cassandra.

Trainers: Felix Crisan, Valentina Crisan
When: 30 July
Time: 9:30-13:30
Location: Impact Hub ( http://www.impacthub.ro/ )
Number of places: 15 2
Price: 100 RON without VAT

Check out the agenda and register here

Using Elasticsearch for Logs – hands on session by Radu Gheorghe

This workshop gives an overview of what Elasticsearch can do and how you would use it for searching and analyzing logs and other time-series data (metrics, social media, etc).

Trainer: Radu Gheorghe
When: 25 June
Location: Impact Hub ( http://www.impacthub.ro/ )
Time: 9:30 – 13:30
Price: 100 RON without VAT
Number of places: 15 (no more seats left)

There are no more seats left for this session. If you want to sign up for a future session, check out the next link: Sign up for Future Elasticsearch hands on session

 

Spark Intro and Machine Learning workshops

Spark and Machine Learning workshops day on March 12, at TechHub:

1. 9:00 – 12:30 – Getting started with Spark: intro and hands on session (20 places)

2. 13:00 – 16:30 ML & Spark: MLlib intro and exercises (15 places)

Registration should be made separately for each workshop.

1. Getting started with Spark: intro and hands on session, limited to 20 places

Spark is the new trend in big data technologies, offering us an easy API and multiple environments to work with, like Batch, SQL, Graph, Machine Learning and Streaming processing.

The workshop will start with an introduction to Spark and will continue with many Spark examples, including the well-known Wordcount example.

You have the option to choose the programming language you are most familiar with, so the examples will be written and explained in Java, Scala and Python (the Java version is still to be confirmed).

We will try all Spark examples in local mode, so all you need is your own laptop. The major benefit of this is that you can continue learning and try new examples even after the workshop.
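
For readers who have not seen the Wordcount example mentioned above, here is a minimal sketch of its logic in plain Python. It is not Spark code and not the workshop's material: it only mirrors the usual split-into-words, then count-per-word steps, and the function name word_count and the tokenization rule are assumptions made for illustration.

```python
import re
from collections import Counter


def word_count(lines):
    """Count how often each word occurs across an iterable of text lines."""
    # Step 1 (the "flatMap" step in Spark terms): turn every line into words.
    # Assumption: a word is a run of lowercase letters or apostrophes.
    words = (
        word
        for line in lines
        for word in re.findall(r"[a-z']+", line.lower())
    )
    # Step 2 (the "map" and "reduceByKey" steps): add up one count per word.
    return dict(Counter(words))
```

Given an iterable of strings, word_count returns a dictionary mapping each lowercased word to its number of occurrences. In Spark the same two steps run in parallel over partitions of the input instead of in a single Python process.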

Trainer: Tudor Lapusan

Agenda and Sign up for Intro to Spark

2. ML & Spark: MLlib intro and exercises, limited to 15 places

Description: Theoretical understanding of various ML algorithms: RandomForest, Clustering.

What you will learn: How to solve ML problems using Spark and MLlib (the ML library on top of Spark).

Trainer: Alex Sisu

Agenda and Sign up for ML & Spark