SQL on Hadoop hands-on session, part 1: Hive and Impala intro

Date: November 11, 2017, 9:30 – 13:30
Trainers: Felix Crisan, Valentina Crisan
Location: eSolutions Academy, Budişteanu Office Building, 28C General Constantin Budişteanu Street, 1st floor, Sector 1, Bucharest.
Number of places: 15
Price: 150 RON (including VAT)


They say SQL is the English language of the Big Data world, since almost everybody knows its syntax. The aim of this workshop is to explain what kinds of SQL queries, both batch and interactive, can be run on data stored in HDFS (the storage component of the Hadoop ecosystem). Out of the several solutions available, we will focus on Apache Hive and Apache Impala: we will run hands-on exercises with both and discuss the general performance that can be obtained. We will also discuss the file formats that help get the best performance out of Hive and Impala, as well as the types of operations and analytics that can be performed on HDFS data.

This workshop is addressed to anyone with some SQL knowledge who would like to understand the use cases of Hadoop HDFS and the options available for querying its data.

  1. Hadoop ecosystem overview, with a focus on HDFS
  2. File formats supported
  3. SQL on Hadoop options: overview of the available solutions
  4. Short overview of Hive and Impala (architecture, use cases, main differences)
  5. Hands on session (we will use Cloudera Hadoop and Hue for running queries in Hive and Impala)
    1. Load input data into HDFS
    2. Define table metadata in Hive
    3. Use Hive as an ETL tool to convert files from CSV to Avro and Parquet
    4. Run analytical queries in Hive with different file formats
    5. Run queries in Impala with different file formats
