Date: November 11, 2017, 9:30 – 13:30
Trainers: Felix Crisan, Valentina Crisan
Location: eSolutions Academy, Budişteanu Office Building, strada General Constantin Budişteanu Nr. 28C, etaj 1, Sector 1, Bucureşti.
Number of places: 15 (3 places left)
Price: 150 RON (including VAT)
SQL is often called the English of the Big Data world, since almost everybody knows its syntax. This workshop explains what kinds of SQL queries, both batch and interactive, can be run on data stored in HDFS (the storage component of the Hadoop environment). Out of the several solutions available, we will focus on Apache Hive and Apache Impala, run hands-on exercises with both, and discuss the performance that can be obtained. We will also discuss the file formats that can be used to get the best performance out of Hive and Impala, as well as the types of operations/analytics that can be performed on HDFS data.
This workshop will run entirely in a cloud environment (Cloudera’s Hadoop distribution on Bigstep cloud infrastructure).
The workshop is aimed at anyone with some SQL knowledge who would like to understand the use cases of Hadoop HDFS and the options available for querying its data.
- Hadoop environment overview: focus on HDFS
- File formats supported
- SQL on Hadoop options: overview of all solutions
- Hive and Impala short overview (architecture, use cases, main differences)
- Hands-on session (we will use Cloudera Hadoop and Hue for running queries in Hive and Impala)
- Input data in HDFS
- Define table metadata in Hive
- Use Hive as ETL for file transformation from CSV to Avro, Parquet
- Run analytical queries in Hive with different file formats
- Run queries in Impala with different file formats
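
To give a feel for the hands-on part, here is a minimal sketch of the kind of statements involved: defining a table over CSV files already in HDFS, converting it to Parquet and Avro with Hive, and running a simple aggregate. It is not the actual workshop material. The table names, columns, and HDFS path are hypothetical, and the Python helpers only assemble the HiveQL text, which could then be pasted into a query editor such as Hue.

```python
# Illustrative only: table names, columns, and the HDFS path are hypothetical
# and are not taken from the workshop material.

SUPPORTED_FORMATS = {"AVRO", "PARQUET"}


def create_csv_table(table: str, location: str) -> str:
    """Build HiveQL that defines table metadata over CSV files already in HDFS."""
    return (
        f"CREATE EXTERNAL TABLE {table} (id INT, city STRING, amount DOUBLE)\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        "STORED AS TEXTFILE\n"
        f"LOCATION '{location}'"
    )


def convert_table(source: str, target: str, file_format: str) -> str:
    """Build a HiveQL CREATE TABLE AS SELECT that rewrites a table in another format."""
    fmt = file_format.upper()
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {file_format}")
    return f"CREATE TABLE {target} STORED AS {fmt} AS SELECT * FROM {source}"


def city_totals(table: str) -> str:
    """Build a simple analytical query that can be run against any of the tables."""
    return f"SELECT city, SUM(amount) AS total FROM {table} GROUP BY city"


statements = [
    create_csv_table("trips_csv", "/user/workshop/trips"),
    convert_table("trips_csv", "trips_parquet", "parquet"),
    convert_table("trips_csv", "trips_avro", "avro"),
    city_totals("trips_parquet"),
]

if __name__ == "__main__":
    # Print the statements so they can be copied into a query editor.
    for statement in statements:
        print(statement + ";\n")
```

Running the same aggregate against the CSV, Avro, and Parquet versions of a table is one simple way to compare how the file format affects query time.
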
1. Complete the registration form:
2. Payment (150 RON, incl. VAT): please complete the form above first, then follow this link to complete the payment: http://mpy.ro/3suskaev