Open Course: Big Data Architecture and Technology Concepts
Course duration: 3.5 days, September 26-29 (Thursday-Saturday 9:00-17:00, Sunday 9:30-13:00)
Trainers: Valentina Crisan, Felix Crisan
Location: Bucharest, ImpactHub Universitate, Tudor Arghezi 8-10, 030167
Price: 450 EUR (VAT included); 10% early-bird discount (405 EUR) if registration is confirmed by 2 September
Number of places: 10
Prerequisites: knowledge of distributed systems and the Hadoop ecosystem (HDFS, MapReduce), plus some basic SQL.
Description:
Solutions architects should be familiar with a few core concepts when evaluating or building a big data solution: what data partitioning means, how to model data to get the best performance from a distributed system, which data format to use, and which storage and analysis options best fit the data. Solutions such as HDFS, Hive, Cassandra, HBase, Spark, Kafka and YARN are worth knowing, not necessarily because you will work with them directly, but because understanding their concepts will help you understand similar solutions in the big data space. This course is designed to ensure that participants understand the usage and applicability of big data technologies such as HDFS, Spark, Cassandra, HBase and Kafka, and which aspects to consider when starting to build a big data architecture.
Although this course includes a lot of theory, it aims to balance the practical and theoretical sides. While learning about Cassandra, Spark and Kafka from a theoretical perspective, we will also explore these solutions through hands-on exercises: we will work with the Cassandra Query Language (CQL), and mainly with SQL in the Spark SQL and KSQL hands-on sessions. For Spark we will also use a bit of Scala, but the focus will be on understanding the behavior of the system, not on how to program it.
So, if you are a solutions architect, a product manager, or simply someone who would like to understand how big data architecture could fit into your career path or your solutions, you should join us for this course. The trainers are experienced in both teaching courses and real-life projects and will help you understand the components of big data solutions, the underlying concepts, and how these components can be used in real-life scenarios.
Main topics for the course (the underlined solutions will be detailed):
- Big Data Architecture overview: components and role in an architecture – 0.25 days
- Specific technologies overview and details:
  - Storage: NoSQL databases (random access on data) – 1.5 days
    - Overview of different NoSQL solutions
    - Cassandra basics, data modeling and hands-on
    - Cassandra vs HBase: comparison
    - How to choose a NoSQL solution: the considerations
    - Recap of storage options: long-term storage of immutable data (HDFS) and random writes/reads of data (HBase/Cassandra/..)
  - Distributed data processing frameworks – 0.75 days
    - Overview of different solutions: Storm, Flink, Spark, …
    - Distributed computations and stream processing with Spark
    - Spark as an ETL tool: examples and demo
    - Examples of using Spark and Spark Streaming, plus a demo session
    - Data analysis with Spark SQL
    - SQL-on-everything options: Hive, Impala, Spark SQL, Apache Drill
  - Distributed messaging bus: Apache Kafka – 0.75 days
    - Kafka as a messaging bus
    - Kafka for stream processing
  - Resource management – 0.25 days
    - YARN, Mesos, …
If you are interested in participating, please complete the form below. Please note that the course needs a minimum of 6 participants; otherwise it will be rescheduled for a later date.