Hadoop Fundamentals | Software Development

Hadoop Fundamentals
This training focuses on the key concepts and methods for data processing applications development using Apache Hadoop.
Duration: 24 hours
Location: Online
Language: English
Code: EAS-015
Schedule and prices
€ 600
Training a group of 7-8 or more people? The training can be customized to your specific needs.

Description

This training provides a foundation in Apache Hadoop concepts and in methods for developing data-processing applications with it. Participants will get acquainted with HDFS, the de facto standard for reliable long-term big data storage; YARN, the framework that manages the parallelized execution of applications on a cluster; and the Hadoop ecosystem projects Hive, Spark, and HBase.
After completing the course, participants receive a certificate issued on Luxoft Training letterhead.

Objectives

  • Understand the key concepts and architecture of Hadoop
  • Get an idea of the ecosystem that has developed around Hadoop and its key components
  • Know how to read and write data to and from HDFS
  • Comprehend the MapReduce programming paradigm
  • Be able to access tabular data using Hive
  • Learn to access tabular data using Spark SQL/DataFrame in batch mode
  • Process data streams using Spark Structured Streaming
  • Learn to use HBase for low-latency data storage and retrieval

Target Audience

  • Software developers
  • Software architects
  • Database designers
  • Database administrators

Prerequisites

  • Basic Java programming skills
  • Unix/Linux shell familiarity
  • Experience with databases is optional

Roadmap

1. Basic concepts of modern data architecture: the Lambda architecture

2. External storage: Apache Kafka, Amazon S3, and tools for working with them

3. HDFS: Hadoop Distributed File System
- Architecture, replication, data in/out, HDFS commands
Practice (shell, Hue): connecting to a cluster, working with the file system

4. The MapReduce paradigm, execution engines, and its implementation in frameworks
Practice: launching applications

5. YARN: Distributed application execution management
- YARN architecture, application launch in YARN
Practice: launching applications and monitoring the cluster through the UI

6. Introduction to Hive
- Architecture, table metadata, file formats, the HiveQL query language
Practice (Hue, hive, beeline, Tez UI): creating tables, reading and writing CSV, Parquet, and ORC data, partitioning, SQL queries with aggregation and joins

7. Introduction to Spark
- DataFrame/SQL, metadata, file formats, data sources, RDD
Practice (Zeppelin, Spark UI): reading from and writing to a database (JDBC), CSV, and Parquet, partitioning, SQL queries with aggregation and joins, query execution plans, monitoring

8. Introduction to streaming data processing
- Spark Streaming, Spark Structured Streaming, Flink
Practice: reading, processing, and writing streams between Kafka, a relational database, and the file system
Register for the next course
Registering in advance gives you priority. We’ll notify you when we schedule the next course on this topic.
Courses you may be interested in
Data Warehouse Fundamentals
Understand current approaches to designing data warehouses and using them in heterogeneous enterprise information systems.
Online:
13.03.2023 - 17.03.2023
Modern Data Management Approaches in Real World Cases
This training provides an overview of modern methods for data storage, including key-value stores, document-oriented and database management systems, distributed data storage and processing systems.
BigData SQL Hive
This training is aimed at developers and covers the full stack of technical features, architecture, and performance tuning. Apache Hive supports the analysis of large datasets stored in Hadoop's HDFS and compatible file systems. It provides an SQL-like language with schema-on-read and transparently converts queries into MapReduce jobs.
Your benefits
Expertise
Our trainers are industry experts involved in software development projects
Live training
Facilitated online so that you can interact with the trainer and other participants
Practice
A focus on helping you practice your new skills