Kafka Fundamentals
An intro training on Apache Kafka, the open-source distributed event streaming platform. We’ll look at the architectural features of Kafka that enable high-performance data delivery.
Duration: 24 hours
Location: Online
Language: English
Code: EAS-026
Schedule and prices
€ 410
Training for 7-8 or more people? We can customize the training for your specific needs.

Description

This training will help you gain a solid understanding of the architecture and inner workings of Apache Kafka, an open-source distributed event streaming platform. We will implement Java-based and REST-based clients for Kafka cluster access, and discuss cluster and client configuration for balancing the tradeoffs between latency, throughput, durability, and availability. We will also consider multi-cluster setups, which are vital for fault tolerance and scalability.
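As a taste of the configuration tradeoffs mentioned above, here is a sketch of producer settings that trade latency against throughput and durability (the values are illustrative examples, not recommendations):

```properties
# Durability: wait until all in-sync replicas acknowledge the write
acks=all
# Latency vs. throughput: wait up to 5 ms to batch records together
linger.ms=5
# Throughput: allow larger batches per partition (bytes)
batch.size=32768
# Throughput/network: compress batches at a small CPU cost
compression.type=lz4
# Avoid duplicates introduced by producer retries
enable.idempotence=true
```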

Kafka Connect lets us solve common tasks such as moving data between Kafka and external systems (DBMSs, file systems, etc.). Kafka Streams is the recommended way to build fast and resilient stream-processing solutions.
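For example, a minimal Kafka Connect source connector that streams lines from a file into a topic can be configured as follows and submitted to the Connect REST API (the connector name, file path, and topic are illustrative):

```json
{
  "name": "file-source-demo",
  "config": {
    "connector.class": "org.apache.kafka.connect.file.FileStreamSourceConnector",
    "file": "/tmp/input.txt",
    "topic": "demo-topic"
  }
}
```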
After completing the course, participants receive a certificate issued by Luxoft Training.

Objectives

  • Understand Kafka architecture
  • Understand the deployment and configuration of Kafka
  • Use REST-based access to Kafka
  • Create Kafka Java API clients
  • Design multi-cluster architectures
  • Use Kafka Connect tools
  • Create Kafka Streams programs

Target Audience

  • Software Developers
  • Software Architects
  • Data Engineers

Prerequisites

  • Development experience in Java (over 6 months)

Roadmap

Module 1: Kafka Architecture
In pairs, plan your own distributed queue: writing, reading, and storing data in parallel. Discuss:
1. What's the format and average size of messages?
2. Can messages be repeatedly consumed?
3. Are messages consumed in the same order they were produced?
4. Does data need to be persisted?
5. What is the data retention period?
6. How many producers and consumers are we going to support?
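The design questions above can be made concrete with a toy, in-memory sketch of a partitioned, append-only log. This is not real Kafka and every name here is illustrative; it only shows why keyed writes preserve per-key order and why messages can be consumed repeatedly:

```python
class PartitionedLog:
    """Toy append-only log split into partitions (a sketch, not Kafka)."""

    def __init__(self, num_partitions=3):
        self.partitions = [[] for _ in range(num_partitions)]

    def produce(self, key, value):
        # Same key -> same partition, so per-key ordering is preserved.
        p = hash(key) % len(self.partitions)
        self.partitions[p].append(value)
        return p, len(self.partitions[p]) - 1   # (partition, offset)

    def consume(self, partition, offset=0):
        # Reads do not remove records: any consumer can re-read
        # from any offset, answering "can messages be repeatedly consumed?"
        return self.partitions[partition][offset:]
```

Because records are never removed on read, any consumer can re-read from any offset; retention in real Kafka is a time- or size-based cleanup policy layered on top of this model.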

Module 2: kafka-topics, console-consumer, console-producer
1. Use the built-in kafka-topics, console-consumer, and console-producer tools
2. Create a topic with 3 partitions and replication factor (RF) 2
3. Send a message and check the ISR
4. Write and read messages while preserving message order
5. Write and read messages without order preservation, using hash partitioning
6. Write and read messages while avoiding data skew
7. Read messages from the beginning, from the end, and from a given offset
8. Read a topic with 2 partitions using 2 consumers in one consumer group (and in different consumer groups)
9. Choose the optimal number of consumers for reading a topic with 4 partitions
10. Write messages with minimal latency
11. Write messages with maximum compression
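The exercises above map onto the standard Kafka CLI tools roughly as follows (assuming a broker reachable at localhost:9092, a cluster of at least two brokers for RF 2, and the Kafka scripts on the PATH):

```shell
# Create a topic with 3 partitions and replication factor 2:
kafka-topics.sh --bootstrap-server localhost:9092 --create \
  --topic demo --partitions 3 --replication-factor 2

# Describe it to inspect the leader and ISR per partition:
kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic demo

# Produce keyed messages (same key -> same partition -> per-key ordering):
kafka-console-producer.sh --bootstrap-server localhost:9092 --topic demo \
  --property parse.key=true --property key.separator=:

# Consume from the beginning as part of a consumer group:
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic demo \
  --from-beginning --group demo-group
```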


Module 3: Web UI; Java, Scala, and Python APIs; other languages (via REST)
1. Build a simple consumer and producer
2. Add one more consumer to the consumer group
3. Write a consumer that reads 3 records from the 1st partition
4. Add writing to another topic
5. Add a transaction
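For the REST-based access mentioned above, clients typically talk to the Confluent REST Proxy. Below is a minimal sketch that builds a v2 JSON produce request; the proxy address is an assumption for your setup, and the actual HTTP call is left commented out since it requires a running proxy:

```python
import json

REST_PROXY = "http://localhost:8082"  # assumed proxy address

def build_produce_request(topic, records):
    """Build (url, headers, body) for producing JSON records over REST."""
    url = f"{REST_PROXY}/topics/{topic}"
    headers = {"Content-Type": "application/vnd.kafka.json.v2+json"}
    body = json.dumps({"records": [{"value": r} for r in records]})
    return url, headers, body

# To actually send it (requires a running REST Proxy):
# import requests
# url, headers, body = build_produce_request("demo", [{"event": "signup"}])
# requests.post(url, headers=headers, data=body)
```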

Module 4: Avro + Schema Registry
1. Add an Avro schema
2. Compile the Java class
3. Build an Avro consumer and producer with a specific record
4. Add the Schema Registry
5. Add an error topic with the Schema Registry
6. Build an Avro consumer and producer with a generic record
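To see what the Schema Registry enforces for you, here is a deliberately simplified, hand-rolled check of a record against an Avro-style schema. Real code would use the Avro serializers; the schema, the type mapping, and all names here are illustrative:

```python
# An Avro-style record schema, written as a plain dict for the sketch.
USER_SCHEMA = {
    "type": "record",
    "name": "User",
    "fields": [
        {"name": "id", "type": "int"},
        {"name": "name", "type": "string"},
    ],
}

# Simplified mapping of Avro primitive types to Python types.
_AVRO_TO_PY = {"int": int, "long": int, "string": str,
               "boolean": bool, "float": float, "double": float}

def conforms(record, schema):
    """True if the dict has exactly the schema's fields with matching types."""
    fields = {f["name"]: _AVRO_TO_PY[f["type"]] for f in schema["fields"]}
    return (record.keys() == fields.keys()
            and all(isinstance(record[n], t) for n, t in fields.items()))
```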

Module 5: Spring Boot + Spring Cloud
Homework:
1. Write a template for a Spring application
2. Add a KafkaTemplate with a producer
3. Add a KafkaTemplate with a consumer
4. Add a REST controller
5. Modify the Spring Boot application to work in async (parallel) mode
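The Spring Boot homework typically starts from configuration like the following application.yml fragment, which the auto-configured KafkaTemplate and listener beans pick up (broker address, group id, and values are illustrative):

```yaml
spring:
  kafka:
    bootstrap-servers: localhost:9092
    consumer:
      group-id: demo-group
      auto-offset-reset: earliest
    producer:
      acks: all
```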

Module 6: Streaming Pipelines (Kafka Streams + KSQL + Kafka Connect vs. Akka Streams vs. Spark Streaming vs. Flink)
Homework:
1. Choose a way to read data from a Kafka topic with 50 partitions
2. Try out the checkpoint mechanism
3. Start five executors and kill some of them
4. Check the backpressure
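The backpressure check in step 4 can be illustrated without any cluster: a bounded buffer between a fast producer and a slow consumer is the core mechanism streaming engines build on. A toy stdlib sketch, not tied to any specific framework:

```python
import queue

# Small buffer so backpressure kicks in quickly.
buf = queue.Queue(maxsize=2)

def try_produce(item):
    """Non-blocking produce: returns False when the buffer pushes back."""
    try:
        buf.put_nowait(item)
        return True
    except queue.Full:
        # In a real engine this signal propagates upstream,
        # slowing the source instead of dropping data.
        return False
```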

Module 7: Kafka Monitoring
Homework:
1. Build dashboards for several Kafka metrics in Grafana
Your benefits
Expertise
Our trainers are industry experts involved in real software development projects
Live training
Facilitated online so that you can interact with the trainer and other participants
Practice
A focus on helping you practice your new skills
Still have questions?
Connect with us