Apache Spark Fundamentals

We’ll look at the RDD-based framework for automated distributed code execution, and at companion projects in different paradigms: Spark SQL, Spark Streaming, MLlib, Spark ML, GraphX.
Duration: 26 hours
Location: Online
Language: English
Code: EAS-017
Schedule and prices
650.00 *
Training for 7-8 or more people? Customize trainings for your specific needs

Description

This training course delivers key concepts and methods for developing data processing applications with Apache Spark. We’ll look at the Spark framework for automated distributed code execution and at companion projects in the MapReduce paradigm. We’ll work with the Spark Data API, built-in connectors, and batch and streaming pipelines.
After completing the course, a certificate is issued on the Luxoft Training form.

Objectives

During the training, participants will:
  1. Write a Spark pipeline using functional Python and RDDs;
  2. Write a Spark pipeline using Python, the Spark DSL, Spark SQL and DataFrames;
  3. Design architectures with different data sources;
  4. Write a Spark pipeline that integrates with external systems (Kafka, Cassandra, Postgres) and runs in parallel;
  5. Resolve problems with slow joins.
After the training, participants will be able to build a simple PySpark application and execute it on a cluster in parallel mode.
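As a taste of the first objective, the RDD programming model mirrors Python’s own functional tools. Below is a local, pure-Python sketch of a word-count pipeline (no cluster or pyspark required); on a real cluster the same chain would be `rdd.flatMap` / `rdd.map` / `rdd.reduceByKey`.

```python
# Local, pure-Python sketch of the kind of pipeline the course builds
# with Spark RDDs: the same flatMap / map / reduceByKey chain, expressed
# without a cluster. The input lines are invented sample data.
from itertools import chain

lines = ["spark makes big data simple", "big data big results"]

# flatMap: split each line into words
words = list(chain.from_iterable(line.split() for line in lines))

# map: pair each word with a count of 1
pairs = [(w, 1) for w in words]

# reduceByKey: sum the counts per word
counts = {}
for w, n in pairs:
    counts[w] = counts.get(w, 0) + n

print(counts["big"])  # 3
```

In PySpark the shape is identical, only distributed: each transformation is applied partition by partition across the cluster.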

Target Audience

  • Software developers
  • Software architects

Roadmap

  • Spark concepts and architecture

MapReduce and Spark in Hadoop, with examples.

Spark in the Lambda architecture.

Cluster management for distributed processing.

How to start Spark?

    Definitions.

  • Programming with RDDs: transformations and actions

What's the difference between SparkSession and SparkContext

    How to create and parallelize RDD

    How to transform RDD

    How to analyze and control RDD processing (plan and DAG)

How to save and persist RDDs to HDFS

    How to group and join RDD
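The grouping and joining topics above operate on pair RDDs, i.e. collections of (key, value) tuples. Here is a pure-Python sketch of the semantics (sample data invented for illustration); `rdd.groupByKey()` and `rdd.join()` behave the same way, except the pairs are first shuffled across the cluster by key.

```python
# Pure-Python sketch of pair-RDD operations: groupByKey and join over
# (key, value) tuples. Data is invented sample data.
from collections import defaultdict

orders = [("alice", 10), ("bob", 5), ("alice", 7)]
cities = [("alice", "Kyiv"), ("bob", "Lviv")]

# groupByKey: collect all values for each key
grouped = defaultdict(list)
for k, v in orders:
    grouped[k].append(v)

# join: for each key present in both datasets, emit (key, (left, right))
city_map = dict(cities)
joined = [(k, (v, city_map[k])) for k, v in orders if k in city_map]

print(sorted(grouped["alice"]))  # [7, 10]
```

In Spark, both operations trigger a shuffle, which is why the course treats them together with plan and DAG analysis.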

  • Programming with DataFrame

    What's the difference between RDD and DataFrame

    How to create and parallelize DataFrame

    How to analyze and control DataFrame (plan and DAG)

How to save to HDFS
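The plan-and-DAG topic above rests on lazy evaluation: DataFrame transformations only record a step in a plan, and nothing executes until an action runs. A minimal sketch of that model (the `LazyFrame` class and its method names are invented for illustration, not part of any Spark API):

```python
# Minimal illustration of the lazy-evaluation model behind Spark plans:
# transformations only record a step; an action (here, collect) executes
# the recorded plan. All names are invented for illustration.
class LazyFrame:
    def __init__(self, data, plan=()):
        self.data = data
        self.plan = plan              # recorded transformations, not yet run

    def filter(self, pred):
        return LazyFrame(self.data, self.plan + (("filter", pred),))

    def select(self, fn):
        return LazyFrame(self.data, self.plan + (("select", fn),))

    def collect(self):                # the action: execute the whole plan
        rows = self.data
        for op, f in self.plan:
            if op == "filter":
                rows = [r for r in rows if f(r)]
            else:
                rows = [f(r) for r in rows]
        return rows

df = LazyFrame([1, 2, 3, 4]).filter(lambda x: x % 2 == 0).select(lambda x: x * 10)
print(len(df.plan))   # 2 recorded steps, nothing executed yet
print(df.collect())   # [20, 40]
```

This is why `df.explain()` in Spark can show an optimized plan before any data moves: the plan exists as data long before the action runs.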

  • Loading data from/in external storages

    How to read/write data from file storages (HDFS, S3, FTP, FS)

    What data format to choose

How to parallelize reading from/writing to JDBC

How to create a DataFrame from MPP databases (Cassandra, Vertica, Greenplum)

    How to work with Kafka
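On parallelizing JDBC reads: Spark splits a numeric partition column into strides using the `partitionColumn`, `lowerBound`, `upperBound` and `numPartitions` options, and issues one WHERE-clause range query per partition. The sketch below reproduces that range arithmetic in pure Python; it is a simplification of Spark's actual behavior, for illustration only.

```python
# Sketch of how a parallel JDBC read is partitioned: each of
# num_partitions tasks gets one stride of the partition column as a
# WHERE clause. Simplified from Spark's real logic, for illustration.
def jdbc_ranges(lower, upper, num_partitions, column="id"):
    stride = (upper - lower) // num_partitions
    clauses = []
    for i in range(num_partitions):
        lo = lower + i * stride
        if i == 0:
            # first partition also picks up NULLs
            clauses.append(f"{column} < {lo + stride} OR {column} IS NULL")
        elif i == num_partitions - 1:
            # last partition is open-ended on the upper side
            clauses.append(f"{column} >= {lo}")
        else:
            clauses.append(f"{column} >= {lo} AND {column} < {lo + stride}")
    return clauses

for clause in jdbc_ranges(0, 100, 4):
    print(clause)
```

The practical point the course makes: without these options, a JDBC read runs as a single task on one executor, no matter how large the cluster is.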

  • Write logic with Spark DSL

    How to count rows

    How to process math aggregations

    How to group rows

How to correctly join DataFrames
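A common part of joining DataFrames correctly is recognizing when one side is small enough to broadcast. A broadcast (map-side) hash join builds a hash map from the small side once and probes it for each row of the big side, so the big table is never shuffled. A pure-Python sketch of the same semantics, with invented sample data:

```python
# The idea behind a broadcast hash join: build a hash map from the small
# side, probe it row by row from the big side - no shuffle of the big
# table. Pure Python; same result as an inner join on the key.
big = [(1, "order-a"), (2, "order-b"), (3, "order-c"), (1, "order-d")]
small = [(1, "alice"), (2, "bob")]       # the side Spark would broadcast

lookup = dict(small)                     # "broadcast": copied to every task
joined = [(k, v, lookup[k]) for k, v in big if k in lookup]

print(joined)
```

In Spark the same effect comes from `broadcast(small_df)` or from the optimizer's automatic size threshold; the key insight is that only the small side is replicated.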

  • Write logic with Spark SQL

    How and why to start using Spark SQL

    How to work with EXTERNAL table

    How to work with MANAGED table

  • Using Window and UDF functions

What window functions are and how to use them in Spark

    When not to use window functions

What UDFs and UDAFs are and how to use them

    How to optimize using UDF in PySpark
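The defining property of a window function, as opposed to a groupBy, is that it computes an aggregate over a partition ordered by some column while keeping every input row. A pure-Python sketch of a running sum per partition (sample data invented), equivalent in spirit to `sum(...).over(Window.partitionBy(...).orderBy(...))` in Spark:

```python
# What a window function does, in miniature: a running sum within each
# partition, in order, that preserves every input row (a groupBy would
# collapse them). Rows are invented (partition_key, order_key, value).
from itertools import groupby

rows = [("a", 1, 10), ("a", 2, 20), ("b", 1, 5), ("a", 3, 30)]

rows.sort(key=lambda r: (r[0], r[1]))     # partition key, then order key
out = []
for key, group in groupby(rows, key=lambda r: r[0]):
    running = 0
    for part, order, value in group:
        running += value
        out.append((part, order, value, running))

print(out)
```

The sort-then-scan structure also hints at the cost: Spark must shuffle and sort each partition, which is why the course covers when *not* to use window functions.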

  • Spark Types

Boolean: how to add filters

Numerical: how to calculate sums, products and statistics

String: how to use regexes

Complex: how to work with structs and arrays

How to work with dates

  • Spark optimization cases

    Out of memory

    Small files in HDFS

Skewed data

    Slow joins

    Broadcast big tables

    Sharing resources

AQE (Adaptive Query Execution) & DPP (Dynamic Partition Pruning)
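One standard fix for the skewed-data and slow-join cases above is salting: append a small random suffix to the hot key on the big side and replicate the small side once per suffix, so the work for that key spreads over N tasks instead of one. A pure-Python sketch of the mechanics (the salt count of 4 and the sample data are illustrative choices):

```python
# The salting trick for skewed joins, sketched locally: a hot key on the
# big side is split across N salted keys, and the small side is
# replicated once per salt, so every big row still finds its match while
# the hot key's work spreads over N tasks. Data and salt count invented.
import random

N_SALTS = 4
big = [("hot", i) for i in range(12)] + [("cold", 99)]
small = [("hot", "H"), ("cold", "C")]

random.seed(0)                                       # deterministic demo
salted_big = [((k, random.randrange(N_SALTS)), v) for k, v in big]
salted_small = [((k, s), v) for k, v in small for s in range(N_SALTS)]

lookup = dict(salted_small)
joined = [(key[0], v, lookup[key]) for key, v in salted_big if key in lookup]

print(len(joined))   # all 13 big rows still join: 13
```

In Spark the same idea is expressed with `concat`/`rand` on the join keys plus an `explode` of the small side; since Spark 3, AQE's skew-join handling can apply an equivalent split automatically.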

Register for the next course
Registering in advance ensures you have priority. We will notify you when we schedule the next course on this topic.
Your benefits
Expertise
Our trainers are industry experts involved in software development projects
Live training
Facilitated online so that you can interact with the trainer and other participants
Practice
A focus on helping you practice your new skills
Still have questions?
Contact us