Description
This training focuses on the key concepts and methods for developing data processing applications with Apache Spark. We’ll look at Spark's framework for automatically distributed code execution, built on RDDs (Resilient Distributed Datasets), as well as its companion projects covering different paradigms: Spark SQL, Spark Streaming, MLlib, Spark ML, and GraphX.
Roadmap
- Spark concepts and architecture
- Programming with RDDs: transformations and actions
- Using key/value pairs
- Loading and storing data
- Accumulators and broadcast variables
- Spark SQL, DataFrames, Datasets
- Spark Streaming
- Machine learning with MLlib and Spark ML
- Graph analysis with GraphX