Databricks Fundamentals
Databricks is an increasingly popular platform for big data processing and analysis. Our Databricks Fundamentals course is a great way to start if you want to improve your skills in this area.
Duration: 28 hours
Location: Online
Language: English
Code: EAS-028
Schedule and prices: € 700 *
Training for 7-8 or more people? Customize trainings for your specific needs

Description

Databricks is an increasingly popular platform for big data processing and analysis. Our Databricks Fundamentals course is a great way to start if you want to improve your skills in this area. You will acquire practical experience with important Databricks tools and ideas over the course of several modules, including writing queries in Scala, Python, and SQL, using Delta Lake / Parquet, and working with Notebooks.

One of the primary goals of the course is to make you more comfortable with Databricks Notebooks, the web-based interface for data analysis and collaboration in Databricks. With guidance from our trainer, you’ll learn how to efficiently build, manage, and share notebooks so that you can tackle complex data challenges.

Another important topic we will cover is Spark, the open-source engine that powers Databricks' data processing capabilities. You will gain a deep understanding of Spark’s internal structures, including the RDD (Resilient Distributed Dataset), which according to databricks.com “is an immutable distributed collection of elements of your data, partitioned across nodes in your cluster, that can be operated in parallel with high level API that offers transformations and actions.”
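To make the quoted definition more concrete, here is a minimal pure-Python sketch of the idea: immutable partitions, lazy transformations, and actions that trigger the computation. The class ToyRDD and its methods are hypothetical teaching aids written for this description, not the Spark API, and nothing here is actually distributed across a cluster.

```python
from functools import reduce


class ToyRDD:
    """Toy stand-in for an RDD: immutable partitions plus a lazy chain of transformations.

    Hypothetical illustration only; this is not the Spark API.
    """

    def __init__(self, partitions, ops=()):
        # Tuples keep the partitions immutable after construction.
        self._partitions = tuple(tuple(p) for p in partitions)
        self._ops = tuple(ops)

    def map(self, fn):
        # Transformation: returns a new ToyRDD; nothing is computed yet.
        return ToyRDD(self._partitions, self._ops + (("map", fn),))

    def filter(self, pred):
        # Transformation: also lazy, also returns a new ToyRDD.
        return ToyRDD(self._partitions, self._ops + (("filter", pred),))

    def _compute(self, partition):
        # Apply the recorded transformations to a single partition.
        items = iter(partition)
        for kind, fn in self._ops:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)

    def collect(self):
        # Action: runs the recorded transformations partition by partition.
        return [x for p in self._partitions for x in self._compute(p)]

    def reduce(self, fn):
        # Action: combines all computed elements into a single value.
        return reduce(fn, self.collect())


numbers = ToyRDD([[1, 2, 3], [4, 5, 6]])  # two "partitions"
evens_squared = numbers.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
result = evens_squared.collect()
total = evens_squared.reduce(lambda a, b: a + b)
```

Note that map and filter only record what should happen; the work is done when collect or reduce is called, and the original numbers object is never modified.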

To help you make the right decisions on a project and avoid architectural errors, you’ll discover the differences between Delta Lake and Parquet, two storage formats used by Databricks. Understanding the particularities of these formats will help you select the better fit for your project. We will also cover one of the key topics for any big data environment: query writing. You'll learn how to write queries in Scala, Python, and SQL, giving you the flexibility to work with different languages and tools as needed.
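One way to picture the difference: a plain Parquet dataset is a set of data files, while Delta Lake keeps its data in Parquet files and adds a transaction log that records which files belong to each table version. The sketch below models that idea in plain Python. ToyDeltaTable and all of its methods are hypothetical names invented for this illustration, not the Delta Lake API, and the delete logic is deliberately simplified.

```python
class ToyDeltaTable:
    """Toy model: immutable data files plus an append-only log of commits.

    Hypothetical illustration only; this is not the Delta Lake API.
    """

    def __init__(self):
        self._files = {}  # file name -> rows (stands in for Parquet files)
        self._log = []    # one entry per commit: files added and removed

    def _commit(self, add_rows=None, remove=()):
        added = []
        if add_rows is not None:
            name = f"part-{len(self._files):05d}"
            self._files[name] = tuple(add_rows)  # data files are never edited
            added.append(name)
        self._log.append({"add": added, "remove": list(remove)})
        return len(self._log) - 1  # version number of this commit

    def _live_files(self, version):
        # Replay the log up to `version` to find the files that make up the table.
        live = []
        for entry in self._log[: version + 1]:
            live = [f for f in live if f not in entry["remove"]]
            live.extend(entry["add"])
        return live

    def append(self, rows):
        return self._commit(add_rows=rows)

    def delete_where(self, pred):
        # Simplification: rewrite every live file into one new file without the
        # matching rows, and mark the old files as removed in the log.
        live = self._live_files(len(self._log) - 1)
        kept = [r for f in live for r in self._files[f] if not pred(r)]
        return self._commit(add_rows=kept, remove=live)

    def read(self, version=None):
        # Reading an older version is the "time travel" idea from the roadmap.
        if version is None:
            version = len(self._log) - 1
        return [r for f in self._live_files(version) for r in self._files[f]]


table = ToyDeltaTable()
v0 = table.append([{"id": 1}, {"id": 2}])
v1 = table.append([{"id": 3}])
v2 = table.delete_where(lambda r: r["id"] == 2)
latest = table.read()
before_delete = table.read(version=v1)
```

Because old files are only marked as removed in the log rather than changed in place, earlier versions of the table stay readable until the files are physically cleaned up.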

You will also learn how to optimize your Databricks workflows for performance and how to use the visualization tools to gain insights that support better project decisions.

Overall, the Databricks Fundamentals course is a detailed, practical introduction to this big data platform. With guidance from our trainer, who is an experienced Data Engineer, you’ll develop the abilities and confidence to handle complex data tasks.

After completing the course, a certificate is issued on the Luxoft Training form.

Objectives

  • Practice working with Notebooks
  • Understand Spark internal structures
  • Understand the differences between Delta Lake and Parquet
  • Write queries in Scala, Python, and SQL
  • Learn about optimization in Databricks
  • Explore data in depth with Databricks

Target Audience

  • Developers
  • Architects

Roadmap

1. Introduction to Databricks
  • Creating Databricks Service
  • Databricks UI Overview
  • Databricks Architecture Overview
  • Databricks Notebooks

2. Databricks Cluster and Jobs
  • Cluster types and configuration
  • Databricks cluster pool
  • Databricks Job
  • Notebook workflows

3. DBFS

4. Databricks and Spark
  • Data Formats
  • Transformation
  • Joins, Aggregation
  • SQL

5. Delta Lake
  • Pitfalls of Data Lakes
  • Data Lakehouse Architecture
  • Read & Write to Delta Lake
  • Updates and Deletes on Delta Lake
  • Merge/Upsert to Delta Lake
  • History, Time Travel, Vacuum
  • Delta Lake Transaction Log
  • Convert from Parquet to Delta
  • Data Ingestion
  • Data Transformation - PySpark and Notebooks

6. Visualizations in Databricks

7. Collaboration in Databricks

8. Deploying Databricks on Azure

9. Deploying Databricks on the AWS Marketplace

10. Data Protection Use Cases