Modern Data Management Approaches in Real World Cases
Description
This course provides an overview of modern data architecture. We will study the real-world high-load architecture used at Nvidia, which combines relational databases, message queues, data storage, key-value stores, and MPP distributed data storage. We will also look at how Kafka, Cassandra, and MongoDB are used in modern solutions.
is issued on the Luxoft Training form
Objectives
Upon completion of the course, students will be able to:
-
Build a real-world architecture for collecting statistics from more than 20M gaming cards;
-
Understand the read/write paths, physical storage, data formats, data volumes, and pros and cons of storage models such as relational, document-oriented, message queue, key-value, MPP, in-memory, etc.;
-
Understand which data and request characteristics must be considered during requirements analysis and the selection of data management systems;
-
Know the capabilities and limitations of modern relational and non-relational data management systems;
-
Analyze requirements when selecting database management systems.
Target Audience
Roadmap
-
Real-world architecture for collecting statistics from more than 24M gaming cards. Estimates. [theory: 1 hour; practice: 1 hour]
-
The evolution of approaches to data storage: databases, data stores, database machines, massively parallel architectures, hyperconvergence [theory: 0.5 hour]
-
Relational model: which problems can be solved, and at what cost, with replication, sharding, and distributed transactions [theory: 0.5 hour]
-
Document-oriented model. [MongoDB] [theory: 2.5 hours; practice: 1.5 hours]
-
Message queues and streaming platforms. Data stream processing. [Spark Streaming] [theory: 2 hours; practice: 2 hours]
-
“Key-value” minimal model: key structure options, value structure options, programming interfaces. Efficiency of non-relational databases: necessary and sufficient conditions [Cassandra, HBase] [theory: 2 hours; practice: 2 hours]
-
Distributed file systems: cluster architecture [HDFS]. SQL over distributed file systems: possible architectures, limitations, transactions. [Hive, Spark, Spark SQL, Parquet, ORC] [theory: 2 hours; practice: 2 hours]
-
Distributed in-memory data storage systems. [Hazelcast, Ignite, Tarantool] [theory: 0.5 hour]
-
Distributed OLAP systems. [Druid] [theory: 0.5 hour]