Code: EAS-011
Duration: 16 hours
Description
In application design, one of the most important decisions concerns data storage. For decades, relational databases were effectively the only option; projects differed mainly in their degree of normalization, where business logic was placed, and similar choices. In the last ten to fifteen years, many alternative systems have appeared, from object-oriented and document-oriented DBMSs to distributed file systems and data stream processing systems. This training reviews a range of modern solutions for storing data reliably over the long term, analyzes solutions of different classes, and covers their advantages and best practices for using them.
Roadmap
- The evolution of approaches to data storage: databases, data warehouses, database machines, massively parallel architectures, hyperconvergence
- Relational model: which problems it solves and at what cost; replication, sharding, distributed transactions
- Minimal "key-value" model: key structure options, value structure options, programming interfaces. Efficiency of non-relational databases: necessary and sufficient conditions (Cassandra, HBase)
- Document-oriented model (MongoDB)
- Distributed file systems: cluster architecture (HDFS)
- SQL over distributed file systems: possible architectures, limitations, transactions (Hive, Spark, Spark SQL, Parquet, ORC)
- Distributed in-memory data storage systems (Hazelcast, Ignite, Tarantool)
- Distributed OLAP systems (ClickHouse, Druid)
- Data stream processing (Spark Streaming)
- Embedded and stand-alone databases
Objectives
- Understand which data and query characteristics must be considered during requirements analysis and when selecting data management systems
- Know the possibilities and limitations of modern relational and non-relational data management systems
Target Audience
- Software architects
- Application developers
- Business analysts
- Database administrators