Duration: 24 hours
Description
This introductory course is just the beginning of a long and exciting journey into machine learning. Instead of the classical academic approach (“mathematics – ML theory – examples – practice”), this training is primarily practice-oriented. Theory and mathematical background (how the algorithms work and why they work) are also very important, but they are better studied at later stages, when questions of improvement and optimization arise.
The first example starts from a real-life task: there is a set of tables with initial data (in Excel), but it’s unclear what can be done with them or how the “magic” of Big Data applies to them. By using various machine learning techniques, you’ll learn how to find insights in this data and present the outcomes in forms that business customers understand (graphs, new simpler tables, etc.).
After that we’ll review the basic classes of tasks where machine learning is effective and show examples of solving such tasks. Simply applying ready-made formulas is not enough, so we’ll also pay attention to interpreting the results and, to a large extent, presenting them to the end users of the data.
Part of the training will be devoted to discussing practical tasks that trainees face in their own work. We’ll try to formalize them and even predict the difficulties likely to arise in implementing them.
- Which tasks are best solved by machine learning, and which are actually being solved with it in practice. What happens if, instead of a Data Scientist, you hire a non-specialist (just a developer, analyst, or manager) and expect them to learn everything on the job.
Preparing, cleaning, and exploring data
- How to gain insight into initial business data (and find any order in it at all). Processing sequence. What can and should be done by domain analysts, and what is better left to a Data Scientist. Priorities in solving a specific task.
Classifiers and Regressors
- Practice: well-formalized tasks with prepared data. Differences between task types (binary, multiclass, and probabilistic classification; regression), and recasting a task from one type into another. Examples of classifying practical tasks.
- Where and how to apply clustering: exploring data, checking the problem statement, and validating results. Which cases can be reduced to clustering.
What is good
- How to evaluate results the way customers would. Explaining unfamiliar metrics and reducing them to familiar ones. Typical meaningless questions and how to answer them. Cross-validation and how not to do it. Striking examples of overfitting and how it creeps into even a slightly careless architecture.
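As a small taste of the overfitting examples mentioned in this module, here is a minimal sketch in plain Python. It is not course material: the data set, the 1-nearest-neighbour model, and all function names are hypothetical and chosen only for illustration. The labels are coin flips, so there is nothing to learn. Even so, a model scored on its own training rows looks perfect, while k-fold cross-validation scores it on held-out rows only.

```python
import random

def nearest_label(train, point):
    """Label of the training row closest to `point` (1-nearest-neighbour)."""
    closest = min(
        train,
        key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], point)),
    )
    return closest[1]

def accuracy(train, test):
    """Share of test rows whose label the 1-NN model predicts correctly."""
    hits = sum(nearest_label(train, features) == label for features, label in test)
    return hits / len(test)

def k_fold_accuracy(rows, k=5):
    """Average accuracy over k folds, each scored by a model that never saw it."""
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i, held_out in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        scores.append(accuracy(train, held_out))
    return sum(scores) / k

# Hypothetical data: 200 rows of 5 random features with coin-flip labels,
# so there is no real pattern for any model to find.
rng = random.Random(0)
rows = [([rng.random() for _ in range(5)], rng.randint(0, 1)) for _ in range(200)]

train_acc = accuracy(rows, rows)   # scored on the very rows it memorised
cv_acc = k_fold_accuracy(rows)     # scored on held-out rows only

print(f"accuracy on training data: {train_acc:.2f}")
print(f"5-fold cross-validated accuracy: {cv_acc:.2f}")
```

The training-data score is perfect because each row is its own nearest neighbour. The cross-validated score should land near chance level, which is the honest answer for random labels.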
How to improve a model
- What makes one model better than another: parameters, features, and ensembles. A brief look at parameters. Features in detail, with hands-on feature building and competitions. How not to overdo it with features. A look into the abyss of tools for finding the best parameters, features, and methods.
Graphs, reports, dealing with real-life tasks
- How to explain what’s happening in plain language: to yourself, your team, and your clients. Better answers to meaningless questions. How to present three terabytes of results in one slide. Semi-automated tests, and which process control points are really necessary. From real-life tasks to a complete R&D process (“R&D in practice”): reviewing and analyzing tasks from the audience.
- Understand which tasks can be solved with the help of machine learning (and find out that Big Data is just one part of it, not a mandatory requirement)
- Learn to apply basic machine learning methods and, with fast prototyping tools, to answer the question “Can you estimate the actual income from a possible implementation?”
- Identify which data should be collected and what can be expected of it in the near future, and why “we want to store petabytes” is not always just a whim
- Prepare for more advanced topics, in particular end-to-end solutions to real, complex business problems
- See how exactly machine learning fits with classical analytics. In particular, see that it is unnecessary (or even harmful) to dismiss all existing analysts in order to adopt it
- Project Managers who deal with data
- Technical Leads / Senior Developers in any data-related projects
- Business Analysts
- Data Engineers
- Architects, System Designers
- Ability to read simple Python code and to write code in any scripting language.