About this course
Machine learning is a branch of artificial intelligence that enables computers to learn patterns from data. Instead of following fixed rules, machine-learning algorithms are trained on data. Machine learning is typically used to make predictions, recognize patterns, or automate decisions in complex tasks such as the prediction of protein structures, monitoring of crop health, and land-use classification, based on sequencing data, photos, and satellite imagery. With the increasing availability of data, machine learning plays a dominant role in many scientific and applied areas.
Machine-learning algorithms are commonly divided into supervised and unsupervised approaches. Supervised learning uses labeled data, where the correct output is known, and includes classification (predicting discrete categories, such as disease vs. healthy) and regression (predicting continuous values, such as crop yield). Unsupervised learning works with unlabeled data and focuses on discovering patterns or structure within the data. Typical methods include clustering (grouping similar samples together) and dimensionality reduction (simplifying complex datasets by reducing the number of variables while preserving important information).
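As an illustration of the distinction above, the following minimal sketch contrasts a supervised classifier with an unsupervised clustering step on the same toy data. The measurements, the labels, and the function names (`fit_centroids`, `predict`, `two_means`) are made up for this example and are not part of the course material; nearest-centroid classification and two-cluster k-means were chosen only because they fit in a few lines of plain Python.

```python
from statistics import mean

# Hypothetical 1-D measurements with known labels (illustrative values only).
labeled = [(1.0, "healthy"), (1.2, "healthy"), (0.8, "healthy"),
           (3.0, "disease"), (3.3, "disease"), (2.9, "disease")]


def fit_centroids(samples):
    """Supervised: use the known labels to compute one mean per class."""
    by_label = {}
    for x, y in samples:
        by_label.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_label.items()}


def predict(centroids, x):
    """Classify x as the class whose mean is closest to x."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))


def two_means(xs, iterations=10):
    """Unsupervised: split unlabeled values into two groups (k-means, k=2)."""
    centers = [min(xs), max(xs)]  # simple deterministic starting centers
    groups = [[], []]
    for _ in range(iterations):
        groups = [[], []]
        for x in xs:
            # Assign each value to the nearest of the two centers.
            nearest = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            groups[nearest].append(x)
        # Move each center to the mean of its group (keep it if the group is empty).
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return groups


centroids = fit_centroids(labeled)            # learns from the labels
label_for_new_sample = predict(centroids, 1.1)

unlabeled = [x for x, _ in labeled]           # same values, labels discarded
clusters = two_means(unlabeled)               # finds groups without labels
```

The classifier can name its prediction ("healthy" or "disease") because it saw labels during training, whereas the clustering step only returns two anonymous groups whose meaning still has to be interpreted.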
The course consists of a generic part and a project. In the generic part, students learn the theory and application of several machine-learning algorithms through lectures, pen-and-paper exercises, and programming practicals. The course discusses the theory of different methods for regression, classification, clustering, and dimensionality reduction that can be applied to a wide variety of problems and application domains. Students will learn how to properly train and evaluate machine-learning models, which typical issues can arise, and how to deal with them. Furthermore, attention is paid to the ethical, legal, and social aspects of applying machine learning in practical use-cases. This generic part is evaluated with a written exam.
In the project, students create their own machine-learning solutions for specific use-cases. To ensure that the project is of interest, students can choose to join one of three projects in the area of: a) biosystems engineering, b) geo-information science, or c) bioinformatics. The project includes a competitive element to engage the students even more. This specific part is evaluated based on the project report and code.
Note: there is significant overlap with the course Statistics for Data Scientists (MAT32806). We recommend that students do not take both courses. Machine Learning is recommended if you intend to take the courses Deep Learning (AIN31306) and/or **Advanced Machin...
Learning outcomes
Explain machine-learning problems, algorithms, and their formulas [understand]
Qualitatively and quantitatively compare the characteristics, (dis)advantages, formulas, and performance of a number of key algorithms [understand]
Describe proper use of data and typical pitfalls in using machine learning [understand]
Apply machine-learning techniques in a) biosystems engineering (MBE), b) bioinformatics (MBF), c) geo-information sciences and remote sensing (MGI), or another field of study [apply]
Analyse a use-case to choose the right machine-learning method [analyse]
Critically evaluate the performance of machine-learning algorithms [evaluate]
Design and implement effective solutions based on chosen algorithms, to solve practical problems [create]
Assessment method
- Written test with open and closed questions (50%): this is a closed-book exam.
- Assignment other (50%): students work in groups of two. The project grade is based on the quality of the submitted report, project code, demonstration, and explanation. GenAI is allowed, but students will be tested on full understanding of their code. If the project is failed, a resubmission can be handed in during the resit periods.
Prior knowledge
- Mathematics (Mathematics 1 (MAT14803) and Mathematics 2 (MAT14903), or equivalent);
- Statistics (Data Analysis Biosystems Engineering (FTE26306), Advanced Statistics (MAT20306), or equivalent);
- Programming in Python (INF22306).
Resources
- James G., Witten D., Hastie T., Tibshirani R., and Taylor J.: An Introduction to Statistical Learning: with Applications in Python (ISBN: 978-3031387463, freely available online at https://www.statlearning.com)
Additional information
- Level: master
- Mode of instruction: on campus
