## About this course

Big data are becoming available in nearly all scientific fields, so the ability to analyse and interpret such data is becoming an essential skill in both science and industry. In this course you will become familiar with state-of-the-art statistical methods for analysing such data.

The course covers the most important categories of statistical models and the associated methods for statistical analysis. The application of these methods will be illustrated by hands-on examples, several of which relate to the analysis of phenotypic and genetic/genomic data in animals and plants. The course uses plenary lectures, computer tutorials focused on applying the statistical methods, and two cases. In each of the two cases, you will analyse an actual data set and write a short report on the analysis. In the tutorials, you will learn how to use the R software for statistical analysis.

The course consists of six one-week modules. In the first four modules, you will become familiar with the most important statistical models and methods: linear models, linear mixed models, generalized linear models, maximum likelihood and Bayesian statistics.

In the last two modules, you will become familiar with the application of these models to the analysis of genetic/genomic and phenotypic (big) data in animals and plants. This includes the estimation of genetic parameters such as heritability, QTL mapping, genomic prediction and genomic selection, and genome-wide association studies.

Note: This course cannot be combined in an individual programme with PBR-34803 Experimental Design and Data Analysis of Breeding Trials and/or PBR-32803 Markers in Genetics and Plant Breeding.

## Learning outcomes

After successful completion of this course, students are expected to be able to:

- explain the differences between a linear model (LM), linear mixed model (LMM) and generalized linear model (GLM) in terms of model assumptions;
- choose an appropriate statistical model (i.e., LM, LMM or GLM) when given a data set and research question;
- execute an analysis in R with a standard LM (ANOVA or regression), LMM (split-plot) or GLM (logistic regression or log-linear model) for a given data set, and interpret the results of such an analysis;
- explain the principle of maximum likelihood estimation;
- describe the differences between Bayesian and classical statistics;
- explain how the heritability of a trait can be estimated in pedigreed or genotyped populations in animals or plants;
- explain how Quantitative Trait Loci (QTL) can be detected in plants, using an outcross population or a cross between inbred lines;
- design an experiment for estimating heritabilities, for QTL mapping and for genome-wide association studies (GWAS);
- explain how genomic prediction can be done, and propose statistical models for that purpose.

## Required prior knowledge

Assumed Knowledge:

It is assumed that students who take this course have a basic understanding of statistics and some understanding of genetics. It is recommended to take the courses MAT-15303 + MAT-15403 and MAT-20306 Advanced Statistics before taking part in the present course.

## Link to more information

- Credits: **ECTS 6**
- Level: **bachelor**
- Contact coordinator