Data Analysis for Plant and Animal Breeding


About this course

Data analysis is central to both plant and animal breeding, and the size and complexity of phenotypic and genomic data sets continue to increase. The ability to analyze and interpret such large data sets is therefore an essential skill for breeders, both in science and in industry. In this course, you will become familiar with state-of-the-art methods and skills for the quantitative genetic analysis of breeding data from both animals and plants.

This is a hands-on course in which you develop the skills to analyze real-life data and handle real-life problems in genetic analysis. In addition to genetic analysis, this includes developing the skills to competently curate data sets in the R software environment. At the same time, you will develop an applied, practically relevant and intuitive understanding of the statistical methods. This includes being able to choose an appropriate analysis based on the research question and the data at hand, understanding the statistical model and its assumptions, interpreting the results, and recognizing common pitfalls. You will achieve this by working on illustrative real-life data sets linked to modern animal and plant breeding. The course covers the most important categories of statistical models and the associated methods for genetic and genomic data analysis and for model validation. During the course, you will gradually build up the required R skills.

We combine plenary lectures with computer tutorials that focus on applying the methods to real-life data, and you will also work on two case studies. In each case study, you will analyze an actual data set and write a short report on the analysis. In the tutorials, you will learn how to use the R software for data handling, editing, filtering and quantitative genetic analyses. You will also become familiar with more advanced methods for the genetic analysis of complex pedigree data and large genomic data sets, using dedicated software.

The course consists of six one-week modules. In the first week, you will become familiar with data handling, visualization and editing, as well as model building and model validation using linear models. In the following weeks, you will work with more advanced statistical models and tools, with a major focus on linear mixed models but also covering generalized linear models and maximum likelihood, and you will use these tools for the quantitative genetic analysis of breeding data. In the final weeks, you will study more advanced analyses of genetic, genomic and phenotypic (big) data in animals and plants. These include the estimation of genetic parameters such as heritability, QTL mapping, genomic prediction, and genome-wide association studies.

Note: This course cannot be combined in an individual program with PBR-34803 Experimental Design and Data Analysis of Breeding Trials and/or PBR-32803 Markers in Genetics and Plant Breeding.

Learning outcomes

After successful completion of this course students are expected to be able to:

  • Apply data handling skills necessary to competently curate data sets in the R software environment
  • Choose a model category, build a model for quantitative genetic analysis of a given data set and research question, and execute the analysis
  • Interpret and explain the results of your data analysis
  • Perform model validation by evaluating model assumptions and/or by cross-validation (for genomic prediction), using illustrative plots
  • Explain the differences between a linear model (LM), linear mixed model (LMM) and a generalized linear model (GLM) in terms of model assumptions and purpose of the analysis
  • Explain the principles of maximum likelihood and restricted maximum likelihood
  • Explain the difference between fixed and random effects
  • Explain how the heritability of a trait can be estimated in pedigreed or genotyped populations in animals or plants
  • Design an experiment for estimating heritabilities, for QTL mapping and for genome-wide association studies (GWAS)
  • Explain how genome-wide association studies or QTL detection can be used to identify genomic regions of interest in outbred populations or in line crosses
  • Explain how genomic prediction can be performed, and propose statistical models for that purpose

Required prior knowledge

Assumed Knowledge:
It is assumed that students who take this course have a basic understanding of statistics and some understanding of genetics. Students are recommended to take the courses MAT15303 + MAT15403 and MAT20306 Advanced Statistics before this course. Some experience with the R software is helpful, but not mandatory.

Link to more information

If anything remains unclear, please check the FAQ of Wageningen University.


  • Start date: 10 March 2025
  • Ends: 2 May 2025
  • Term: Period 5
  • Register between: 1 Jun, 00:00 - 9 Feb 2025

These offerings are valid for students of Utrecht University.