Data Driven Discovery in the Life Sciences: Hypothesis Generation from Omics Data


About this course

Across the life sciences, scientists use omics data to study biological phenomena in humans, plants, animals and microbes. This results in large and heterogeneous data sets that can be analyzed using a variety of algorithms and statistical methods. Making sense of the data, extracting biological knowledge from the results of these analyses, and formulating new hypotheses and research questions based on them is not trivial. However, when basic data science skills are combined with domain knowledge, either from literature or from databases accessed in a high-throughput manner, omics data constitute a goldmine for data-driven discovery of novel insights and hypotheses that can be tested in follow-up experiments.
This course will train students in linking domain knowledge to data using data science techniques and skills, in order to design omics experiments, evaluate the quality of the resulting data, interpret them in the light of literature and domain databases, and mine them to make discoveries and compose new research questions and hypotheses. Domain-specific case studies will allow students to directly apply their skills to data relevant to their specialization.

Learning outcomes

After successful completion of this course, students are expected to be able to:

  • describe the advantages and limitations of different types of omics data;
  • access, in a high-throughput manner, databases commonly used in the life sciences for the interpretation of omics data;
  • design effective omics experiments, with appropriate replicates and controls, accounting for batch effects, etc.;
  • evaluate the quality and limitations of omics data (quality of the raw data, technical/biological variation, etc.) based on the outcomes of statistical analyses;
  • interpret (processed) omics analysis results using domain knowledge, data mining (commonly used biological databases) and literature mining;
  • extract knowledge from the data, synthesize it into a (possible) biological story, and compose new data-driven research questions and hypotheses based on it.

Required prior knowledge

Assumed Knowledge:
Statistics and data analysis as applied to omics data, as treated in BIF-51306 Data Analysis and Visualization or SSB-30306 Molecular Systems Biology and in MAT-32806 Statistics for Data Science. Basic coding skills in R and/or Python are considered very useful. The course is considered most useful for life science students who have recently acquired basic data science skills (e.g., through the data science track) and want to learn to apply these, or for bioinformatics students with a limited background in the life sciences who want to improve their data interpretation skills.

Link to more information

If anything remains unclear, please check the FAQ of Wageningen University.


  • Start date
    12 May 2025
  • Ends
    4 July 2025
  • Term
    Period 6
  • Location
  • Instruction language
For guest registration, this course is handled by Wageningen University.