Computational Biology

SSB20306

About this course

The availability of large amounts of high-throughput omics data gives us new insights into, and a better understanding of, the molecular mechanisms of life. This course revolves around two commonly asked questions:

  • how can we transform this data into useful information?
  • what can we learn from this kind of information?

This course will introduce the basic concepts and tools essential for this transformation process. Background information on frequently used computational tools for DNA, RNA, and protein sequence analysis is combined with practical, hands-on exercises that demonstrate important basic bioinformatics concepts.

The course is divided into seven modules:

  1. Building blocks of life.
    This initial part presents an introduction to primary DNA and protein sequence analysis. It explains what kind of information we can and cannot extract from a primary DNA or protein sequence. Topics include gene architecture, reading frames, intervening sequences, translation of a nucleotide sequence to protein, and amino acid characteristics.

  2. Global analysis of DNA, mRNA, protein sequences: the “omics”.
    This second block presents an introduction to the technological and bioinformatics solutions for the analysis of genomics, transcriptomics and proteomics data. Topics include assembly and mapping of reads, differential expression analysis of transcripts, multiple testing correction, annotation of DNA and protein sequences using ontologies, high-throughput tandem mass spectrometry of peptides and proteins (LC-MS/MS), identification and quantification of proteins, use of a decoy database, and calculation of false discovery rates.

  3. FAIR data and databases.
    This third block will give you a chance to learn about the best ways of searching and storing public information and your own scientific data. Topics include the FAIR (Findable, Accessible, Interoperable, and Reusable) data principles, exploring the NCBI, EMBL and UniProt public sequence databases, searching PubMed, and gathering and summarizing information.

  4. Individual sequence alignment.
    This part concerns methods used to learn more about a specific sequence. It dives into homology and similarity by looking at pairwise sequence alignment and basic sequence database search methods. Topics include PAM and BLOSUM matrices, the BLAST algorithm for comparing primary biological sequence information, and matrix-derived raw scores, bit scores and E-values.

  5. Sequence-defined properties.
    This block presents bioinformatics solutions to the problem of predicting protein cellular localization and shows how multiple sequence alignments help to elucidate the possible function(s) of novel proteins. Topics include standard AI-based tools for the extraction of topological signals, identification of protein motifs and domains using patterns, profiles and hidden Markov models, and PSI-BLAST.

  6. Relationships among sequences.
    This part deals with how to define homology between sequences and how to determine their evolutionary relatedness. Topics include tools for the identification of orthologs and paralogs, and the construction and interpretation of phylogenetic trees.

  7. Proteins: from sequence to structure to function.
    This block concerns proteins and what can be inferred about their function from their primary, secondary and tertiary structure. Topics include technologies to measure protein structure, tools to define energetically allowed regions within a structure, prediction of 3D structures by homology modelling and AI methods, and the alignment of 3D structures.
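
To give a flavour of the Module 1 topics (reading frames and translation of a nucleotide sequence to protein), here is a minimal Python sketch. It is an illustration only, not course material: it assumes the standard genetic code, handles only the three forward reading frames, and the function names `translate` and `forward_frames` are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order (assumption:
# the standard code applies; "*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, frame=0):
    """Translate a DNA string starting at offset `frame` (0, 1 or 2).

    Incomplete trailing codons are ignored; codons containing characters
    other than T, C, A, G are translated to "X".
    """
    dna = dna.upper()
    codons = (dna[i:i + 3] for i in range(frame, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def forward_frames(dna):
    """Return the translations of the three forward reading frames."""
    return [translate(dna, frame) for frame in range(3)]


if __name__ == "__main__":
    for frame, protein in enumerate(forward_frames("ATGGCCTAA")):
        print(frame, protein)
```

The same nucleotide sequence yields a different peptide in each frame, which is why identifying the correct reading frame matters before a protein sequence can be interpreted.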

Learning outcomes

After successful completion of this course students are expected to be able to:

  • identify the amino acid characteristics relevant for protein structure and function;
  • explain the concepts behind widely used computational tools in bioinformatics (i.e. algorithms for DNA assembly, sequence alignment, translation into protein sequences, identification of protein motifs and topological signals, and protein structure prediction);
  • recognize advantages and shortcomings of commonly used databases that store text, nucleotide and protein sequences;
  • illustrate advantages and shortcomings of common computational tools for DNA and protein sequence analysis, and for topological signal and 3D protein structure prediction;
  • apply methods for DNA and protein sequence and structure analysis to (simple) real-life biological problems;
  • assess applications and limitations of “omics” derived information with respect to the biological questions involved.

Prior knowledge

Assumed knowledge:
Cell Biology I, Microbiology & Biochemistry, Gene Technology.

Additional information

  • Credits
    ECTS 6
  • Level
    bachelor
If anything remains unclear, please check the FAQ of Wageningen University.

Offering(s)

  • Start date

    10 March 2025

    • Ends
      2 May 2025
    • Term
      Period 5
    • Location
      Wageningen
    • Instruction language
      English
    Enrolment open
For guest registration, this course is handled by Wageningen University.