ADS: Fundamental techniques in data science with R

Course code: 201900026

About this course

Content
Regression techniques are widely used to quantify the relationship between two or more variables. In data science, linear and logistic regression are common and powerful techniques for evaluating such relations. These techniques are only useful, however, once you understand when and how to apply them. In this course, students will learn how to apply linear and logistic regression with the R statistical software package.

This course will introduce students to the principles of analytical data science, linear and logistic regression, and the basics of statistical learning. Students will develop fundamental R programming skills and will gain experience with the tidyverse: visualizing data with ggplot2 and performing basic data wrangling with dplyr. This course helps prepare students for an entry-level research career (e.g., junior researcher or research assistant) or further education in research (e.g., a (research) Master's program or a PhD).

Course Structure
In eight weeks, you will learn the basics of data handling and statistical programming in R, as well as the details of regression techniques in the context of statistical inference, prediction, and classification. Each week comprises three class activities:

  1. During the weekly lectures, we will cover the theoretical content.
  2. Weekly practical exercises connect the statistical theory to practice by applying the lecture content in the R statistical programming language.
  3. During the weekly workgroup meetings, you will work on real-world data analysis with a group of your peers.

This course has multiple in-person meetings per week, and your attendance is required. The course is not suitable for online study or self-study only.

Registration: Note that you need to register for this course in OSIRIS Student.
Students who do not meet the entrance requirements are advised to take the pre-course for the ADS minor, ADS: Basis van Onderzoeksmethoden en Statistiek (code 201900025, taught in Dutch). Students who do not meet the entrance requirements but believe they have the necessary background and skills are asked to provide further information on their eligibility. The course coordinator will decide on their eligibility.

Learning outcomes

Course goals
At the end of this course, students are able to:

  1. Identify key statistical concepts such as:
      (Conditional) probability
      Inference
      Estimation
      Prediction
      Classification
      Sampling variability
      Statistical modeling
      Residuals
      Fitted values
  2. Choose an appropriate regression model for a given research scenario.
  3. Explain the differences and similarities between statistical inference and model-based prediction/classification; give examples of each type of problem.
  4. Identify the assumptions of linear and logistic regression; describe the consequences of violating these assumptions.
  5. Describe the three components of a generalized linear model and how these components are specified in logistic regression.
  6. Interpret the estimates from linear and logistic regression models, and use these estimates to answer research questions.
  7. Use the R statistical software platform to perform basic statistical programming, data manipulation, data visualization, and basic data wrangling.
  8. Use the R statistical software platform to perform, interpret, and evaluate linear and logistic regression analyses on real-world data.
  9. Interpret R output and use the results to answer research questions.
  10. Use R Markdown to document the results of a statistical analysis.

Relation between assessment and objective
In this course, skills and knowledge are evaluated with two types of assessment:
  1. The exam evaluates knowledge and understanding of both statistical concepts (learning goal 1) and fundamental R concepts (learning goal 7), the ability to critically evaluate research problems and statistical methods (learning goals 2–5), and the ability to interpret statistical results and software output and apply these interpretations (learning goals 6 and 9).
  2. The group assignments evaluate students’ ability to work with data, solve basic data-analytic problems, execute quantitative data analyses on real-world data sets, and document the results (learning goals 6–10).

Prior knowledge

You must meet the following requirements

Resources

  • Literature: Parts of the freely available text Dalpiaz (2022), Applied Statistics with R. https://book.stat420.org/
  • Literature: Parts of the freely available text Wickham & Grolemund (2016), R for Data Science. O’Reilly. http://r4ds.had.co.nz/
  • Literature: Additional literature and references will be provided during the course.
  • Software: All software used (R, RStudio) is open source and freely available online, as is the mandatory literature.

Additional information

  • Credits
    ECTS 7.5
  • Level
    bachelor
If anything remains unclear, please check the FAQ of Utrecht University.

Offering(s)

  • Start date
    11 November 2024
  • Ends
    31 January 2025
  • Term
    Period 2
  • Location
    Utrecht
  • Instruction language
    English
Enrolment period closed
For guest registration, this course is handled by Utrecht University.