About this course
Content
Regression techniques are widely used to quantify the relationship between two or more variables. In data science, linear and logistic regression are common and powerful techniques for evaluating such relations. These techniques are only useful, however, once you understand when and how to apply them. In this course, students will learn how to apply linear and logistic regression with the R statistical software package.
This course will introduce students to the principles of analytical data science, linear and logistic regression, and the basics of statistical learning. Students will develop fundamental R programming skills and will gain experience with the tidyverse: visualizing data with ggplot2 and performing basic data wrangling with dplyr. This course helps prepare students for an entry-level research career (e.g., junior researcher or research assistant) or further education in research (e.g., a (research) Master's program or a PhD).
Course Structure
In eight weeks, you will learn the basics of data handling and statistical programming with R and details about regression techniques in the context of statistical inference, prediction, and classification. Each week will comprise three class activities:
 During the weekly lectures, we will cover the theoretical content.
 Weekly practical exercises connect the statistical theory to practice by applying the lecture content in the R statistical programming language.
 During the weekly workgroup meetings, you will work on real-world data analysis with a group of your peers.
This course has multiple in-person meetings per week at which your attendance is required. It is not suited to being followed online or through self-study only.
Registration
Note that you need to register for this course in OSIRIS Student.
Entrance requirements
The required entrance level is familiarity with correlation and regression, comparison of means, and cross-tabulation of categorical variables. We also expect that you have hands-on experience in carrying out these analyses with, for example, SPSS, Stata, R, JASP, or SAS.
Students who do not meet these entrance requirements are advised to take the pre-course for the ADS minor, ADS: Basis van Onderzoeksmethoden en Statistiek (code 201900025, taught in Dutch). Students who do not meet the entrance requirements but believe they have the necessary background and skills are asked to provide further information on their eligibility. The course coordinator will decide on their eligibility.
Learning outcomes
Course goals
At the end of this course, students are able to:

Identify key statistical concepts such as:

(Conditional) probability

Inference

Estimation

Prediction

Classification

Sampling variability

Statistical modeling

Residuals

Fitted values

Choose an appropriate regression model for a given research scenario.

Explain the differences and similarities between statistical inference and model-based prediction/classification; give examples of each type of problem.

Identify the assumptions of linear and logistic regression; describe the consequences of violating these assumptions.

Describe the three components of a generalized linear model and how these components are specified in logistic regression.

Interpret the estimates from linear and logistic regression models, and use these estimates to answer research questions.

Use the R statistical software platform to perform basic statistical programming, data manipulation, data visualization, and basic data wrangling.

Use the R statistical software platform to perform, interpret, and evaluate linear and logistic regression analyses on real-world data.

Interpret R output and use the results to answer research questions.

Use R Markdown to document the results of a statistical analysis.
Relation between assessment and objective
In this course, skills and knowledge are evaluated with two types of assignment.
Resources
 Literature: Parts of the freely available text Dalpiaz (2022), Applied Statistics with R, https://book.stat420.org/
 Literature: Parts of the freely available text Wickham (2016), R for Data Science, O’Reilly, https://r4ds.hadley.nz/
 Literature: Additional literature and references are provided during the course.
 Software: All software used (RStudio, R) is open source and freely available online, as is the mandatory literature.
Additional information
 More info: Course page on the website of Utrecht University
 Contact a coordinator
 Credits: 7.5 ECTS
 Level: Bachelor