About this course
Content
Regression techniques are widely used to quantify the relationship between two or more variables. In data science, linear and logistic regression are common and powerful techniques for evaluating such relations. These techniques are only useful, however, once you understand when and how to apply them. In this course, students will learn how to apply linear and logistic regression with the R statistical software package.
This course will introduce students to the principles of analytical data science, linear and logistic regression, and the basics of statistical learning. Students will develop fundamental R programming skills and gain experience with the tidyverse: visualizing data with ggplot2 and performing basic data wrangling with dplyr. This course helps prepare students for an entry-level research career (e.g., junior researcher or research assistant) or further education in research (e.g., a [research] Master's program or a PhD).
Course Structure
In eight weeks, you will learn the basics of data handling and statistical programming with R and details about regression techniques in the context of statistical inference, prediction, and classification. Each week will comprise three class activities:
- During the weekly lectures, we will cover the theoretical content.
- Weekly practical exercises connect the statistical theory to practice by applying the lecture content in the R statistical programming language.
- During the weekly workgroup meetings, you will work on real-world data analysis with a group of your peers.
This course has multiple in-person meetings per week at which attendance is required. It is not suited to being followed online or through self-study alone.
Registration
Note that you need to register for this course during the SIRIS student.
Entrance requirements
The required entrance level is familiarity with correlation and regression, comparison of means, and cross-tabulation of categorical variables. We also expect you to have hands-on experience carrying out these analyses with, for example, SPSS, Stata, R, JASP, or SAS.
Students who do not meet these entrance requirements are advised to take the pre-course for the ADS minor, ADS: Basis van Onderzoeksmethoden en Statistiek (code 201900025, taught in Dutch). Students who do not meet the entrance requirements but believe they have the necessary background and skills are asked to provide further information on their eligibility; the course coordinator will then decide.
Learning outcomes
Course goals
At the end of this course, students are able to:
- Identify key statistical concepts such as:
  - (Conditional) probability
  - Inference
  - Estimation
  - Prediction
  - Classification
  - Sampling variability
  - Statistical modeling
  - Residuals
  - Fitted values
- Choose an appropriate regression model for a given research scenario.
- Explain the differences/similarities between statistical inference and model-based prediction/classification; give examples of each type of problem.
- Identify the assumptions of linear and logistic regression; describe the consequences of violating these assumptions.
- Describe the three components of a generalized linear model and how these components are specified in logistic regression.
- Interpret the estimates from linear and logistic regression models, and use these estimates to answer research questions.
- Use the R statistical software platform to perform basic statistical programming, data manipulation, data visualization, and basic data wrangling.
- Use the R statistical software platform to perform, interpret, and evaluate linear and logistic regression analyses on real-world data.
- Interpret R output and use the results to answer research questions.
- Use R Markdown to document the results of a statistical analysis.
Relation between assessment and objective
In this course, skills and knowledge are evaluated with two types of assignment.
Prior knowledge
You must meet the following requirements
Resources
- Literature: Parts of the freely available text: Dalpiaz (2022), Applied Statistics with R, https://book.stat420.org/
- Literature: Parts of the freely available text: Wickham (2016), R for Data Science, O’Reilly, https://r4ds.hadley.nz/
- Literature: Additional literature and references are provided during the course.
- Software: All software used (RStudio, R) is open source and freely available online, as is the mandatory literature.
Additional information
- More info: Course page on the website of Utrecht University
- Contact a coordinator
- Credits: ECTS 7.5
- Level: bachelor