About this course
What puts former criminals on the right track? How can we prevent heart disease? Can Twitter predict election outcomes? What does a violent brain look like? How many social classes does 21st century society have? Are hospitals spending too much on health care, or too little?
Data analysis is the art and science of tackling questions like these by looking at data. Just as cartographers make maps to see what a country looks like, data analysts explore the hidden structures of data by creating informative pictures and summarizing relationships among variables. And just as doctors diagnose sick patients and advise healthy ones on how to stay healthy, data analysts predict important events and variables so we can act on this knowledge. Methods from statistics, machine learning, and data mining play an important part in this process, as do visualizations that help the analyst and others better understand what can be concluded from the available facts.
During this course, you will actively learn how to apply the main statistical methods in data analysis and how to use machine learning algorithms and visualization techniques. The course goes beyond linear and logistic regression, and thus continues where "Fundamental techniques in data science with R" ended. The course has a strongly practical, hands-on focus: rather than concentrating on the mathematics and theoretical background of the techniques discussed, you will gain hands-on experience in applying them to real data and interpreting the results.
This course covers both classical and modern topics in data analysis and visualization:
- Exploratory data analysis (EDA);
- Supervised machine learning and statistical learning;
- Basic unsupervised learning techniques;
- Visualization (throughout the course).
Note that you need to register for this course in OSIRIS Student. Also note that this course builds on the course Fundamental techniques in data science with R (course code: 201900026).
Students who do not meet the general entry requirements (see below) are advised to take the pre-course for the ADS minor, ADS: Basis van Onderzoeksmethoden en Statistiek (course code: 201900025, taught in Dutch). Students who do not meet the entry requirements but believe they have the necessary background and skills are asked to provide further information on their eligibility. The course coordinator will decide on their eligibility.
Entry requirements
Students should have completed at least one introductory statistics course of 7.5 EC and be familiar with correlation and regression, comparison of means, and cross-tabulation of categorical variables. We also expect you to have hands-on experience in carrying out these analyses with, for example, SPSS, Stata, R, or SAS.
Learning outcomes
This course builds on the course Fundamental techniques in data science with R (course code: 201900026).
After successfully completing this course, you will be able to:
- Understand and explain the different approaches to data analysis that go beyond regression analysis;
- Given a practical data science problem, select appropriate techniques to tackle this problem;
- Apply various supervised and unsupervised data analysis techniques in R, including regression, trees, classification, and clustering;
- Implement generic data science tools such as train/validation/test sets, cross-validation, and error evaluation in R;
- Interpret and evaluate the results of such analyses;
- Explain these evaluations in layman's terms;
- Understand and explain the basic principles of data visualization and the grammar of graphics;
- Construct appropriate visualizations in connection with each of the data analysis techniques in R.
Prior knowledge
You must meet the entry requirements described above.
Resources
- Book: Data Visualization: A Practical Introduction
- Book: Text Mining with R: A Tidy Approach
- Book: An Introduction to Statistical Learning with Applications in R
- Software: All software used (RStudio, R) is open source and freely available online, as is the mandatory literature.
- Literature: R for Data Science
- Book: Practical Data Science with R
- Literature: Additional literature and references are provided during the course.
Additional information
- More info: Course page on the Utrecht University website
- Contact a coordinator
- Credits: 7.5 ECTS
- Level: Bachelor