About this course
Big Data usually refers to data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process within a tolerable elapsed time. With the advancements in computing, the realization of Big Data systems has now become feasible and can trigger innovation and growth for various application domains.
This course both discusses the key concepts of Big Data and provides hands-on experience in developing and using Big Data systems. We introduce concepts related to Big Data system architectures, distributed systems, the Map-Reduce framework, scalable linear and machine learning models, and how they can be used with cutting-edge software platforms. Students will practice with tools via individual tutorials and gain hands-on experience by working on a group project framed as a "data challenge". Students will demonstrate not only the skills acquired in the course but also their creativity as data scientists, which includes communicating the value of their findings with visualization tools. The course is designed to be accessible to students of all disciplines in the life and social sciences, for example food and health, biosystems engineering, bioinformatics, geo-information science, environmental science and plant science, amongst others.
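For readers who have not met the Map-Reduce idea mentioned above, the following is a minimal single-machine sketch in plain Python. It is not taken from the course material: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample sentences are assumptions chosen for illustration, and a real framework would distribute these phases over many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework does between the phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence on a list of strings."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    # Hypothetical input, for illustration only.
    print(word_count(["big data systems", "big Data tools"]))
```

The point of the split is that each call to `map_phase` and each call to `reduce_phase` is independent of the others, which is what allows such a computation to scale out.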
Learning outcomes
- Understand the basic concepts related to Big Data and data-driven value-creation in the environmental, social and life sciences. 
- Apply various tools in the big data ecosystem for handling big data. 
- Design a big data application derived from a big data reference architecture through requirement analysis and basic software modelling. 
- Build a big data system for a real-world life science application implementing scalable descriptive and predictive data analytics techniques. 
- Communicate meaningful patterns in data through data visualisation, reporting, presentation and documentation. 
- Determine the value of data-driven innovation, and associate it with their own course of studies. 
Assessment method
- Written test with open and closed questions (50%): 2 intermediate tests, which require an average of 5.5 to pass.
- Assignment other (50%)
Prior knowledge
Fundamentals of programming (e.g. INF22306 Programming in Python).
Specifically, you should be acquainted with the following concepts and techniques:
- variables, assignment, expressions, operators;
- functions (and/or procedures, subroutines, methods) and parameters, including writing your own functions;
- control structures (at least if, for, while);
- data structures (lists, tuples, dictionaries);
- libraries for data manipulation and visualization (NumPy, Pandas and Matplotlib).
Familiarity with relational databases (e.g. INF21306 Data Management) is of added value.
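As a rough self-check of the programming level described above, you should be able to read and write a snippet like the following. It is only an illustrative sketch, not an official entry test: the function names and the sample records are made up, and it deliberately stays within the standard library rather than using NumPy, Pandas or Matplotlib.

```python
def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    total = 0.0
    for v in values:
        total += v
    return total / len(values)


def group_by_crop(records):
    """Collect measurements per crop from a list of (crop, value) tuples."""
    grouped = {}
    for crop, value in records:  # tuple unpacking
        if crop not in grouped:
            grouped[crop] = []
        grouped[crop].append(value)
    return grouped


def summarise(records):
    """Map each crop name to the mean of its measurements."""
    return {crop: mean(values) for crop, values in group_by_crop(records).items()}


if __name__ == "__main__":
    # Hypothetical measurements, for illustration only.
    records = [("wheat", 7.5), ("maize", 10.0), ("wheat", 8.5)]
    print(summarise(records))
```

If the use of functions, loops, conditionals, tuples and dictionaries here feels familiar, the programming prerequisites are likely covered.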
Resources
- Recent scientific literature (a collection of papers) will be made available (at no cost through the library).
Additional information
- More info: Course page on the website of Wageningen University & Research
- Contact a coordinator
- Level: master
- Mode of instruction: on campus
