About this course
Big Data usually refers to data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process within a tolerable elapsed time. With the advancements in computing, the realization of Big Data systems has now become feasible and can trigger innovation and growth for various application domains.
This course covers the key concepts of Big Data and provides hands-on experience in developing and using Big Data systems. We introduce concepts related to Big Data system architectures, distributed systems, the Map-Reduce framework, scalable linear and machine learning models, and how they can be used with cutting-edge software platforms. Students will practice with tools in individual tutorials and gain hands-on experience by working on a group project set up as a "data challenge". In the project, students demonstrate not only the skills acquired in the course but also their creativity as data scientists, which includes communicating the value of their findings with visualization tools. The course is designed to be accessible to students from a diverse range of disciplines, such as environmental sciences, biosystems engineering, bioinformatics, geo-information science and social sciences.
After successful completion of this course, students are expected to be able to:
- understand the basic concepts related to Big Data and data-driven value-creation in the environmental, social and life sciences;
- apply Big Data methods for designing scalable applications;
- analyse the role of various tools in the Big Data ecosystem and work hands-on with some of them;
- build a Big Data system for a real-world life science application;
- explore data analytics for discovery, and data visualization for communication of meaningful patterns in data;
- determine the value of data-driven innovation, and associate it with their own course of studies.
Assumed knowledge
Fundamentals of programming (e.g. INF22306 Programming in Python).
Specifically, you should be acquainted with the following concepts and techniques:
- variables, assignment, expressions, operators;
- functions (and/or procedures, subroutines, methods) and parameters; also making your own functions;
- control structures, at least if, for and while;
- objects and their properties (fields, variables) and operations (methods);
- arrays, including standard algorithms to traverse arrays (searching, summing, finding the largest element, etc.);
- data structures (lists, tuples, dictionaries);
- familiarity with relational databases (e.g. INF21306 Data Management).
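As a rough self-check of the programming level listed above, the following minimal Python sketch touches most of the concepts: functions with parameters, if/for/while, a small object with a property and a method, list traversal (largest element, sum, search), and lists, tuples and dictionaries. It is only an illustration and not official course material; all names and values in it (Station, readings, and so on) are made up for this example.

```python
# Illustrative self-check only; every name and value here is hypothetical.

def largest(values):
    """Return the largest element of a non-empty list by traversing it."""
    result = values[0]
    for value in values[1:]:
        if value > result:
            result = value
    return result


def total(values):
    """Sum the elements of a list with an explicit for loop."""
    running = 0
    for value in values:
        running += value
    return running


def find(values, target):
    """Return the index of the first occurrence of target, or -1 if absent."""
    index = 0
    while index < len(values):
        if values[index] == target:
            return index
        index += 1
    return -1


class Station:
    """A small object with properties (fields) and an operation (method)."""

    def __init__(self, name, readings):
        self.name = name
        self.readings = readings

    def mean(self):
        """Average of the readings stored on this object."""
        return total(self.readings) / len(self.readings)


readings = [12.5, 14.0, 9.5, 14.0]        # list
position = (1.0, 2.0)                     # tuple (made-up coordinates)
stations = {"A": Station("A", readings)}  # dictionary mapping name to object

print(largest(readings))     # largest element
print(total(readings))       # sum of all elements
print(find(readings, 9.5))   # index of a value in the list
print(stations["A"].mean())  # method call on an object stored in a dictionary
```

If you can read this code and predict what each print statement shows without running it, your programming background is probably at the assumed level.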