About this course
In this course, students will learn parallel (more specifically, MapReduce) algorithms for analyzing massive networks. MapReduce algorithms distribute the node and edge data of a network across a large cluster of machines, each with moderate memory (so that the entire network need not fit in the memory of a single machine), and process the network data while keeping the communication among the machines as low as possible. The exact contents of the course have not yet been decided, but a rough outline is as follows: we first study the MapReduce architecture and its connection to other parallel models, including the PRAM, CONGEST, and LOCAL models. Then we study how classical graph algorithms can be implemented in the MapReduce model. Examples of classical problems that may be treated are matching, connectivity, data clustering, and submodular maximization.
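To make the map/shuffle/reduce pattern described above concrete, here is a minimal sketch (not part of the course materials) of a single MapReduce round that computes the degree of every node from an edge list. The round is simulated locally with a dictionary playing the role of the shuffle; in a real deployment the framework would distribute the map outputs across machines.

```python
from collections import defaultdict

def map_phase(edges):
    """Map: emit (node, 1) for each endpoint of each edge."""
    for u, v in edges:
        yield (u, 1)
        yield (v, 1)

def shuffle(pairs):
    """Shuffle: group values by key (done by the framework in practice)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: sum the counts for each node to obtain its degree."""
    return {node: sum(counts) for node, counts in groups.items()}

edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
degrees = reduce_phase(shuffle(map_phase(edges)))
print(degrees)  # {1: 2, 2: 2, 3: 3, 4: 1}
```

Note that each node's data is processed independently of the others, which is what lets the reduce work be spread over many machines.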
After successfully completing the course, the student
- understands the de facto architectures of modern parallel computing, including the MapReduce, PRAM, CONGEST, and LOCAL models.
- knows how to use parallel architectures to solve various basic problems on massive networks that do not fit in the memory of a single machine.
- knows basic parallel techniques (e.g., sampling, filtering, coresets, and sketches) for solving classical problems in computer science and data science, including matching, connectivity, and clustering problems, as well as greedy algorithms.
- knows how to analyze parallel algorithms in models such as the MapReduce model and prove the correctness of these algorithms.
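As a small illustration of the sampling-style techniques mentioned above (a hypothetical example, not taken from the course), one can estimate the average degree of a large graph from a uniform sample of nodes instead of touching every node. The graph and function below are illustrative only.

```python
import random

def estimate_average_degree(degrees, sample_size, seed=0):
    """Estimate the average degree from a uniform sample of nodes.

    `degrees` maps each node to its degree; only `sample_size` of the
    entries are inspected, so the estimate is cheap even when the full
    degree table is huge.
    """
    rng = random.Random(seed)
    sample = rng.sample(sorted(degrees), sample_size)
    return sum(degrees[node] for node in sample) / sample_size

# Toy degree table: 1000 nodes with degrees cycling through 1..5,
# so the true average degree is exactly 3.0.
degrees = {node: (node % 5) + 1 for node in range(1000)}
estimate = estimate_average_degree(degrees, sample_size=200)
print(estimate)
```

With 200 samples the estimate concentrates around the true value of 3.0; analyzing how large the sample must be for a given accuracy is exactly the kind of argument such techniques require.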