Responsible Instructor 

Instructor 

Contact Hours / Week x/x/x/x 
0/4/0/0 hc; 0/4/0/0 lab

Education Period 

Start Education 

Exam Period 

Course Language 

Course Contents 
The course introduces the concept of “big data”, a phrase that has quickly become a buzzword across industry and academia.
The course has two main complementary areas of focus: (i) to teach the required technical background, and (ii) to critically assess the data and engineering problems at hand. The first part of the course provides a general introduction and covers the difficulties of adapting classical algorithms, as taught in first-year courses, to large amounts of data. Then, three main approaches that have been developed in the past decade to deal with big data in different scenarios are explored, both from a theoretical and a practical point of view:
(i) Streaming: the processing of data streams (large sequences of data items which may not be storable on disk in their entirety) is discussed, with a focus on memory and processing-time requirements.
(ii) MapReduce: it is assumed that all data can be stored on a cluster and that processing time is not of critical importance. The MapReduce framework was originally developed to simplify data processing in this setting; its most widely used implementation in industry (Hadoop) is covered in detail.
(iii) Iterative algorithms: certain algorithms require a number of iterations before they arrive at a solution. In this setting, MapReduce is not the best solution, as it was not designed with iterative algorithms in mind; an alternative is introduced.
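
As a rough illustration of approach (ii), the sketch below mimics the map, shuffle and reduce phases of a word count in plain Python on a single machine. It is not Hadoop code and is not taken from the course materials; the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen for this example only.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key (done by the framework between phases)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence over an iterable of text records."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "big clusters"]))
```

In a real cluster setting the map and reduce calls would run in parallel on different machines, and the grouping step would move data across the network; here everything runs in one process to keep the structure visible.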

Study Goals 
[introduction; assess big data problems]
- Explain the different dimensions of big data problems
- Explain why classical algorithms fail on many big data problems
- Given a practical problem, analyze to what extent it requires big data solutions (as opposed to classical solutions)
- Given a practical problem, identify which big data approach is most applicable
[streaming]
- Describe in which scenario streaming algorithms are most applicable
- Explain the streaming algorithms introduced in the lectures
- Identify the correct streaming algorithm for a given streaming problem
[MapReduce]
- Explain the difference between object-oriented programming and functional programming
- Explain the major components of the Hadoop framework
- Demonstrate understanding of the interplay between the different Hadoop components
- Create Hadoop-based algorithms for novel (unseen) practical problems
- Analyze MapReduce algorithms for their feasibility in practice; be able to propose improvements/alternatives
[iterative algorithms]
- Explain the difference between iterative and non-iterative algorithms
- Describe the reasons for MapReduce’s lack of suitability for iterative algorithms
- Given an iterative algorithm, be able to create a MapReduce algorithm and evaluate its problems
- Design iterative algorithms for simple practical problems

Education Method 
The course is based on lectures, lab sessions and homework.

Literature and Study Materials 
Slides are provided.

Assessment 
The assessment is based on practical assignments (homework and lab sessions) as well as the final exam. Weekly quizzes can yield up to one bonus grade point.

Permitted Materials during Tests 
None.

Judgement 
The assignments form 25% of the final grade and the exam 75%. The resit covers only the exam. It is not possible to resit the lab assignments. Partial grades (assignments or exam) cannot be transferred to subsequent years.
