TU Delft
2016/2017 Electrical Engineering, Mathematics and Computer Science Bachelor Computer Science and Engineering
Big Data Processing
Responsible Instructor
Name E-mail
Dr.ir. A. Bozzon    A.Bozzon@tudelft.nl
Dr. C. Hauff    C.Hauff@tudelft.nl
Name E-mail
Dr.ir. Z. Al-Ars    Z.Al-Ars@tudelft.nl
Contact Hours / Week x/x/x/x
0/4/0/0 hc; 0/4/0/0 lab
Education Period
Start Education
Exam Period
Course Language
Course Contents
The course introduces the concept of “big data”, a phrase that has quickly become a buzzword across industry and academia.

The course has two main complementary areas of focus: (i) to teach the required technical background, and (ii) to critically assess the data and engineering problems at hand.

The first part of the course provides a general introduction and covers the difficulties of adapting classical algorithms (as taught in first-year courses) to large amounts of data. The course then explores, from both a theoretical and a practical point of view, three main approaches developed in the past decade to deal with big data in different scenarios:

(i) Streaming: the processing of data streams (large sequences of data items which may not be storable on disk in their entirety) is discussed, with attention to the memory and processing time that stream algorithms require.

(ii) MapReduce: this approach assumes that all data can be stored on a cluster and that processing time is not of critical importance. The MapReduce framework was originally developed to simplify data processing in this setting; its most widely used implementation in industry (Hadoop) is covered in detail.

(iii) Iterative algorithms: certain algorithms require a number of iterations before they arrive at a solution. In this setting, MapReduce is not the best solution, as it was not designed with iterative algorithms in mind; an alternative is introduced.
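
As an illustration of the MapReduce model described under (ii), the following is a minimal sketch in plain Python, not Hadoop code and not taken from the course materials. It simulates the three stages (map, shuffle, reduce) on a single machine for the classic word-count task. The function names `map_phase`, `shuffle`, `reduce_phase` and `word_count` are hypothetical and chosen only for this example.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Mapper: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key, as a framework would do
    between the map and reduce stages."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence on an iterable of strings."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["big data is big", "data streams are data"]
    print(word_count(docs))
```

In a real cluster setting the mappers and reducers would run in parallel on different machines and the shuffle would move data across the network. An iterative algorithm expressed this way would have to repeat the whole map, shuffle, reduce cycle once per iteration, which hints at why (iii) calls for an alternative.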
Study Goals
[introduction; assess big data problems]
- Explain the different dimensions of big data problems
- Explain why classical algorithms fail on many big data problems
- Given a practical problem, analyze to what extent it requires big data solutions (as opposed to classical solutions)
- Given a practical problem, identify which big data approach is most applicable

[streaming]
- Describe in which scenario streaming algorithms are most applicable
- Explain the streaming algorithms introduced in the lectures
- Identify the correct streaming algorithm for a given streaming problem

[MapReduce]
- Explain the difference between OO-programming and functional programming
- Explain the major components of the Hadoop framework
- Demonstrate understanding of the interplay between the different Hadoop components
- Create Hadoop-based algorithms for novel (unseen) practical problems
- Analyze MapReduce algorithms for their feasibility in practice; be able to propose improvements/alternatives

[iterative algorithms]
- Explain the difference between iterative/non-iterative algorithms
- Describe the reasons for MapReduce’s lack of suitability for iterative algorithms
- Given an iterative algorithm, be able to create a MapReduce algorithm and evaluate its problems
- Design iterative algorithms for simple practical problems

Education Method
The course is based on lectures, lab sessions and homework.
Literature and Study Materials
Slides are provided.
Assessment
The assessment is based on practical assignments (homework and lab sessions) as well as the final exam.
Weekly quizzes can yield up to one bonus grade point.
Permitted Materials during Tests
The assignments form 25% of the final grade and the exam 75%.
The resit covers only the exam. It is not possible to resit the lab assignments. Partial grades (assignments or exam) cannot be transferred to subsequent years.