TU Delft
2016/2017 Electrical Engineering, Mathematics and Computer Science Bachelor Computer Science and Engineering
Data Mining
Responsible Instructor
Name E-mail
T.E.P.M.F. Abeel    T.Abeel@tudelft.nl
Name E-mail
Z. Erkin    Z.Erkin@tudelft.nl
Prof.dr.ir. M.J.T. Reinders    M.J.T.Reinders@tudelft.nl
Contact Hours / Week x/x/x/x
0/0/4/0 hc; 0/0/4/0 lab
Education Period
Start Education
Exam Period
Course Language
Expected prior knowledge
Recommended prerequisite courses: TI1200 OO Programmeren, TI1310 Algoritmen en Datastructuren, TI2300 Algoritmiek, WI1200-TI Lineaire Algebra, TI1500 Web- en Databasetechnologie, and the two prior courses in the "variantblok".

Specific topics that are assumed as prior knowledge include:
• Discrete mathematics: set intersections, unions, and differences.
• Linear algebra: matrix multiplication, linear systems, SVD, and eigendecompositions.
• Probability and statistics: multivariate Gaussian distributions, correlation, and covariance matrices.
• Programming: Java or Scala programming skills.
• Data structures: arrays, linked lists, hash tables, and trees.
• Graph theory: bipartite graphs and shortest paths.
• Databases: natural, inner, and outer joins.
Course Contents
The goal of the course is to acquaint students with the main techniques for mining big data sets. Specifically, the course will cover algorithms for similar-item retrieval, frequent itemset mining, counting of events, network mining, privacy, clustering, classification, collaborative filtering, and dimension reduction.
Study Goals
Students will be able to…
Explain and implement common data mining algorithms
- counting algorithms for events in data streams
- PageRank algorithm
- frequent item-set mining
- unsupervised algorithms such as k-means and hierarchical clustering
- basics of social-network graph mining
- principal components analysis
Critically evaluate, motivate, and apply data mining algorithms to real-world challenges
- decision trees and k-nearest neighbor classifiers
- algorithms for the retrieval of similar items
- collaborative filtering algorithms
- performance evaluation of classifiers and other predictive algorithms
Explain the relevance and impact of data mining in our lives
- explain the relevance and context of application areas of data mining
- explain basic concepts of the privacy aspects of data mining
- explain and apply data visualization principles on toy examples
Education Method
The course comprises two lectures and one (four-hour) lab course per week.

The lab assignments are mandatory and must be handed in during the following lab session. They are done in groups of two and must be shown to one of the TAs during the lab session. TAs will ask questions to confirm that each student understands the implemented algorithm. Each lab assignment is a programming assignment of about four hours.
Literature and Study Materials
The reading material comes from the book "Mining of Massive Datasets" by Jure Leskovec, Anand Rajaraman, and Jeffrey Ullman. In particular, students must read chapters 1, 3, 4, 5, 6, 7, 9, 10, 11, and 12.

In addition, students will read the paper "A General Survey of Privacy-Preserving Data Mining Models and Algorithms" by Charu Aggarwal and Philip Yu, and the paper "A Tour through the Visualization Zoo" by Heer et al.
Project: The course contains a data mining competition that will be run via Kaggle-in-Class. Students are expected to compete in the competition and to submit a small report on their experiments and results. The competition submissions and corresponding report will be graded, and form 40% of the final grade.

Exam: The remaining 60% of the grade will be determined by a closed-book written exam.

Lab sessions: Lab sessions are ungraded, but each assignment must be signed off in person during specific lab hours. Each assignment has two hand-in opportunities. Each missed lab session costs 1 point (minus one) on your final grade.

There is one resit per year for the closed-book written exam. There is no resit for the competition and the corresponding report. There is no opportunity to make up for missed mandatory lab sessions.

The resit grade is determined by combining the resit exam with the original project score in the same 60-40 weighting.

The individual project grade, the individual exam grade, and the lab session attendance record are only valid for the current academic year. The final combined grade is valid indefinitely.
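
As an illustration of the weighting described above, the final grade can be sketched as a small calculation. This is a minimal sketch, not the official grading procedure: the function name `final_grade` and its parameters are hypothetical, grades are assumed to be on the same numeric scale, and any rounding or minimum-grade rules are not specified here and are therefore left out.

```python
def final_grade(exam, project, missed_labs=0, late_project=False):
    """Combine exam (60%) and project (40%), then subtract penalties.

    Assumptions (not stated in the course description): both grades use
    the same numeric scale, and no rounding or lower bound is applied.
    """
    weighted = 0.6 * exam + 0.4 * project
    # One point off for each missed lab session, and one for a late project.
    penalties = missed_labs + (1 if late_project else 0)
    return weighted - penalties


# Example: exam 7.0, project 8.0, one missed lab session.
print(round(final_grade(7.0, 8.0, missed_labs=1), 2))
```

For a resit, the same calculation would apply with the resit exam grade in place of `exam` and the original project score unchanged.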

Permitted Materials during Tests
60% closed book written exam; 40% project (no resit); -1 on final grade for each missing lab submission or late project submission.