Data Management and Machine Learning

CS 8803 DML: Data Management and Machine Learning (Fall 2018)

Announcements:

Nov 6: Each group should email the TA its preferences for the three project presentation days by 9 pm EST on Nov 11 (this Sunday).
Oct 9: The feedback and scores for the project proposals have been sent via email.
Major course announcements will be posted here and on GT Canvas.

Motivation

Big data processing poses many challenges, which are often characterized by the three V's (volume, velocity, and variety). At the same time, machine learning is increasingly used by all kinds of data-driven applications. This course explores the interactions between these two exciting fields. This blog post provides one perspective on such interactions.

Topics

Given the motivation above, the course covers topics that fall into two broad categories:

Utilizing machine learning technologies to solve hard data management challenges, such as data cleaning.
Utilizing data management technologies to solve hard machine learning challenges, such as model interpretation, debugging, and feature engineering.

Objectives

The course covers a wide range of modern challenges and sub-topics in both data management and machine learning. Students will become familiar with these sub-topics and gain a deep understanding of one of them through presentations and course projects.

Furthermore, since this is a graduate seminar, another important objective is to train students in the basic skills of a researcher. The course will create a number of opportunities for students to learn how to read a paper, how to write a paper review, how to give a research talk, and how to write a research paper.

Logistics

We will be using Canvas for course announcements, uploading materials that should not be made public, and student discussions such as forming project groups.

Instructor: Xu Chu

Email: xu.chu@cc.gatech.edu

TA: Zhanhao Liu

Email: zhanhao.liu@gatech.edu

Time: Tue and Thur, 4:30 - 5:45pm
Location: Van Leer C456
Office Hours: By appointment. E-mail me to book a slot. The title of the email should always start with "CS 8803 DML".

Prerequisites

Students should have a basic understanding of data analytics and machine learning. Though not required, an undergraduate course in relational database systems and an undergraduate course in machine learning would be helpful. The References section lists some relevant courses and materials.

Academic Honesty:

Students are expected to abide by the Georgia Tech Honor Code.

Grading

Presentations: 20%
Paper Review: 20%
Project: 60%

References