CS 8803 DML: Data Management and Machine Learning (Fall 2018)





Date Topic Content Presenter
08/21 Introduction Course Introduction and Logistics
Introduction to Part 1: Data Cleaning and ML
Xu Chu
08/23 Introduction Introduction to Part 2: Data Exploration and ML
Introduction to Part 3: Systems and ML
Discussions of sample course projects
Xu Chu
Optional reading Introduction Data Management Challenges in Production Machine Learning
Data Management in Machine Learning: Challenges, Techniques, and Systems
N.A.
08/28 N.A. No Class (Instructor at VLDB) N.A.
08/30 N.A. No Class (Instructor at VLDB) N.A.
Part 1: Data Cleaning and ML
09/04 ML for Data Deduplication (1)
Introduction to papers in this class
Interactive Deduplication using Active Learning
On Active Learning of Record Matching Packages
Xu Chu
Xiang Cheng
Alex Mueller
09/06 ML for Data Deduplication (2)
Introduction to papers in this class
Distributed Representations of Tuples for Entity Resolution
Deep Learning for Entity Matching: A Design Space Exploration
Xu Chu
Yuhong Wang
Omar Sharifali
09/11 ML for Data Deduplication (3)
Introduction to papers in this class
CrowdER: crowdsourced entity resolution
Distributed Data Deduplication
Xu Chu
Alex Mueller
Thibaut Boissin
Optional reading ML for Data Deduplication (4)
Duplicate Record Detection: A Survey N.A.
09/13 ML for Data Cleaning (1)
Introduction to papers in this class
Detecting Data Errors: Where are we and what needs to be done?
HoloClean: Holistic Data Repairs with Probabilistic Inference
Xu Chu
Florina Dutt
Zhuoran Yu
09/18 ML for Data Cleaning (2)
Introduction to papers in this class
Tracing Data Errors with View-Conditioned Causality
Data X-Ray: A Diagnostic Tool for Data Errors
Xu Chu
Thibaut Boissin
Saurabh Sawlani
Optional reading ML for Data Cleaning (3)
Data Cleaning is a ML Problem that Needs Data Systems Help N.A.
09/20 Data Cleaning for ML (1)
Introduction to papers in this class
A Sample-and-Clean Framework for Fast and Accurate Query Processing on Dirty Data
ActiveClean: Interactive Data Cleaning For Statistical Modeling
Xu Chu
Sanya Chaba
Peng Li
09/25 Data Cleaning for ML (2)
Introduction to papers in this class
Cleaning Crowdsourced Labels Using Oracles For Supervised Learning
BoostClean: Automated Error Detection and Repair for Machine Learning
Xu Chu
Jennifer Blase
Xinran Shi
Optional reading Data Cleaning for ML (3)
Impacts of Dirty Data: an Experimental Evaluation N.A.
09/27 Data Wrangling/Transformation
Introduction to papers in this class
Potter’s Wheel: An Interactive Data Cleaning System
Transform-Data-by-Example (TDE): Extensible Data Transformation using Functions
Xu Chu
Matthew Britton
Visweswara Dintyala
10/02 Training Data Enrichment
Introduction to papers in this class
Snorkel: Rapid Training Data Creation with Weak Supervision
Combining Labeled and Unlabeled Data with Co-Training
Xu Chu
Pranshu Trivedi
Peng Li
10/04 Boosting
Introduction to papers in this class
Multi-class AdaBoost
XGBoost: A Scalable Tree Boosting System
Xu Chu
Jayant Prakash
Zhanhao Liu
Part 2: Data Exploration and ML
10/09 N.A. No Class (Fall Recess) N.A.
10/11 Relational Data Profiling (1)
Introduction to papers in this class
TANE: An Efficient Algorithm for Discovering Functional and Approximate Dependencies
FastFDs: A Heuristic-Driven, Depth-First Algorithm for Mining Functional Dependencies from Relation Instances
Xu Chu
Xiang Cheng
Nilaksh Das
10/16 Relational Data Profiling (2)
Introduction to papers in this class
Discovering Denial Constraints
Efficient Denial Constraint Discovery with Hydra
Xu Chu
Yuhong Wang
Xinran Shi
Optional reading Relational Data Profiling (3)
Functional Dependency Discovery: An Experimental Evaluation of Seven Algorithms
Profiling Relational Data – A Survey
N.A.
10/18 Model Interpretation (1)
Introduction to papers in this class
“Why Should I Trust You?” Explaining the Predictions of Any Classifier
Anchors: High-Precision Model-Agnostic Explanations
Xu Chu
Yafei Zhang
Saurabh Sawlani
10/23 Model Interpretation (2)
Introduction to papers in this class
A Unified Approach to Interpreting Model Predictions
Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation
Xu Chu
Yue Hu
Yafei Zhang
Optional reading Model Interpretation (3)
Interpretable ML Symposium
Interpretable ML by H2O
N.A.
Optional reading Visualization and ML (1)
Visual Exploration of Machine Learning Results using Data Cube Analysis
ActiVis: Visual Exploration of Industry-Scale Deep Neural Network Models
N.A.
Optional reading Visualization and ML (2)
Recent progress and trends in predictive visual analytics N.A.
10/25 Feature Engineering (1)
Introduction to papers in this class
Deep Feature Synthesis: Towards Automating Data Science Endeavors
ExploreKit: Automatic Feature Generation and Selection
Xu Chu
Jayant Prakash
Andrea Hu
10/30 Feature Engineering (2)
Introduction to papers in this class
One button machine for automating feature engineering in relational databases
Materialization Optimizations for Feature Selection Workloads
Xu Chu
Wendi Du
Yue Zhang
Optional reading Feature Engineering (3)
Discover Feature Engineering, How to Engineer Features and How to Get Good at It
An Introduction to Variable and Feature Selection
N.A.
Part 3: Systems and ML
11/01 Managing ML Pipeline (1)
Introduction to papers in this class
TFX: A TensorFlow-Based Production-Scale Machine Learning Platform
ModelDB: A System for Machine Learning Model Management
Xu Chu
Jennifer Blase
Visweswara Dintyala
11/06 Managing ML Pipeline (2)
Introduction to papers in this class
ProvDB: A System for Lifecycle Management of Collaborative Analysis Workflows
Automating Large-Scale Data Quality Verification
Xu Chu
Nidhi Menon
Pranshu Trivedi
Optional reading Managing ML Pipeline (3)
A Berkeley View of Systems Challenges for AI N.A.
11/08 Training Set Debug (1)
Introduction to papers in this class
Training Set Debugging Using Trusted Items
Flipper: A Systematic Approach to Debugging Training Sets
Xu Chu
Yue Zhang
Sneha Venkatachalam
11/13 Training Set Debug (2)
Introduction to papers in this class
Understanding Black-box Predictions via Influence Functions
Examples are not Enough, Learn to Criticize! Criticism for Interpretability
Xu Chu
Nidhi Menon
Yue Hu
11/15 Reducing Training Set
Introduction to papers in this class
Efficient Construction of Approximate Ad-Hoc ML models Through Materialization and Reuse
BlinkML: Approximate Machine Learning with Probabilistic Guarantees
Xu Chu
Eric Qin
Zhanhao Liu
11/20 Reducing Training Set
Introduction to papers in this class
Active Learning for Convolutional Neural Networks: A Core-Set Approach
DCNNs on a Diet: Sampling Strategies for Reducing the Training Set Size
Xu Chu
Eric Qin
Zhuoran Yu
Optional reading Debug ML
Why is machine learning 'hard'? N.A.
Course Project Presentations
11/22 N.A. No Class (Thanksgiving) N.A.
11/27 Final Project 1) Eric Qin
2) Sanya Chaba, Nilaksh Das
3) Omar Sharifali, Saurabh Sawlani
4) Xiang Cheng
N.A.
11/29 Final Project 5) Alex Mueller, Jayant Prakash
6) Andrea Hu, Matthew Britton
7) Sneha Venkatachalam, Nidhi Menon
8) Yue Zhang, Jennifer Blase, Peng Li
9) Wendi Du, Yafei Zhang, Yue Hu
N.A.
12/04 Final Project 10) Florina Dutt
11) Zhanhao Liu, Yuhong Wang
12) Pranshu Trivedi, Zhuoran Yu
13) Thibaut Boissin, Visweswara Dintyala
14) Xinran Shi
N.A.
Disclaimer: The schedule is subject to change as the semester progresses.

 


  © Xu Chu 2018