Date | Topic | Content | Presenter |
08/21 | Introduction | Course Introduction and Logistics; Introduction to Part 1: Data Cleaning and ML | Xu Chu |
08/23 | Introduction | Introduction to Part 2: Data Exploration and ML; Introduction to Part 3: Systems and ML; Discussions of sample course projects | Xu Chu |
Optional reading | Introduction | Data Management Challenges in Production Machine Learning; Data Management in Machine Learning: Challenges, Techniques, and Systems | N.A. |
08/28 | N.A. | No Class (Instructor at VLDB) | N.A. |
08/30 | N.A. | No Class (Instructor at VLDB) | N.A. |
Part 1: Data Cleaning and ML | |||
09/04 | ML for Data Deduplication (1) | Introduction to papers in this class; Interactive Deduplication using Active Learning; On active learning of record matching packages | Xu Chu, Xiang Cheng, Alex Mueller |
09/06 | ML for Data Deduplication (2) | Introduction to papers in this class; Distributed Representations of Tuples for Entity Resolution; Deep Learning for Entity Matching: A Design Space Exploration | Xu Chu, Yuhong Wang, Omar Sharifali |
09/11 | ML for Data Deduplication (3) | Introduction to papers in this class; CrowdER: crowdsourced entity resolution; Distributed Data Deduplication | Xu Chu, Alex Mueller, Thibaut Boissin |
Optional reading | ML for Data Deduplication (4) | Duplicate Record Detection: A Survey | N.A. |
09/13 | ML for Data Cleaning (1) | Introduction to papers in this class; Detecting Data Errors: Where are we and what needs to be done?; HoloClean: Holistic Data Repairs with Probabilistic Inference | Xu Chu, Florina Dutt, Zhuoran Yu |
09/18 | ML for Data Cleaning (2) | Introduction to papers in this class; Tracing Data Errors with View-Conditioned Causality; Data X-Ray: A Diagnostic Tool for Data Errors | Xu Chu, Thibaut Boissin, Saurabh Sawlani |
Optional reading | ML for Data Cleaning (3) | Data Cleaning is a ML Problem that Needs Data Systems Help | N.A. |
09/20 | Data Cleaning for ML (1) | Introduction to papers in this class; A Sample-and-Clean Framework for Fast and Accurate Query Processing on Dirty Data; ActiveClean: Interactive Data Cleaning For Statistical Modeling | Xu Chu, Sanya Chaba, Peng Li |
09/25 | Data Cleaning for ML (2) | Introduction to papers in this class; Cleaning Crowdsourced Labels Using Oracles For Supervised Learning; BoostClean: Automated Error Detection and Repair for Machine Learning | Xu Chu, Jennifer Blase, Xinran Shi |
Optional reading | Data Cleaning for ML (3) | Impacts of Dirty Data: an Experimental Evaluation | N.A. |
09/27 | Data Wrangling/Transformation | Introduction to papers in this class; Potter’s Wheel: An Interactive Data Cleaning System; Transform-Data-by-Example (TDE): Extensible Data Transformation using Functions | Xu Chu, Matthew Britton, Visweswara Dintyala |
10/02 | Training Data Enrichment | Introduction to papers in this class; Snorkel: Rapid Training Data Creation with Weak Supervision; Combining Labeled and Unlabeled Data with Co-Training | Xu Chu, Pranshu Trivedi, Peng Li |
10/04 | Boosting | Introduction to papers in this class; Multi-class AdaBoost; XGBoost: A Scalable Tree Boosting System | Xu Chu, Jayant Prakash, Zhanhao Liu |
Part 2: Data Exploration and ML | |||
10/09 | N.A. | No Class (Fall Recess) | N.A. |
10/11 | Relational Data Profiling (1) | Introduction to papers in this class; TANE: An Efficient Algorithm for Discovering Functional and Approximate Dependencies; FastFDs: A Heuristic-Driven, Depth-First Algorithm for Mining Functional Dependencies from Relation Instances | Xu Chu, Xiang Cheng, Nilaksh Das |
10/16 | Relational Data Profiling (2) | Introduction to papers in this class; Discovering Denial Constraints; Efficient Denial Constraint Discovery with Hydra | Xu Chu, Yuhong Wang, Xinran Shi |
Optional reading | Relational Data Profiling (3) | Functional Dependency Discovery: An Experimental Evaluation of Seven Algorithms; Profiling Relational Data – A Survey | N.A. |
10/18 | Model Interpretation (1) | Introduction to papers in this class; “Why Should I Trust You?” Explaining the Predictions of Any Classifier; Anchors: High-Precision Model-Agnostic Explanations | Xu Chu, Yafei Zhang, Saurabh Sawlani |
10/23 | Model Interpretation (2) | Introduction to papers in this class; A Unified Approach to Interpreting Model Predictions; Peeking Inside the Black Box: Visualizing Statistical Learning with Plots of Individual Conditional Expectation | Xu Chu, Yue Hu, Yafei Zhang |
Optional reading | Model Interpretation (3) | Interpretable ML Symposium; Interpretable ML by H2O | N.A. |
Optional reading | Visualization and ML (1) | Visual Exploration of Machine Learning Results using Data Cube Analysis; ACTIVIS: Visual Exploration of Industry-Scale Deep Neural Network Models | N.A. |
Optional reading | Visualization and ML (2) | Recent progress and trends in predictive visual analytics | N.A. |
10/25 | Feature Engineering (1) | Introduction to papers in this class; Deep Feature Synthesis: Towards Automating Data Science Endeavors; ExploreKit: Automatic Feature Generation and Selection | Xu Chu, Jayant Prakash, Andrea Hu |
10/30 | Feature Engineering (2) | Introduction to papers in this class; One button machine for automating feature engineering in relational databases; Materialization Optimizations for Feature Selection Workloads | Xu Chu, Wendi Du, Yue Zhang |
Optional reading | Feature Engineering (3) | Discover Feature Engineering, How to Engineer Features and How to Get Good at It; An Introduction to Variable and Feature Selection | N.A. |
Part 3: Systems and ML | |||
11/01 | Managing ML Pipeline (1) | Introduction to papers in this class; TFX: A TensorFlow-Based Production-Scale Machine Learning Platform; MODELDB: A System for Machine Learning Model Management | Xu Chu, Jennifer Blase, Visweswara Dintyala |
11/06 | Managing ML Pipeline (2) | Introduction to papers in this class; ProvDB: A System for Lifecycle Management of Collaborative Analysis Workflows; Automating Large-Scale Data Quality Verification | Xu Chu, Nidhi Menon, Pranshu Trivedi |
Optional reading | Managing ML Pipeline (3) | A Berkeley View of Systems Challenges for AI | N.A. |
11/08 | Training Set Debug (1) | Introduction to papers in this class; Training Set Debugging Using Trusted Items; Flipper: A Systematic Approach to Debugging Training Sets | Xu Chu, Yue Zhang, Sneha Venkatachalam |
11/13 | Training Set Debug (2) | Introduction to papers in this class; Understanding Black-box Predictions via Influence Functions; Examples are not Enough, Learn to Criticize! Criticism for Interpretability | Xu Chu, Nidhi Menon, Yue Hu |
11/15 | Reducing Training Set | Introduction to papers in this class; Efficient Construction of Approximate Ad-Hoc ML models Through Materialization and Reuse; BlinkML: Approximate Machine Learning with Probabilistic Guarantees | Xu Chu, Eric Qin, Zhanhao Liu |
11/20 | Reducing Training Set | Introduction to papers in this class; Active Learning for Convolutional Neural Networks: A Core-Set Approach; DCNNs on a Diet: Sampling Strategies for Reducing the Training Set Size | Xu Chu, Eric Qin, Zhuoran Yu |
Optional reading | Debug ML | Why is machine learning 'hard' | N.A. |
Course Project Presentations | |||
11/22 | N.A. | No Class (Thanksgiving) | N.A. |
11/27 | Final Project | 1) Eric Qin; 2) Sanya Chaba, Nilaksh Das; 3) Omar Sharifali, Saurabh Sawlani; 4) Xiang Cheng | N.A. |
11/29 | Final Project | 5) Alex Mueller, Jayant Prakash; 6) Andrea Hu, Matthew Britton; 7) Sneha Venkatachalam, Nidhi Menon; 8) Yue Zhang, Jennifer Blase, Peng Li; 9) Wendi Du, Yafei Zhang, Yue Hu | N.A. |
12/04 | Final Project | 10) Florina Dutt; 11) Zhanhao Liu, Yuhong Wang; 12) Pranshu Trivedi, Zhuoran Yu; 13) Thibaut Boissin, Visweswara Dintyala; 14) Xinran Shi | N.A. |
© Xu Chu 2018