Learning from Data (Spring 2024)

Welcome to the class website of Learning from Data!

News

2024-6-11: Please note that the poster PDF file is due on June 19, and the poster session will be held on June 21!
2024-5-31: Please note that the Final Project Checkpoint runs from 11th June to 14th June (make an appointment with the teaching staff to discuss your progress).
2024-4-8: The midterm of this semester will be on 19th April, and a review session will be held by TAs later. Please kindly arrange your time and make preparations accordingly.
2024-4-7: The deadline for PA2 is extended to 10th April; WA2 will be due on 13th April.
2024-2-25: The first class will be held on Friday, March 1st, from 9:50am to 12pm.

Class info

Time: Friday 9:50am-12:15pm
Location: International Phase 1 (深圳国际一期) A307

This introductory course gives an overview of many concepts, techniques, and algorithms in machine learning, from linear models such as logistic regression and SVM to more advanced topics such as deep neural networks and reinforcement learning. The course will give the student the basic ideas and intuition behind modern machine learning methods as well as a formal understanding of how, why, and when they work. The underlying theme in the course is statistical inference as it provides the foundation for most of the methods covered.

For more information about grading, homework, and exam policies, see the class syllabus.

Prerequisites: Undergraduate level calculus, probability and linear algebra. Basic Python programming.

Team

Yang Li
Instructor

Yanru Wu
Head TA

Boshi Tang
TA

Jiahao Lai
TA

Office hours

Name	Time	Location
Yang	Friday 2:00-4:00pm	Info Building 1108a
Yanru	Tuesday 4:00pm-6:00pm	Info Building, 11th floor common area
Boshi	Wednesday 2:00pm-4:00pm	Info Building 1701
Jiahao	Thursday 4:00pm-6:00pm	Info Building, 11th floor common area

You can also make appointments outside office hours.

Recitation & Review Sessions

Recitations will be held every Friday 9:00-9:45am in the lecture room.

Date	Topic	Reference
N/A	2022 Review Session: Basic linear algebra and probability; scientific programming in Python. (programming demo) \| (math review) \| (math review notes) \| (recording)	The Matrix Cookbook by KB Petersen
3/8	WA0 Discussion (WA0 solution)	(Github Classroom Tutorial)
3/15	Probability review
3/22	GLM review and geometric interpretation of linear regression (recording)
3/29	Weighted linear regression and KKT conditions (recording)
4/7	WA1 homework discussion (recording)
4/14	Midterm review (recording)	review session notes
5/5	Programming Skills Discussion (recording)	skill.pdf notes
5/24	SVD Discussion (recording)

Class Schedule

The main reference reading material is the CS229 Machine Learning Lecture Notes (MLLN) by Andrew Ng and Tengyu Ma

Date	Topic	Readings & References	Homework Release
3/1	Introduction (slides with notes) \| (recording)		Written Assignment 0 (don't need to submit)
3/8	Supervised Learning I: Linear regression, logistic regression (slides with notes) \| (recording part1) \| (recording part2) \| (recording part3)	"Supervised learning" 1.1-1.3,2.1-2.3 (MLLN) Convex functions by Boyd & Vandenberghe (see Chapter 3)	Programming Assignment 1 due Mar 22
3/15	Supervised Learning II: Multi-class classification, generalized linear models (slides with notes) \| (recording)	"Generalized linear models" (MLLN) Generalized Linear Models by Nelder & Wedderburn (1972)	Written Assignment 1 (due Mar 29)
3/22	Supervised Learning III: Gaussian discriminant analysis (GDA), naive Bayes (slides with notes) \| (recording)	"Generative learning algorithms" (MLLN) Reference: Event models for NB text classification by McCallum & Nigam (1998)	Programming Assignment 2 due April 9th
3/29	Supervised Learning IV: Support Vector Machines (SVM) (slides with notes) \| (recording)	"Support Vector Machines" (MLLN) References: Support Vector Networks by Cortes and Vapnik; SVM notes from “Selected Applications of Convex Optimization” by Li Li; Convex optimization by Stephen Boyd (See Chapter 5)
4/7	Kernel SVM (slides with notes part 1) \| (slides with notes part 2) \| (recording) Supervised Learning V: Neural Networks, Backpropagation	“Deep Feedforward Networks” from Deep Learning by Ian Goodfellow	WA1 solution
4/12	Backpropagation and practical tips on neural networks; Model selection (slides with notes) \| (recording)	“Regularization and model selection” (MLLN); “Regularization for Deep Learning” from Deep Learning; Notes on matrix derivatives by Learned-Miller	Written Assignment 2
4/19	Midterm Exam		WA 2 solution
4/26	Basic Learning Theory (slides with notes) (recording)	"Generalization" (MLLN)	Final project information; Recommended LaTeX template
5/10	Unsupervised Learning I: (slides with notes) (recording) K-means clustering, spectral clustering	“Clustering and the k-means algorithm” (MLLN) A tutorial on spectral clustering by Ulrike von Luxburg
5/17	Unsupervised Learning II: (slides with notes) (recording) spectral clustering (continued), PCA	"Principal component analysis" (MLLN)
5/24	Unsupervised Learning III: ICA and CCA (slides with notes) (recording)	“Independent Component Analysis” (MLLN) “Canonical Correlation Analysis” by Hardle & Simar	Written Assignment 4 (due June 7)
5/31	Reinforcement Learning (slides with notes) (recording)	"Reinforcement Learning" (MLLN) Playing Atari with Deep Reinforcement Learning by Mnih et al. Extended reference text: Reinforcement Learning: An Introduction, 2nd ed., by Sutton and Barto
6/7	LLM Alignment
6/14	Transfer learning (recording) (handout 1) (handout 2)
6/21	Final poster presentation