| Date
|
Summary |
22/07/09
|
Overview of learning tasks
-
Basic course information and administrative details
-
Supervised and unsupervised learning
-
Learning task, instances, features, labels, reward/loss, training, testing
-
Overfitting and generalization: polynomial curve-fitting, segmenting coin toss sequences
-
Fold-based and leave-one-out cross validation
-
Classification with small discrete label sets
|
24/07/09
|
Probability distributions and statistical estimation (Review)
-
Review of gamma and beta functions; Bernoulli, binomial, Poisson, Gaussian distributions
-
Review of multinomial distribution and Dirichlet distribution
-
Review of immediately needed multivariate calculus
-
Statistical parameter estimation, MLE, m-estimates
-
Bayesian learning, priors and posteriors, MAP and other quantities of interest
-
Smoothing and conjugate priors; exponential family
|
|
24/07/09
|
Linear Regression with L2/L1 penalty (Chapter 3 of Bis07)
-
Linear least-squares fitting from a Bayesian perspective with Gaussian prior on model parameters
-
How the Bayesian approach leads to Ridge penalty
-
Solving least-squares and Ridge regression, the "hat matrix"
-
Alternative way to write Ridge regression as finding parameters within disk
-
Behavior of model weights as disk radius is changed (demo)
-
"Evidence-sharing" and geometric reasons for non-sparse solutions
-
Lasso regression: replacing L2 model penalty with L1 model penalty
-
Casting Lasso regression as a quadratic program
-
Behavior of model weights as L1 limit is changed (demo)
-
Implicit feature selection effect (sparser models) in Lasso
|
29/07/09
|
Classification overview
- Overview of classification methods: linear discriminative, probabilistic (conditional and generative), nearest neighbor.
-
Criteria to evaluate classifiers: accuracy, recall, precision, F1, AUC
|
29/07/09
07/08/09
|
Loss-regularization framework for classification (Chapter 4 of Bis07)
-
Loss functions: 0/1 ("true"), square, perceptron, logistic, hinge
- Perceptron algorithm with proof.
- Review of convex functions and their optimization (Chapters 2.1, 2.2, 3.1, 4.2 of BV)
- Gradient-based training algorithms
- Scilab demo of optimizing for logistic loss (code)
|
12/08/09 to 21/08/09
|
Probabilistic classifiers
-
Classification based on class-conditional density
-
Multivariate Gaussian (normal) distribution review, covariance
-
Discriminants between Gaussian class-conditional densities
-
(Fisher's) linear discriminant in the special case of equal covariance for all classes
-
The special case of spherical Gaussian densities
-
Discriminants are quadratic surfaces in case of diverse covariances
-
Play with Scilab code; view discriminants and loss functions
|
26/08/09
|
Support vector machines (Chapter 7 of Bis07)
- Max margin motivation: low density, high stability
- Margin geometry to primal SVM formulation for separable training data (demo)
- Dual formulation and role of alpha in a form of sparse local regression
- Inseparable data, slack variables, hinge loss, upper bound on 0/1 training loss (demo)
- Dual needs only xi dot xj
- Replacing xi dot xj with phi(xi) dot phi(xj) = k(xi,xj)
|
28/08/09
|
-
Handling non-linear regression by lifting data points to higher dimension (demo)
- Polynomial, Gaussian, RBF kernels (demo)
- We don't even need to know phi; phi can be infinite dimensional
- Gram matrix K = (Kij) needs to be positive semidefinite
- Rules for creating valid kernels by combining known kernels
- Support vector regression (Chapter 7 of Bis07)
- General feature functions phi(x,y) (note, number of y's is still small, no StructSVM yet)
- Multiclass SVMs
|
02/09/09, 04/09/09
|
|
04/09/2009-
09/09/2009
|
Decision tree classification
|
11/09/2009, 23/09/2009
|
Ensemble and committee learners
|
|
Midsemester exam
|
25/09/2009, 30/09/2009
|
Clustering
|
07/10/09-09/10/09
|
Principal component analysis (PCA) (Chapter 12 from Bis07)
- Basic PCA
-
Eigenvalue and eigenvector recap, demo
-
Probabilistic PCA
-
EM algorithm for PCA
|
14/10/2009 21/10/2009 23/10/2009
|
Non-linear Dimensionality Reduction
|
14/10/2009
|
-
Association rules and the apriori algorithm
-
FP tree algorithm in Han and Kamber book
-
Frequent temporal patterns, episodes
-
Surprising patterns
-
Spatial scan statistics
-
Rule learning overview; FOIL
|
24/10/2009
28/10/2009
|
Learning theory
Readings: Chapter 7 of Tom Mitchell's book on machine learning. Some useful online material: Andrew Moore's slides, John Shawe-Taylor's tutorial.
- Basics of PAC learning
- Shatterings and VC dimensions
- Model selection via VC-dimensions
|
28/10/2009 04/11/2009
|
Active learning
|
06/11/2009 11/11/2009
|
Spatial scan statistics
|
11/11/2009 14/11/2009
|
Overview of graphical models
|