Date Summary

22/07/09

Overview of learning tasks

  • Basic course information and administrative details
  • Supervised and unsupervised learning
  • Learning task, instances, features, labels, reward/loss, training, testing
  • Overfitting and generalization: polynomial curve-fitting, segmenting coin toss sequences
  • Fold-based and leave-one-out cross validation
  • Classification with small discrete label sets
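
As an illustration of the fold-based and leave-one-out cross validation listed above, here is a minimal Python sketch. The function name `k_fold_splits` and the contiguous, unshuffled fold assignment are assumptions made for illustration; they are not taken from the course material.

```python
def k_fold_splits(n, k):
    """Yield (train_indices, test_indices) for k contiguous folds over range(n)."""
    if not 1 < k <= n:
        raise ValueError("need 1 < k <= n")
    base, extra = divmod(n, k)  # the first `extra` folds get one more instance
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


# Leave-one-out cross validation is the special case k == n.
leave_one_out = list(k_fold_splits(5, 5))
```

In practice the instances would usually be shuffled before splitting; that step is left out here to keep the sketch short.
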
24/07/09

Probability distributions and statistical estimation (Review)

  • Review of gamma and beta functions; Bernoulli, binomial, Poisson, Gaussian distributions
  • Review of multinomial distribution and Dirichlet distribution
  • Review of immediately needed multivariate calculus
  • Statistical parameter estimation, MLE, m-estimates
  • Bayesian learning, priors and posteriors, MAP and other quantities of interest
  • Smoothing and conjugate priors; exponential family
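
To make the MLE, m-estimate, and conjugate-prior items above concrete, the following Python sketch works through the Bernoulli case with a Beta prior. All function names are hypothetical, and the choice of the Bernoulli/Beta pair is an assumption made for illustration.

```python
def bernoulli_mle(heads, n):
    """Maximum likelihood estimate of the success probability."""
    return heads / n


def m_estimate(heads, n, p, m):
    """m-estimate: smooth the MLE with m virtual samples at prior rate p."""
    return (heads + m * p) / (n + m)


def beta_posterior(heads, n, a, b):
    """A Beta(a, b) prior with Bernoulli data gives a Beta posterior (conjugacy)."""
    return a + heads, b + (n - heads)


def beta_mean(a, b):
    """Posterior mean of Beta(a, b)."""
    return a / (a + b)


def beta_mode(a, b):
    """Mode of Beta(a, b), i.e. the MAP estimate; defined for a > 1 and b > 1."""
    return (a - 1) / (a + b - 2)


a_post, b_post = beta_posterior(heads=3, n=10, a=2, b=2)
map_estimate = beta_mode(a_post, b_post)
```

Under these assumptions the posterior mean coincides with the m-estimate for m = a + b and p = a / (a + b), which is one way to read smoothing as a prior.
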
24/07/09

Linear Regression with L2/L1 penalty (Chapter 3 of Bis07)

  • Linear least-squares fitting from a Bayesian perspective with Gaussian prior on model parameters
  • How the Bayesian approach leads to Ridge penalty
  • Solving least square and Ridge regression, the "hat matrix"
  • Alternative way to write Ridge regression as finding parameters within disk
  • Behavior of model weights as disk radius is changed (demo)
  • "Evidence-sharing" and geometric reasons for non-sparse solutions
  • Lasso regression: replacing L2 model penalty with L1 model penalty
  • Casting Lasso regression as a quadratic program
  • Behavior of model weights as L1 limit is changed (demo)
  • Implicit feature selection effect (sparser models) in Lasso
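
The contrast between Ridge shrinkage and Lasso sparsity listed above can be seen even with a single feature. The sketch below is an assumption-laden simplification (one feature, no intercept, hypothetical function names), not the demo used in the course. It uses the closed-form minimizers of sum((y - w*x)^2) + lam*w^2 and sum((y - w*x)^2) + lam*|w|.

```python
def ridge_weight_1d(xs, ys, lam):
    """Minimize sum((y - w*x)^2) + lam * w^2 for one feature, no intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def lasso_weight_1d(xs, ys, lam):
    """Minimize sum((y - w*x)^2) + lam * |w| for one feature, no intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # Soft thresholding: the weight is exactly zero once lam / 2 >= |sxy|.
    shrunk = max(abs(sxy) - lam / 2.0, 0.0)
    return (shrunk if sxy >= 0 else -shrunk) / sxx


xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
ridge_path = [ridge_weight_1d(xs, ys, lam) for lam in (0.0, 14.0, 100.0)]
lasso_path = [lasso_weight_1d(xs, ys, lam) for lam in (0.0, 28.0, 100.0)]
```

With a large penalty the Ridge weight shrinks toward zero but stays nonzero, while the Lasso weight becomes exactly zero.
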

 


29/07/09

Classification overview 

  • Overview of classification methods: linear discriminative, probabilistic (conditional and generative), nearest neighbor.
  • Criteria to evaluate classifiers: accuracy, recall, precision, F1, AUC
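
The evaluation criteria above can be sketched for the binary case as follows. The function name and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Return (precision, recall, F1) for the class labelled `positive`."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    # Assumed convention: an undefined ratio is reported as 0.0.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def accuracy(y_true, y_pred):
    """Fraction of instances whose predicted label matches the true label."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)
```
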

29/07/09
07/08/09

Loss-regularization framework for classification (Chapter 4 of Bis07)

  • Loss functions: 0/1 ("true"), square, perceptron, logistic, hinge
  • Perceptron algorithm with proof.
  • Review of convex functions and their optimization (Chapters 2.1, 2.2, 3.1, 4.2 of BV)
  • Gradient-based training algorithms
  • Scilab demo of optimizing for logistic loss (code)
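
The loss functions and the perceptron algorithm listed above can be sketched in Python as below. This is not the course's Scilab demo. Writing each loss as a function of the margin m = y * f(x) with y in {-1, +1}, counting m = 0 as an error, and all function names are assumptions made for illustration.

```python
import math


def zero_one_loss(m):
    return 1.0 if m <= 0 else 0.0


def square_loss(m):
    # (y - f(x))^2 equals (1 - m)^2 when y is +1 or -1.
    return (1.0 - m) ** 2


def perceptron_loss(m):
    return max(0.0, -m)


def logistic_loss(m):
    return math.log1p(math.exp(-m))


def hinge_loss(m):
    return max(0.0, 1.0 - m)


def perceptron_train(points, labels, max_epochs=100):
    """Classic mistake-driven updates; stops after an epoch with no mistakes."""
    w = [0.0] * len(points[0])
    b = 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for x, y in zip(points, labels):
            score = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * score <= 0:  # misclassified or on the boundary
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                mistakes += 1
        if mistakes == 0:
            break
    return w, b


points = [(2.0, 2.0), (1.0, 3.0), (-1.0, -2.0), (-2.0, -1.0)]
labels = [1, 1, -1, -1]
w, b = perceptron_train(points, labels)
```

On linearly separable data such as this toy set the loop terminates with every training margin positive; on inseparable data it simply stops after `max_epochs`.
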

12/08/09 to
21/08/09

Probabilistic classifiers

  • Classification based on class-conditional density
  • Multivariate Gaussian (normal) distribution review, covariance
  • Discriminants between Gaussian class-conditional densities
  • (Fisher's) linear discriminant in the special case of equal covariance for all classes
  • The special case of spherical Gaussian densities
  • Discriminants are quadratic surfaces in case of diverse covariances
  • Play with scilab code, view discriminants and loss functions

26/08/09
Support vector machines (Chapter 7 of Bis07)
  • Max margin motivation: low density, high stability
  • Margin geometry to primal SVM formulation for separable training data (demo)
  • Dual formulation and role of alpha in a form of sparse local regression
  • Inseparable data, slack variables, hinge loss, upper bound on 0/1 training loss (demo)
  • Dual needs only xi dot xj
  • Replacing xi dot xj with a phi(xi) dot phi(xj) = k(xi,xj)

28/08/09
  • Handling non-linear regression by lifting data points to higher dimension (demo)
  • Polynomial, Gaussian, RBF kernels (demo)
  • We don't even need to know phi; phi can be infinite dimensional
  • Gram matrix K = (Kij) needs to be positive semidefinite
  • Rules for creating valid kernels by combining known kernels
  • Support vector regression (Chapter 7 of Bis07)
  • General feature functions phi(x,y) (note, number of y's is still small, no StructSVM yet)
  • Multiclass SVMs
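
The kernel items above (replacing xi dot xj with phi(xi) dot phi(xj) = k(xi,xj), and the Gram matrix) can be checked numerically on a small case. The sketch below assumes 2-dimensional inputs and the homogeneous degree-2 polynomial kernel, for which an explicit phi is known; all names are hypothetical.

```python
import math


def dot(x, z):
    return sum(a * b for a, b in zip(x, z))


def poly2_kernel(x, z):
    """Homogeneous polynomial kernel of degree 2: k(x, z) = (x . z)^2."""
    return dot(x, z) ** 2


def poly2_features(x):
    """Explicit feature map phi for poly2_kernel on 2-dimensional inputs."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2)


def gaussian_kernel(x, z, sigma=1.0):
    """k(x, z) = exp(-||x - z||^2 / (2 sigma^2)); its phi is infinite dimensional."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def gram_matrix(points, kernel):
    """K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(p, q) for q in points] for p in points]


x, z = (1.0, 2.0), (3.0, -1.0)
via_kernel = poly2_kernel(x, z)
via_features = dot(poly2_features(x), poly2_features(z))
K = gram_matrix([x, z, (0.0, 0.0)], gaussian_kernel)
```

The two values `via_kernel` and `via_features` agree up to rounding, which is the point of the kernel trick: the dual only ever needs k(xi,xj), never phi itself.
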

02/09/09, 04/09/09

04/09/2009- 09/09/2009

Decision tree classification

  • Purity, Gini index, entropy
  • Scalable decision tree implementation: SPRINT
  • Decision stumps
  • Regression tree
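
The purity measures above can be sketched as follows. The function names are hypothetical, and `split_impurity` uses the usual size-weighted average of the child impurities, which is assumed here rather than taken from the course notes.

```python
import math
from collections import Counter


def gini(labels):
    """Gini index: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Entropy in bits: -sum p * log2(p) over the classes present."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def split_impurity(left, right, measure):
    """Size-weighted impurity of a binary split, as used to score a candidate test."""
    n = len(left) + len(right)
    return (len(left) / n) * measure(left) + (len(right) / n) * measure(right)


mixed = ["a", "a", "b", "b"]
before = gini(mixed)
after = split_impurity(["a", "a"], ["b", "b"], gini)
```

A split is attractive when `after` is well below `before`; here the split separates the classes perfectly.
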

11/09/2009, 23/09/2009
Ensemble and committee learners

Midsemester exam


25/09/2009, 30/09/2009

Clustering


07/10/09-09/10/09
Principal component analysis (PCA) (Chapter 12 of Bis07)
  • Basic PCA
  • Eigenvalue and eigenvector recap, demo
  • Probabilistic PCA
  • EM algorithm for PCA
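
For the basic PCA and eigenvector items above, the following sketch finds the first principal component as the top eigenvector of the sample covariance matrix. It uses plain power iteration, which is a swapped-in technique chosen to keep the example dependency-free; it is not the EM algorithm for PCA and not necessarily the method used in the course demo. All names are hypothetical.

```python
import math


def covariance_matrix(rows):
    """Sample covariance (divide by n - 1) of a list of equal-length rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_principal_component(rows, iterations=200):
    """Return (unit direction, variance along it) via power iteration."""
    cov = covariance_matrix(rows)
    d = len(cov)
    # Arbitrary start vector; the method fails if it is orthogonal
    # to the top eigenvector, which this sketch does not guard against.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    variance = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return v, variance


diagonal_data = [(-2.0, -2.0), (-1.0, -1.0), (1.0, 1.0), (2.0, 2.0)]
direction, variance = first_principal_component(diagonal_data)
```
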

14/10/2009
21/10/2009
23/10/2009
Non-linear Dimensionality Reduction

14/10/2009
  • Association rules and the apriori algorithm
  • FP tree algorithm in Han and Kamber book
  • Frequent temporal patterns, episodes
  • Surprising patterns
  • Spatial scan statistics
  • Rule learning overview; FOIL
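
The level-wise search of the apriori algorithm mentioned above can be sketched as below. This is a minimal, unoptimized version under stated assumptions: support is an absolute transaction count, candidates are counted by scanning every transaction, and the function name is hypothetical. It does not cover rule generation or the FP tree algorithm.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frequent itemset: support count} using level-wise candidate generation."""
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-itemsets.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent


baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
frequent_sets = apriori(baskets, min_support=3)
```
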

24/10/2009
28/10/2009
Learning theory
Readings: Chapter 7 of Tom Mitchell's book on machine learning; some useful online material: Andrew Moore's slides, John Shawe-Taylor's tutorial
  • Basics of PAC learning
  • Shatterings and VC dimensions
  • Model selection via VC-dimensions
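
As a small worked example for the PAC basics above, the sketch below evaluates the standard sample-complexity bound for a consistent learner over a finite hypothesis space, m >= (1/epsilon) * (ln|H| + ln(1/delta)), as covered in the cited Mitchell chapter. The function name and the example numbers are assumptions made for illustration; the VC-dimension-based bounds are not shown.

```python
import math


def pac_sample_bound(hypothesis_count, epsilon, delta):
    """Number of examples sufficient for error <= epsilon with probability >= 1 - delta."""
    if not (0 < epsilon < 1 and 0 < delta < 1 and hypothesis_count >= 1):
        raise ValueError("need 0 < epsilon, delta < 1 and hypothesis_count >= 1")
    bound = (math.log(hypothesis_count) + math.log(1.0 / delta)) / epsilon
    return math.ceil(bound)


# Hypothetical numbers: 1000 hypotheses, 10% error, 95% confidence.
needed = pac_sample_bound(1000, epsilon=0.1, delta=0.05)
```

The bound grows only logarithmically in the size of the hypothesis space and in 1/delta, but linearly in 1/epsilon.
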

28/10/2009
04/11/2009
Active learning

06/11/2009
11/11/2009
Spatial scan statistics

11/11/2009
14/11/2009
Overview of graphical models