| Review of record-oriented mining techniques:
Introduction,
Record mining operators
|
27 Jul |
|
Analyzing sequence data
Slides
|
3 Aug |
| Information Extraction | |
| HMMs for text segmentation | 10 Aug |
|
Automatic text segmentation for extracting structured records, Borkar, Deshmukh, and Sarawagi. Slides
|
| Maximum entropy taggers | 12 Aug |
| Exploiting Diverse Knowledge Sources via Maximum Entropy in Named Entity Recognition, Borthwick et al. | |
| A Maximum Entropy Model for Part-Of-Speech Tagging by Adwait Ratnaparkhi | |
|
Use of Support Vector Machines in Extended Named Entity Recognition, Takeuchi and Collier | 17 Aug |
| Instructor out of town | 19, 24, 26 Aug |
| Global discriminating models |
| Shallow parsing with conditional random fields, Sha and Pereira. |
31 Aug |
|
Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms, Collins.
|
3 Sept |
| Submission of project proposals | 6 Sept |
|
| Classifying linked objects | |
|
Enhanced Hypertext Categorization Using Hyperlinks, Chakrabarti, Dom, Indyk. SIGMOD 1998.
|
3 Sept |
|
Link-based Classification, Q. Lu and L. Getoor. ICML 2003.
|
7/8 Sept |
|
Discriminative Probabilistic Models for Relational Data, B. Taskar et al. |
| 7/8 Sept |
|
Graphical models, Jordan
|
10 Sept |
|
| Relational data mining | |
|
Learning Probabilistic Relational Models,
Lise Getoor, Nir Friedman, Daphne Koller and Avi Pfeffer
|
22 Sept |
|
Neville, J., M. Rattigan and D. Jensen (2003). Statistical relational learning: Four claims and a survey.
|
24 Sept |
| (optional readings)
Individuals, relations and structures in probabilistic models, James Cussens
|
| Final project presentations and reports due | 15 Oct |
Students are expected to read the papers in advance and submit an
independent one-page summary of the paper(s) before the class starts (30%).
There will be a single two-hour exam based on the papers (30%).
Groups of two or three students need to do a final project (40%).
Audit students need to accrue 20% of the marks through any of the above three
mechanisms. The preferred way is to write decent summaries of 6-7
papers.