Homework for IT642, Spring 2002
Homework 1: (due Jan 31)
Run the following five classifiers on five datasets of your choice.
Report a table of the best accuracy for each classification method and
dataset. For each dataset, draw a graph showing the effect of the
following parameter on accuracy for each classification method:
- Decision trees (C4.5): Effect of pruning
- Naive Bayes: Effect of feature selection
- K-nearest neighbor: Effect of K
- Support Vector Machines: Effect of different kernels
- Neural networks: Effect of the number of nodes
The datasets may be chosen from the UCI machine learning repository
(the TAs will show you how to acquire these). One of the five datasets
should be from the KDD repository instead of the machine learning
repository.
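As an illustration of the kind of parameter sweep asked for above, here
is a minimal sketch for the K-nearest neighbor case only. It is not the
required tooling: the synthetic two-class data, the 5-fold
cross-validation, the list of K values, and all function names
(knn_predict, cv_accuracy, make_toy_data, sweep_k) are assumptions made
for the example. The (K, accuracy) pairs it returns are the points one
would plot for a single dataset.

```python
import random
from collections import Counter


def knn_predict(train, query, k):
    """Majority vote among the k training points closest to query."""
    # train is a list of (features, label) pairs; squared Euclidean distance.
    nearest = sorted(
        train,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cv_accuracy(data, k, folds=5):
    """Accuracy of k-NN under a simple interleaved k-fold cross-validation."""
    correct = 0
    for f in range(folds):
        test = data[f::folds]
        train = [row for i, row in enumerate(data) if i % folds != f]
        correct += sum(knn_predict(train, x, k) == y for x, y in test)
    return correct / len(data)


def make_toy_data(n=200, seed=0):
    """Two overlapping Gaussian classes in 2-D (synthetic, illustration only)."""
    rng = random.Random(seed)
    data = []
    for i in range(n):
        label = i % 2
        centre = 0.0 if label == 0 else 2.0
        data.append(((rng.gauss(centre, 1.0), rng.gauss(centre, 1.0)), label))
    rng.shuffle(data)
    return data


def sweep_k(data, ks=(1, 3, 5, 9, 15, 25)):
    """Return [(k, accuracy), ...]: the points to plot for this dataset."""
    return [(k, cv_accuracy(data, k)) for k in ks]


if __name__ == "__main__":
    for k, acc in sweep_k(make_toy_data()):
        print(f"K={k:2d}  accuracy={acc:.3f}")
```

The same loop shape (vary one parameter, record accuracy, plot) would
apply to the other four methods, with a real dataset in place of
make_toy_data.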
Homework 2: (due Feb 27)
Implement an efficient and scalable algorithm for constructing
decision tree classifiers.
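
One building block that such an implementation might use is sketched
below, as an assumption rather than the expected solution: choosing the
best threshold on a single numeric attribute with one sort followed by
one linear scan, updating class counts incrementally instead of
recounting for every candidate split. The Gini impurity criterion and
the names gini and best_numeric_split are choices made for this
example; the assignment does not prescribe them.

```python
from collections import Counter


def gini(counts, total):
    """Gini impurity of a class-count table with the given total."""
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


def best_numeric_split(values, labels):
    """Find the threshold on one numeric attribute with the lowest weighted Gini.

    Returns (threshold, weighted_gini), or (None, gini_of_node) when no
    split improves on the unsplit node. Records with value <= threshold
    go to the left child.
    """
    pairs = sorted(zip(values, labels), key=lambda p: p[0])  # one sort
    n = len(pairs)
    left = Counter()                               # class counts on the left
    right = Counter(label for _, label in pairs)   # class counts on the right
    best_threshold, best_score = None, gini(right, n)
    for i in range(n - 1):                         # one linear scan
        value, label = pairs[i]
        left[label] += 1                           # move record i to the left
        right[label] -= 1
        next_value = pairs[i + 1][0]
        if value == next_value:
            continue                               # no threshold between equal values
        n_left = i + 1
        score = (n_left * gini(left, n_left)
                 + (n - n_left) * gini(right, n - n_left)) / n
        if score < best_score:
            best_score = score
            best_threshold = (value + next_value) / 2
    return best_threshold, best_score
```

For example, best_numeric_split([1, 2, 3, 10, 11, 12], list("aaabbb"))
separates the two classes at the midpoint between 3 and 10. A full tree
builder would apply this to every attribute at each node and recurse on
the two partitions; handling data that does not fit in memory is a
separate concern not covered by this sketch.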