My topics of interest span several fields including databases, data
mining, machine learning and statistics. My current research
interests are web information extraction, data integration, graphical
models and structured learning.
My publications give a good picture of my research interests.
Some specific problems and
projects on which I have worked are listed below.
World Wide Tables: The goal of this project is to answer table queries by tapping partially structured sources like tables and lists on the web.
Information Extraction and data integration: Recently, I have been interested in graphical models
and their use for various extraction and integration
problems. As part of this effort, I have developed a package for
Conditional Random Fields (CRF) that can be downloaded from
SourceForge.
ALIAS: This is a prototype of a compelling application of machine
learning techniques like Active Learning to ease the duplicate
elimination tasks that arise in data cleaning.
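To give a flavour of the idea, here is a minimal sketch of one common form of active learning for duplicate detection: a committee of classifiers votes on each candidate record pair, and the pair with the most evenly split vote is sent to the user for labelling. This is an illustration only, not the ALIAS code; the function name, the toy similarity scores and the threshold classifiers are all hypothetical.

```python
# Illustrative sketch only; not the ALIAS implementation.
# All names and the toy similarity scores here are hypothetical.

def most_uncertain_pair(pairs, committee):
    """Return the candidate pair on which the committee vote is most evenly split."""
    def disagreement(pair):
        votes = sum(1 for classify in committee if classify(pair))
        # 0 when the vote is split evenly, larger when the committee agrees
        return abs(votes - len(committee) / 2)
    return min(pairs, key=disagreement)


# Each candidate pair is (record_a, record_b, similarity score in [0, 1]).
pairs = [
    ("J. Smith", "John Smith", 0.62),
    ("J. Smith", "Mary Jones", 0.05),
    ("IBM Corp.", "IBM Corporation", 0.93),
]

# A toy committee: threshold classifiers that differ in how strict they are.
committee = [lambda p, t=t: p[2] >= t for t in (0.4, 0.5, 0.7, 0.8)]

# The pair the user would be asked to label next.
query = most_uncertain_pair(pairs, committee)
```

With these toy numbers the committee is unanimous on the clear non-duplicate and the clear duplicate, so the borderline pair ("J. Smith", "John Smith") is the one selected for labelling.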
DATAMOLD: This is a tool for Information Extraction (more precisely,
text segmentation) using learning based on Hidden Markov Models. This
software has been licensed by a data cleaning consulting company to
solve real-life address cleaning tasks.
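As a rough illustration of HMM-based text segmentation, the sketch below labels the tokens of an address with Viterbi decoding. It is not the DATAMOLD code: the three states, the probabilities and the tiny vocabulary are made-up assumptions, whereas a real system would learn them from labelled addresses.

```python
import math

# Illustrative sketch only; not the DATAMOLD implementation.
# States, probabilities, and the tiny vocabulary are made-up assumptions.
STATES = ["HOUSE_NO", "STREET", "CITY"]
START = {"HOUSE_NO": 0.8, "STREET": 0.1, "CITY": 0.1}
TRANS = {
    "HOUSE_NO": {"HOUSE_NO": 0.1, "STREET": 0.8, "CITY": 0.1},
    "STREET": {"HOUSE_NO": 0.05, "STREET": 0.55, "CITY": 0.4},
    "CITY": {"HOUSE_NO": 0.05, "STREET": 0.05, "CITY": 0.9},
}
EMIT = {
    "HOUSE_NO": {"221": 0.9},
    "STREET": {"baker": 0.5, "street": 0.4},
    "CITY": {"london": 0.9},
}
UNSEEN = 1e-6  # small probability for tokens a state has no entry for


def viterbi(tokens):
    """Return the most likely state sequence for a non-empty token list."""
    def emit(state, tok):
        return math.log(EMIT[state].get(tok, UNSEEN))

    # best[s] = (log-probability, path) of the best path ending in state s
    best = {s: (math.log(START[s]) + emit(s, tokens[0]), [s]) for s in STATES}
    for tok in tokens[1:]:
        step = {}
        for s in STATES:
            # Pick the best predecessor state p for a transition into s.
            score, path = max(
                (best[p][0] + math.log(TRANS[p][s]), best[p][1]) for p in STATES
            )
            step[s] = (score + emit(s, tok), path + [s])
        best = step
    return max(best.values())[1]


labels = viterbi("221 baker street london".split())
```

With these toy parameters, the four tokens are labelled HOUSE_NO, STREET, STREET, CITY, so the contiguous runs of equal labels give the segments of the address.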
ICube: This is a project on which I worked actively between 1999 and 2001.
It is about enhanced mining of multidimensional OLAP products. A web demo of ICube is available.
New data mining operations: I have worked on temporal data mining. I am
currently interested in various multi-class, multi-label and
multi-taxonomy learning problems.
Database mining integration: I have worked on two different aspects of
this problem: first, on algorithmic and architectural issues related to
expressing association rule mining algorithms in a relational engine;
second, on deploying learnt models within a relational engine so as to
allow close integration with SQL querying and optimization.
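To illustrate the first aspect, here is a minimal sketch of one step of association rule mining, counting frequent item pairs, expressed as a SQL query and run through Python's built-in sqlite3 module. The table name, column names, data and support threshold are hypothetical, and the query shows only one simple formulation, not the specific algorithms or architectures studied in this work.

```python
import sqlite3

# Illustrative sketch only; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (tid INTEGER, item TEXT)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [(1, "bread"), (1, "milk"), (2, "bread"), (2, "milk"),
     (2, "eggs"), (3, "bread"), (3, "eggs"), (4, "milk")],
)

MIN_SUPPORT = 2  # minimum number of transactions containing the pair

# A self-join pairs up items bought in the same transaction; GROUP BY
# counts the transactions for each pair and HAVING keeps the frequent ones.
# Assumes each (tid, item) row appears at most once.
frequent_pairs = conn.execute(
    """
    SELECT a.item, b.item, COUNT(*) AS support
    FROM sales AS a JOIN sales AS b
      ON a.tid = b.tid AND a.item < b.item
    GROUP BY a.item, b.item
    HAVING COUNT(*) >= ?
    ORDER BY a.item, b.item
    """,
    (MIN_SUPPORT,),
).fetchall()
conn.close()
```

On this toy data the query returns the pairs (bread, eggs) and (bread, milk), each with support 2. Keeping the counting inside the engine means the data never has to be exported, at the cost of relying on the engine's join and aggregation performance.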
Some past projects (pre-1996): In the
past I have worked on various problems related to multidimensional
OLAP indexing and aggregation computation. My PhD thesis was on query
optimization and scheduling for tertiary memory databases.
Ancient projects (pre-1991): I got my first glimpse of
research in computer science theory through search problems arising in
rectangle cutting and packing.