Office Hours:
Usual schedule:
Tuesdays: 3:30--4:30 PM, Wednesdays: 3:00--4:00 PM, Fridays:
3:30--4:30 PM. At other times by appointment.
Education and Affiliations
Research Interests
My research interests span several fields, including databases,
data mining, machine learning and statistics. My
publications give a good picture of these interests. Some specific problems and
projects on which I have worked are listed below.
- World Wide Tables: The goal of this project is to answer table queries by tapping partially structured sources like tables and lists on the web.
- Information Extraction and data integration: Recently, I have been interested in graphical models
and their use for various extraction and integration
problems. As part of this effort, I have developed a package for
Conditional Random Fields (CRF) that can be downloaded from
SourceForge.
- ALIAS: This is a prototype of an interesting
and fairly compelling application of machine learning
techniques like Active Learning to ease the duplicate elimination tasks
that arise in data cleaning.
- DATAMOLD: A tool for Information Extraction (more like text
segmentation) using learning based on Hidden Markov Models. This
software has been licensed by a data cleaning consulting company to
solve real-life address cleaning tasks.
- ICube: This is a project on which I worked actively between 1999 and 2001.
It is about enhanced mining of multidimensional OLAP products. A web demo of ICube is available.
- New data mining operations: I have worked on temporal data mining. I am currently
interested in various multi-class, multi-label and multi-taxonomy
learning problems.
- Database mining integration: I have worked on two different aspects of
this problem: first, on algorithmic and architectural issues related to
expressing association rule mining algorithms in a relational engine;
second, on deploying learnt models within a relational engine so as to
allow close integration with SQL querying and optimization.
- Some past projects (pre-1996): In the
past I have worked on various problems related to multidimensional
OLAP indexing and aggregation computation. My PhD thesis was on query
optimization and scheduling for tertiary memory databases.
- Ancient projects (pre-1991): I got my first glimpse of
research in computer science theory through search problems arising in
rectangle cutting and packing.
Selected professional activities
- VLDB 2011 Research track Co-chair
- VLDB Endowment, member of the board (2008--)
- ACM SIGKDD 2008 PC Co-chair
- ACM SIGKDD, member of the Board of directors (2005-present)
- SIGKDD Explorations, Editor-in-chief (2003-2005), Associate Editor (1999-2002)
- ACM TODS, Editorial board (2004-2007)
- ACM Transactions on KDD, Editorial board (2005-present)
- Foundations and Trends® in Machine Learning, Editorial board (2007-present)
- IEEE Data Engineering Bulletin, Associate Editor (2000 to 2001)
- Program committee member
  - ACM SIGKDD 2006 Workshop chair
  - ACM SIGMOD 1998, 2002 (Demo committee), 2003, 2005, 2006
  - VLDB 2000, 2002, 2004, 2007
  - ACM SIGKDD 2001 (also on the Best paper award committee), 2003, 2004, 2005, 2009 (Best paper committee), 2010 (Best paper committee)
  - ICML 2003
  - IEEE ICDE 1998, 2001, 2002, 2003, 2005, 2006 (demo)
  - IEEE ICDM, Vice chair 2005
  - EDBT 2006, 2011
  - COMAD 2000, 2005, 2008, 2010
  - WWW 2006
  - CIDR 2009, 2010
- Senior PC, Vice chair, etc.
  - ICML 2008
  - Knowledge discovery and data mining track, ICDE 2000
  - ICDE 2008
  - SIGMOD 2009
- Others
  - ICDE 2010 Tutorial chair
  - WWW 2011 Tutorial chair
Teaching
- Advanced Machine learning, Spring 2010
- Foundations of Machine learning, Autumn 2009
- CS 627: Graphical models and structured learning, Spring 2008
- CS 636: Data mining, Fall 2007
- IT655: Advanced data mining: Probabilistic graphical models, Spring 2006, Spring 2007
- IT608: Data warehousing and data mining, Spring 2000-03, 2005, Fall 2005, Fall 2006
- IT655: Advanced data mining: Beyond record data mining: Prediction with richer structures (sequences, trees, and graphs), Fall 2004
- IT603: Database Management Systems, Fall 1999, 2001
- IT619: Graduate Software Lab, Autumn 2000
Patents
- 6,324,533: Integrated database and data-mining system
- 6,189,005: System and method for mining surprising temporal patterns
- 6,094,651: Discovery-driven exploration of OLAP data cubes
- 5,832,475: Database system and method employing data cube operator for group-by operations
Talks
- Structured learning. Tutorial at Machine Learning Winter School, Bangalore, Jan 2010. Slides: part 1, part 2 and part 3
- Queries over unstructured data: probabilistic methods to the rescue. Keynote talk at BIRTE 2009. Slides
- Structured prediction models in information extraction. Invited talk at the Data Mining Forum, Hong Kong, May 2008.
- The Role of Probabilistic Graphical Models in Databases. Tutorial at VLDB 2007 (with Amol Deshpande). Slides
- Scalable information extraction and data integration. Tutorial at KDD 2006 (with Eugene Agichtein). Slides
- Record linkage: Similarity measures and algorithms. Tutorial at SIGMOD 2006 (with Nick Koudas and Divesh Srivastava). Slides
- Graphical models for structure extraction and information integration. Keynote talk at ICDM 2005, Nov 2005. Slides
- Models and indices for integrating unstructured data with a relational database. Keynote talk at KDID workshop, ECML/PKDD, September 2004.
- Sequence data mining. Tutorial at KDD 2003 (with Mark Craven). Slides
- Automation in Information extraction and data integration. Tutorial at VLDB 2002. Slides
Current Students
Students graduated