Tuesdays and Fridays: 5:00--6:00 pm. Thursdays: 4:00--6:00 pm.
At other times by appointment.
Education and Affiliations
My topics of interest span several fields including databases, data
mining, machine learning and statistics. My current research
interests are web information extraction, data integration, graphical
models and structured learning.
My publications give a good idea of my research interests. Some specific
problems and projects on which I have worked are listed below.
- World Wide Tables: The goal of this project is to answer table queries by tapping partially structured sources like tables and lists on the web.
- Information Extraction and data integration: Recently, I have been interested in graphical models
and their use for various extraction and integration
problems. As part of this effort, I have developed a package for
Conditional Random Fields (CRF) that is available for download.
- ALIAS: This is a prototype of an interesting
and fairly compelling application of the use of machine learning
techniques like Active Learning to ease the duplicate elimination task
that arises in data cleaning.
- DATAMOLD: This is a tool for Information Extraction (more like text
segmentation) using learning based on Hidden Markov Models. This
software has been licensed by a data cleaning consulting company to
solve real-life address cleaning tasks.
- ICube: This is a project on which I worked actively between 1999 and 2001.
It is about enhanced mining of multidimensional OLAP products. A web demo of ICube is available.
- New data mining operations: I have worked on temporal data mining. I am currently
interested in various multi-class, multi-label and multi-taxonomy
- Database mining integration: I have worked on two different aspects of
this problem. First, on algorithmic and architectural issues related to
expressing association rule mining algorithms in a relational engine.
Second, on deploying learnt models within a relational engine so as to
allow close integration with SQL querying and optimization.
- Some past projects (pre-1996): In the
past I have worked on various problems related to multidimensional
OLAP indexing and aggregation computation. My PhD thesis was on query
optimization and scheduling for tertiary memory databases.
- Ancient projects (pre-1991): I got my first glimpse of
research in computer science theory through search problems arising in
rectangle cutting and packing.
Selected professional activities
- VLDB 2011 Research track Co-chair
- VLDB Endowment, board member (2008--)
- ACM SIGKDD 2008 PC Co-chair
- ACM SIGKDD, member of the Board of Directors (2005-present)
- SIGKDD Explorations, Editor-in-chief (2003-2005), Associate Editor (1999-2002)
- ACM TODS, Editorial board (2004-2007)
- ACM Transactions on KDD, Editorial board (2005-present)
- Foundations and Trends® in Machine Learning, Editorial board (2007-present)
- IEEE Data Engineering Bulletin, Associate Editor (2000-2001)
Program committee member
- ACM SIGKDD 2006 Workshop chair
- ACM SIGMOD 1998, 2002 (Demo committee), 2003, 2005, 2006
- VLDB 2000, 2002, 2004, 2007
- ACM SIGKDD 2001 (also in Best paper award committee), 2003, 2004, 2005, 2009 (Best paper committee), 2010 (Best paper committee)
- ICML: 2003, 2011, 2013
- IEEE ICDE 1998, 2001, 2002, 2003, 2005, 2006 (demo)
- IEEE ICDM, Vice chair 2005
- EDBT 2006, 2011
- COMAD 2000, 2005, 2008, 2010
- WWW 2006, 2013
- CIDR 2009, 2010
- WSDM 2013
- Senior PC, Vice chair etc
- ICML 2008
- KDD 2011, KDD 2012, KDD 2013
- NIPS 2011, 2012
Knowledge discovery and data mining track, ICDE
- ICDE 2008
- SIGMOD 2009
- ICDE 2010 Tutorial chair
- WWW 2011 Tutorial chair
- Introduction to Machine Learning, Autumn 2011
- Advanced Machine learning, Spring 2010, Spring 2011, Spring 2012
- Foundations of Machine learning, Autumn 2009, Autumn 2012
- CS 627: Graphical models and structured learning
- CS 636: Data mining
- IT655: Advanced data mining: Probabilistic graphical models, Spring 2006, Spring 2007
- IT608: Data warehousing and data mining, Spring 2000-03, 2005, Fall 2005, Fall 2006
- IT655: Advanced data mining: Beyond record data mining: Prediction with richer structures (sequences, trees, and graphs), Fall 2004
- Base Management Systems, Fall 1999, 2001
- IT619: Graduate Software Lab, Autumn 2000
- 6,324,533: Integrated database and data-mining system
- 6,189,005: System and method for mining surprising temporal patterns
- 6,094,651: Discovery-driven exploration of OLAP data cubes
- 5,832,475: Database system and method employing data cube operator for group-by operations
- Statistical Machine Learning for Complex Predictions in Large-scale Scenarios, Invited speaker at the International Colloquium on Perspectives in Fundamental Research, Homi Bhabha Birth Centenary Event. March 2010.
- Structured learning. Tutorial at Machine Learning Winter School, Bangalore, Jan 2010. Slides: part 1, part 2 and part 3
- Queries over unstructured data: probabilistic methods to the rescue. Keynote talk at BIRTE 2009. Slides
- Structured prediction models in information extraction. Invited talk at the Data Mining Forum, Hong Kong, May 2008.
- The Role of Probabilistic Graphical Models in Databases. Tutorial at VLDB 2007 (with Amol Deshpande). Slides
- Scalable information extraction and data integration. Tutorial at KDD 2006 (with Eugene Agichtein). Slides
- Record linkage: Similarity measures and algorithms. Tutorial at SIGMOD 2006 (with Nick Koudas and Divesh Srivastava). Slides
- Graphical models for structure extraction and information integration. Keynote talk at ICDM 2005, Nov 2005. Slides
- Models and indices for integrating unstructured data with a relational database. Keynote talk at KDID workshop, ECML/PKDD, September 2004.
- Sequence data mining. Tutorial at KDD 2003 (with Mark Craven). Slides
- Automation in information extraction and data integration. Tutorial at VLDB 2002. Slides
- Arun Iyer, PhD
- Gaurish Chaudhuri, MTech 2013
- Vashist Avadhanula, MTech 2013
- Abhirut Gupta, MTech 2014
- Vivek Sembium, MTech 2014