Exploiting Local Regularities in Text Segmentation using Conditional Random Fields

Utkarsh Jain


Many interesting real-life applications of machine learning rely on statistical models. These models typically learn regularities (patterns) from the training data and apply them to the test data. Such global regularities are assumed to hold across all data, with examples drawn independently from an identical distribution. Future data, however, can contain additional regularities that are limited to a subset of the data or that depend on its source.
In this project, we are designing a model based on Conditional Random Fields that can capture and extract local regularities from data that carries an additional grouping in the form of locales. With this model, we aim to obtain an improved solution to the problem of automatically segmenting text records into structured elements.

