Introduction to Information Management and Mining

Raghu Krishnapuram

IBM India Research Lab
Block I, Indian Institute of Technology
Hauz Khas, New Delhi 110016, INDIA


The World Wide Web and various intranets can be viewed as large unstructured or semi-structured databases. Extracting and collating specific types of information from such sources has become a formidable task because of their size and their lack of sufficient structure. Beyond the Web and intranets, there are other sources of information such as e-mail, databases, and files. Information management deals with extracting useful information about entities and the relationships between them from such sources, and with organizing that information in a form that can be browsed, queried, or searched. Solutions to problems in this area require bringing together ideas from diverse disciplines such as machine learning, information retrieval and search, natural language processing, artificial intelligence, and text analysis. This tutorial will cover the basics of information extraction, information retrieval, and text mining, and introduce algorithms for automatic taxonomy generation, knowledge organization (including document categorization and clustering), and search result organization.

