The linguistic diversity of the Indian subcontinent is comparable to that of Europe. Geographically, the subcontinent comprises six countries: Pakistan, Bangladesh, Nepal, Sri Lanka, Bhutan and India. Its total population is about 1,300 million, and about 25 official languages are in use. Two of the region's major languages, Hindi and Bengali, rank among the ten most widely spoken languages in the world. Over the past few years (2000--2007), a large volume of Indian language (IL) electronic documents has come into existence, growing at a rate of 700%. The need for Information Retrieval (IR) systems that can handle this growing repository is unquestionable.
The importance of reusable, large-scale standard test collections in Information Access research has been widely recognized. The success of TREC, CLEF, and NTCIR has clearly established the importance of an evaluation workshop that facilitates research by providing the data and a common forum for comparing models and techniques.
The Forum for Information Retrieval Evaluation (FIRE) has the following aims:
- to encourage research in Indian language Information Access technologies by providing reusable large-scale test collections for Indian language IR (ILIR) experiments;
- to provide a common evaluation infrastructure for comparing the performance of different IR systems;
- to investigate evaluation methods for Information Access techniques and methods for constructing a reusable large-scale data set for ILIR experiments.
The following tasks are offered:
- Ad-hoc monolingual document retrieval in Indian languages, viz. Hindi, Bangla, Marathi, Tamil, Telugu, Punjabi and Malayalam.
- Ad-hoc cross-lingual document retrieval:
  - from Hindi, Bangla, Marathi, Tamil, Telugu, Punjabi and Malayalam to English and Hindi;
  - from English to any of the Indian languages (Hindi, Bangla, Marathi, Tamil, Telugu, Punjabi, Malayalam).