The objective of the Morpheme Extraction Task (MET) is to design algorithms and methods that discover morphemes (the smallest meaningful units of language) in Indian languages (Bengali, Gujarati, Hindi, Marathi, Odia, Telugu, Tamil and Assamese). The output of such systems is considered one of the most useful sub-components in machine translation and in various information retrieval tasks.

Morphemes in a given language can be discovered using one of the following kinds of tool:

  • (a) Morphological Analyser
  • (b) Lemmatizer
  • (c) Stemmer


Subtask 1

Evaluation based on gold standards (G): The test data will comprise 30,000 surface words in each language. The results will be evaluated manually. Morphological Analysers (a) and Lemmatizers (b) will be evaluated in this subtask.

Subtask 2

Information Retrieval (IR) based: IR experiments will be performed in which the words in the documents and queries are replaced by their proposed morpheme representations. The search will then be based on morphemes instead of words. The Terrier search engine will be used (the retrieval model will be announced later). The stop-word list size will be the same for each participant. Morphological Analysers (a), Lemmatizers (b) and Stemmers (c) will be evaluated in this subtask.
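
The word-to-morpheme replacement described above can be illustrated with a minimal sketch. It assumes whitespace tokenization and a plain dictionary of tool output; the function name `to_morphemes`, the sample mapping, and the choice to leave unknown words unchanged are all hypothetical and are not part of the task specification.

```python
def to_morphemes(text, morpheme_map):
    """Replace each whitespace-separated word with its morpheme representation.

    Words missing from morpheme_map are kept unchanged (an assumption made
    for this sketch, not a rule stated by the task).
    """
    return " ".join(morpheme_map.get(word, word) for word in text.split())


# Hypothetical tool output: surface word -> proposed morpheme representation.
morpheme_map = {"cats": "cat", "running": "run"}

# The same replacement is applied to documents and to queries before indexing
# and searching, so that matching happens on morphemes instead of words.
indexed_doc = to_morphemes("cats running fast", morpheme_map)
indexed_query = to_morphemes("running cats", morpheme_map)
```

Applying the same mapping on both sides is the point of the exercise: a query word and a document word that share a morpheme representation become identical index terms.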


Evaluation Metrics

Subtask 1: F1 score
Subtask 2: Mean Average Precision (MAP)
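
For reference, the two metrics named above can be sketched as follows. This uses the standard textbook definitions of F1 and MAP; the exact matching criteria and relevance judgments used by the organisers are not specified here, so the function names and input shapes below are assumptions for illustration only.

```python
def f1_score(predicted, gold):
    """F1 over two collections of items, e.g. (word, morpheme) pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def average_precision(ranked_docs, relevant_docs):
    """Average precision of one ranked result list for one query."""
    relevant_docs = set(relevant_docs)
    if not relevant_docs:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_docs, start=1):
        if doc_id in relevant_docs:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(relevant_docs)


def mean_average_precision(runs):
    """MAP over a list of (ranked_docs, relevant_docs) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)
```

F1 is the harmonic mean of precision and recall, so a tool is rewarded only if it is both accurate and complete. MAP averages, over all queries, the precision observed at each rank where a relevant document is retrieved.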

Submission Format

Participants are required to upload the Morphology Extraction tool [(a), (b), (c)] that they developed. The upload link will be circulated later. The tool should have the following functionalities:

  1. The tool should be named "Team_ID_FIRE-MDT-[Language]-2012". The [Language] part is optional and may be omitted for a language-independent tool.
  2. A README file should clearly state all parameters and the category of the tool (a. Morphological Analyser, b. Lemmatizer or c. Stemmer).
  3. One output option should be able to generate a tab-separated, two-column file, given a list of surface words. Example:
    cats "\t" cat
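
A minimal sketch of such an output option is shown below, assuming the first column holds the surface word and the second holds the tool's output for it. The function `write_two_column_tsv` and the placeholder `toy_stemmer` are hypothetical names introduced for this illustration; a real submission would plug in its own analyser, lemmatizer or stemmer.

```python
import csv
import io


def write_two_column_tsv(words, analyse, out):
    """Write one 'surface_word<TAB>output' line per input word to out."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for word in words:
        writer.writerow([word, analyse(word)])


def toy_stemmer(word):
    # Hypothetical placeholder: strips a trailing "s". It only stands in for
    # a participant's real tool and is not a usable stemmer.
    return word[:-1] if word.endswith("s") and len(word) > 1 else word


# Writing to an in-memory buffer here; a file opened with
# open(path, "w", encoding="utf-8", newline="") works the same way.
buffer = io.StringIO()
write_two_column_tsv(["cats", "dog"], toy_stemmer, buffer)
```

Writing the file as UTF-8 is a sensible default for the Indian-language scripts covered by the task, although the required encoding is not stated here.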

Task Coordinators

Rashmi Sankepally, DA-IICT, Gandhinagar, Gujarat
Somnath Chandra, DIT, Govt. of India, New Delhi