Task Description


Morphology is the study of words and word forms. Morphological analysis aims to understand the structure of a language and is therefore important in many IR, Machine Learning and NLP tasks. The following two evaluation frameworks are in place to test participating Morpheme Extraction Systems this year in FIRE 2013:

IR based Evaluation

The morpheme analyses proposed by the submitted systems will be used in Terrier indexing and retrieval experiments. The retrieved results will be evaluated against the available relevance judgments. The Mean Average Precision thus obtained will be treated as the final score for the system. A subjective query-wise analysis of the evaluation results will also be done. This evaluation is available for Bengali, Gujarati, Hindi, Marathi and Odia.
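
As a rough illustration of the score mentioned above, the following sketch computes Mean Average Precision from ranked result lists and relevance judgments using the textbook definition. It is not the organizers' evaluation code; the function names and the dictionary-based input layout are assumptions made for this example.

```python
from typing import Dict, List, Set


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Average precision for one query (textbook definition).

    Precision is taken at each rank where a relevant document appears,
    and the sum is divided by the number of relevant documents.
    """
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(
    runs: Dict[str, List[str]], qrels: Dict[str, Set[str]]
) -> float:
    """Mean of the per-query average precision over all judged queries.

    `runs` maps a query id to its ranked document ids; `qrels` maps a
    query id to its set of relevant document ids (hypothetical layout).
    """
    if not qrels:
        return 0.0
    scores = [average_precision(runs.get(q, []), rel) for q, rel in qrels.items()]
    return sum(scores) / len(scores)
```

A query with no retrieved documents contributes a score of zero here, which is one common convention.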



Language dependent Evaluation

A sample of the morpheme analyses proposed by the systems will be compared against a sample of the gold standard data (which contains manual morpheme analyses). This experiment will be repeated over several samples and the average will be treated as the final score. This task is currently available for Bengali and Tamil. The gold standard data is available for 3000 words. Please refer to http://research.ics.aalto.fi/events/morphochallenge2010/evaluation.shtml for a detailed description of a similar evaluation framework.



Task


Participants are expected to develop systems that take a large list of words as an input file and produce an output file containing all the words along with their morpheme analyses. The test data will contain a list of all unique words from the FIRE corpora for each language and can be used as the input file while testing the systems. The output file should adhere to the following guidelines:

  • Print the first word. Print Tab.
  • Print the first morpheme of the word. Print space.
  • Print the grammatical part of the first morpheme with a '+' sign before it. There is no standard set of rules for specifying the grammatical part. The actual terms will not be compared. Instead, pairs of terms having the same grammatical structure will be compared in the Language dependent evaluation part. So, you can decide your own set of rules for specifying the grammatical structure. Print space.
  • Print the second morpheme. Print space.
  • Print the grammatical part of the second morpheme with a '+' sign before it. Print space.
  • Continue similarly for all morphemes that make up the word. After the first word is done, print the next word on the next line and continue similarly for all words in the input file.
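
The steps above can be sketched as a small formatting helper. This is only one possible reading of the guidelines: the name `format_line`, the `(morpheme, tag)` pair representation, and the choice to join tokens with single spaces (so no trailing space is written) are assumptions of this example.

```python
from typing import List, Optional, Tuple

# Each morpheme is paired with an optional grammatical tag (without the '+').
Analysis = List[Tuple[str, Optional[str]]]


def format_line(word: str, analysis: Analysis) -> str:
    """Build one output line: word, a tab, then space-separated morphemes.

    A grammatical tag, when present, follows its morpheme and is
    prefixed with '+'. Morphemes without a tag are written alone.
    """
    parts: List[str] = []
    for morpheme, tag in analysis:
        parts.append(morpheme)
        if tag:
            parts.append("+" + tag)
    return word + "\t" + " ".join(parts)


def write_output(path: str, analyses: List[Tuple[str, Analysis]]) -> None:
    """Write one formatted line per word to a UTF-8 text file."""
    with open(path, "w", encoding="utf-8") as handle:
        for word, analysis in analyses:
            handle.write(format_line(word, analysis) + "\n")
```

For instance, `format_line("titiliya", [("titili", "PL")])` yields the word, a tab, and then `titili +PL`.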

IMPORTANT NOTE: The grammatical part is optional and is not necessary if you intend to take part in the IR task only. Only morphemes will be used for the IR evaluation, and all terms beginning with '+' will be removed for it.

Example output format:
word  [tab]   morpheme1  [space]  +grammatical structure  [space]  morpheme2  [space]  +grammatical structure
titiliya     titili   +PL
(titili means butterfly in Hindi; its plural form, titiliya, can be shown as +PL)
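
For the IR evaluation, the note above says that terms beginning with '+' are removed and only morphemes are used. A minimal sketch of that filtering step follows; it assumes each line holds the word, a tab, and then whitespace-separated tokens. The function name `morphemes_for_ir` is hypothetical, and this is not the organizers' actual preprocessing code.

```python
from typing import List, Tuple


def morphemes_for_ir(line: str) -> Tuple[str, List[str]]:
    """Split an output line into the word and its morphemes only.

    Tokens that start with '+' are treated as grammatical tags and
    dropped, as described for the IR evaluation.
    """
    word, _, analysis = line.rstrip("\n").partition("\t")
    morphemes = [tok for tok in analysis.split() if not tok.startswith("+")]
    return word, morphemes
```

Applied to the example line, `morphemes_for_ir("titiliya\ttitili +PL")` keeps `titili` and discards `+PL`.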

Submission Format


Participants are required to submit a zip archive named "InstituteName_FIRE-MET-[Language]-2013". The [Language] part may be omitted if the tool is language independent. The archive should contain the system (executable code or software) and a README file containing the following:

  • Full name of the institute or research group
  • Full names and email addresses of team members
  • A step-by-step procedure required to run the system
  • Any other parameters required for running the system

The systems can be submitted here. Please note that the maximum allowed size is 1 GB.


Task Coordinators



Rashmi Sankepally, University of Maryland - College Park
rashmi.sankepally@gmail.com

Komal Agarwal, DA-IICT, Gandhinagar, Gujarat
komalagarwal07@gmail.com