
Information access in the legal domain

Task Definition

Summarization of legal documents

Legal documents typically use a specialised vocabulary and many long, complex sentences, which make them difficult for a layperson to understand. A system that summarises these documents and makes them easily readable to a common person, conveying the essence without the nitty-gritty, would be of immense value. Such a system can be very useful in providing preliminary legal assistance to a common person who otherwise cannot understand the legal terminology usually used in such documents. It can be of equal assistance to a lawyer or a paralegal, saving a lot of their time by providing precisely the information that may be of importance to them. This year we will focus only on single-source summarisation and generate a separate summary for each individual case, irrespective of whether or not it is similar to another case. Both abstractive and extractive summarisation will be considered valid.


The corpus consists of ~1500 judgements from the Supreme Court of India. Each judgement has a corresponding headnote written by professionals with legal expertise. These headnotes summarise the entire case, from the original dispute to the final judgement, including the applicable penal codes, rulings by lower courts, etc. The judgements span the period 1950-1989 and hence cover a considerable amount of variation in language and vocabulary. The training set consists of 1000 judgements spread over the entire four-decade period. The test set contains the remaining 500 documents, again from the same time period.

Training set documents are in TREC format, with the usual <DOC> and <DOCID> tags. Other tags include:
  • <HEADNOTE>  Denotes the headnote (summary of the judgement)
  • <JUDGEMENT> Denotes the actual judgement
  • <INFO>      Denotes extra data that can be used for summarisation if required.
                This consists of the ACTS and the CITATIONS to other cases used in the
                judgement. It may be used, if required, to support your technique, but
                is not itself to be summarised. Only the text inside the JUDGEMENT tag
                is to be summarised.
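As a rough sketch, the tagged sections described above could be pulled out with simple regular expressions. The tag names come from this description; the parser itself is illustrative, not an official tool:

```python
import re

def parse_judgement(text):
    """Extract the tagged sections from one TREC-style judgement document.

    Assumes the tag layout described above (DOC, DOCID, HEADNOTE,
    JUDGEMENT, INFO); a missing tag yields None.
    """
    def section(tag):
        m = re.search(r"<{0}>(.*?)</{0}>".format(tag), text, re.DOTALL)
        return m.group(1).strip() if m else None

    return {
        "docid": section("DOCID"),
        "headnote": section("HEADNOTE"),    # reference summary (training set only)
        "judgement": section("JUDGEMENT"),  # the text to be summarised
        "info": section("INFO"),            # ACTS and CITATIONS; optional input
    }

# Tiny illustrative document (content made up):
doc = """<DOC>
<DOCID>SupremeCourt_1975_42</DOCID>
<HEADNOTE>Appeal dismissed.</HEADNOTE>
<JUDGEMENT>The appellant contended that ...</JUDGEMENT>
</DOC>"""
parsed = parse_judgement(doc)
```

In the test set the HEADNOTE tag will be absent, in which case the helper simply returns None for that key.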
The naming convention followed for training and test documents is <SupremeCourt>_<year>_<Case id>.txt, where the case id is the serial number of the case in that year. In general, the case id can be used to find related judgements that were used as precedents or were referred to in a case. But since not all documents are made available in this year's corpus, the case id will not be of much use beyond providing a unique number to cases of the same year. We plan to release the entire corpus next year, when these case ids will be useful, and hence we retain them in this year's corpus as well.
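The filename fields can be recovered with a few lines of code; the helper name below is purely illustrative:

```python
def parse_filename(name):
    """Split a corpus filename of the form <court>_<year>_<case id>.txt
    into its three fields, e.g. "SupremeCourt_1975_42.txt"
    -> ("SupremeCourt", 1975, 42)."""
    stem = name.rsplit(".", 1)[0]          # drop the .txt extension
    court, year, case_id = stem.split("_")
    return court, int(year), int(case_id)
```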

The Training corpus has been released and can be downloaded here.

The Test corpus has been released and can be downloaded here.

For passcodes for the corpora please contact the task co-ordinators, or check the LIA group.

Submission Format

The test documents will follow the naming convention <supremecourt>_<year>_<id>.txt and contain the <DOC>, <DOCID>, <INFO> and <JUDGEMENT> tags.

For each document in the test set, participants need to create a document named <supremecourt>_<year>_<id>_<summary>.txt. The summary document should be in TREC format, with <DOC>, <DOCID> and <HEADNOTE> tags.
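A minimal sketch of writing one summary file in that shape. It assumes the <summary> part of the filename pattern stands for the literal word "summary" (confirm with the task co-ordinators if in doubt); the helper name and the example text are made up:

```python
import pathlib

def write_summary(court, year, case_id, summary_text, out_dir="."):
    """Write one system summary in the TREC format described above:
    <DOC>, <DOCID> and <HEADNOTE> tags, one file per test document."""
    docid = f"{court}_{year}_{case_id}"
    body = (
        "<DOC>\n"
        f"<DOCID>{docid}</DOCID>\n"
        f"<HEADNOTE>\n{summary_text}\n</HEADNOTE>\n"
        "</DOC>\n"
    )
    path = pathlib.Path(out_dir) / f"{docid}_summary.txt"
    path.write_text(body)
    return path

# Illustrative call with invented content:
out_path = write_summary("supremecourt", 1975, 42, "Appeal dismissed.")
```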

A team can submit multiple runs. Each run must be submitted as a folder named <Teamname>_<runid>_<description>, with each folder containing the summaries (HEADNOTEs) in the format specified above. The <description> part is optional and can be used to indicate the technique used by the team for that particular run.

There is no upper limit on the number of runs a team can submit, though teams are advised to keep the number reasonable. All runs should be added to a single .tar file and uploaded at the link provided below.
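The packaging step might look like the following shell sketch; the team name, run id, description and file contents are all made up for illustration:

```shell
set -e
# One folder per run, named <Teamname>_<runid>_<description>,
# holding the <...>_summary.txt files; then everything in one tarball.
mkdir -p demo_summaries TeamX_1_lexrank
printf '<DOC>\n<DOCID>supremecourt_1975_42</DOCID>\n<HEADNOTE>Appeal dismissed.</HEADNOTE>\n</DOC>\n' \
    > demo_summaries/supremecourt_1975_42_summary.txt
cp demo_summaries/*_summary.txt TeamX_1_lexrank/
tar -cf TeamX_runs.tar TeamX_1_lexrank
tar -tf TeamX_runs.tar   # list the archive contents to verify
```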

You can submit your runs here.

Evaluation Criteria

Evaluation will be based on the ROUGE (Recall-Oriented Understudy for Gisting Evaluation) measure, taking into consideration how well a system-generated summary covers the content of the human-written headnotes.
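To illustrate what the measure captures, here is a toy ROUGE-N recall: the fraction of reference n-grams covered by the system summary, with clipped counts. The official evaluation will presumably use the standard ROUGE toolkit; this sketch is only a conceptual illustration, not the scoring code:

```python
from collections import Counter

def rouge_n_recall(system, reference, n=1):
    """Toy ROUGE-N recall: clipped n-gram overlap divided by the
    number of n-grams in the reference (here, the headnote)."""
    def ngrams(text, n):
        toks = text.lower().split()
        return Counter(tuple(toks[i:i + n]) for i in range(len(toks) - n + 1))

    ref, sys_ = ngrams(reference, n), ngrams(system, n)
    if not ref:
        return 0.0
    overlap = sum(min(count, sys_[gram]) for gram, count in ref.items())
    return overlap / sum(ref.values())

# "is" in the reference is not covered, so 3 of 4 unigrams match:
score = rouge_n_recall("the appeal was dismissed", "the appeal is dismissed")
```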

Exact evaluation details will be updated soon.

Important Dates

  • Training data release: 20 September (Released)
  • Run submission: 30 October
  • Evaluation Results: 10 November
  • Working Notes Due: 20 November

Co-ordinating Committee

  • Madhulika Agrawal, DA-IICT Gandhinagar
  • Parth Mehta, DA-IICT Gandhinagar
  • Mandar Mitra, ISI Kolkata
  • Prasenjit Majumder, DA-IICT Gandhinagar

Contact Us

For the latest updates, please join the Google group Information Access in Legal Domain.
After joining the group, participants can simply mail their doubts to legaltrack@googlegroups.com.

In case of any problem with the mailing list you can contact:

Madhulika Agrawal, DA-IICT

Parth Mehta, DA-IICT




Copyright © 2014 IRSI All rights reserved.

This site was last updated on 2014-11-23.