## Information access in the legal domain

1. Ad-hoc retrieval from legal documents
• Documents: verdicts from the Supreme Court and various acts of parliament.
• Topics: descriptions of situations in which legal assistance is required. The implicit query in all cases is "What legal documents are relevant to this situation?" This year we will focus on two domains:
(a) Consumer Law
(b) Hindu Marriage & Divorce Law
• Retrieval granularity: participants will have to retrieve the most relevant documents.
2. Identification and Classification of Propositions in Court Judgments
• Document set: ~800 judgments from the Indian Supreme Court
• Participants are required to:
1. Parse each judgment into individual propositions
2. Identify the nature and character of each proposition according to the typology provided here

Figure: An ideal Legal Information Access System, divided into two subtasks

Ideally the process shown in the figure above should be followed, but to provide flexibility and allow a team to focus on either step depending on their interest, the organisers have divided the task into two distinct subtasks. Participants can submit runs for either subtask or for both; the two subtasks will be evaluated separately.

Subtask (a) consists of segmenting a document into sentences. Each sentence should be further segmented into propositions (if the sentence contains multiple propositions). Participants in subtask (a) will be provided with ~800 documents consisting of Supreme Court judgments from the years 1981 to 1990.

Subtask (b) consists of classifying the propositions of a segmented document. The input in this case will be a document with each proposition on a separate line, and participants have to assign a unique category to each proposition.

Participants in both subtasks will be provided with a training set of ~20 documents that have been parsed and annotated by legal experts. The evaluation criteria that will be followed can be found here.
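As a rough illustration of subtask (a), the first, coarse step (splitting a judgment into sentences) can be sketched with a simple regular expression. The splitting rule below is an assumption for illustration, not the official segmentation method, and it does not attempt the harder second step of dividing a sentence into multiple propositions:

```python
import re

def segment_judgment(text):
    """Naively split a judgment into candidate sentences.

    A real system would use a legal-domain-aware sentence splitter and
    then further divide multi-proposition sentences; this sketch only
    performs the first, coarse step.
    """
    # Split after sentence-final punctuation when followed by whitespace
    # and a capital letter, to avoid breaking on strings like "No. 5".
    sentences = re.split(r'(?<=[.?!])\s+(?=[A-Z])', text.strip())
    return [s.strip() for s in sentences if s.strip()]

# Invented example text, one candidate proposition printed per line,
# which is the input format subtask (b) expects.
doc = ("The appellant filed a suit under Section 13. "
       "The lower court dismissed it. The appeal was allowed.")
for prop in segment_judgment(doc):
    print(prop)
```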

## Submission Format (Only for task 1)

The output format is as follows:
• Create a file and name it <team_name>_<Identifier>_<Model_Used>
• Each entry in the file has the form <Query_Number> <Q0> <Document_Name> <Sequence_Number> <Similarity_Score> <Identifier>
You are expected to submit 1000 ranked documents for each query. The most relevant document will have sequence number 0 and the 1000th document will have sequence number 999.

Here the identifier denotes the "query - corpus" combination you are using. We have two query sets: consumer queries (C) and Hindu marriage queries (H). The data sets are the consumer court judgments (C), the Hindu marriage judgments (H) and the overall corpus (O), which consists of both (C) and (H). So the possible identifiers are CC, CO, HH and HO, where
• CC means consumer queries (C) run over the consumer court data set (C).
• CO means consumer queries (C) run over the overall data set (O).
• HH means Hindu marriage queries (H) run over the Hindu marriage data set (H).
• HO means Hindu marriage queries (H) run over the overall data set (O).
One file should contain only one identifier. You may submit as many runs as you wish. Submit separate files for the consumer and Hindu marriage queries.
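The submission format above can be produced with a short script. The following sketch assumes the fields are whitespace-separated, as in standard ad-hoc retrieval run files; the team name, model name, document names and scores are invented placeholders:

```python
def write_run_file(team, identifier, model, ranked_results):
    """Write a run file in the task-1 submission format.

    `ranked_results` maps a query number to a best-first list of
    (document_name, similarity_score) pairs. Sequence numbers start at 0
    for the most relevant document, as the format requires.
    """
    fname = f"{team}_{identifier}_{model}"
    with open(fname, "w") as f:
        for qnum, docs in ranked_results.items():
            for seq, (doc_name, score) in enumerate(docs):
                f.write(f"{qnum} Q0 {doc_name} {seq} {score:.4f} {identifier}\n")
    return fname

# Hypothetical example: two documents retrieved for query 1 of a
# consumer-queries-over-consumer-corpus (CC) run.
run = {1: [("doc_0412", 12.73), ("doc_0077", 11.20)]}
write_run_file("myteam", "CC", "BM25", run)
```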

PS: Here SC-HC judgments are referred to as Hindu marriage judgments.

You can submit your runs here.

## Classification Scheme for Task 2(b)

Scheme for Classification of Propositions

| No. | Category | Code |
|------|----------|------|
| 1 | Fact | |
| 1(a) | Intrinsic to the case | FI |
| 1(b) | Extrinsic to the case | FE |
| 2 | Issue | I |
| 3 | Argument | A |
| 4 | Ruling by lower court | LR |
| 5 | General standard of conduct | |
| 5(a) | Statute | SS |
| 5(b) | Precedent | SP |
| 5(c) | Other general standards, including customary, equitable and other extra-legal considerations | SO |
| 6 | Ruling by the present court | R |

## Evaluation Criteria

• For Task 1, submitted runs will be evaluated using Mean Average Precision (MAP), as in a standard ad-hoc retrieval task.
• For Task 2(a), the evaluation will be done by first calculating $A_i$ for each sentence as below:
$A_{i} = \sum_{j} \frac{|P_{ij} \cap Q_{ij}|}{|P_{ij} \cup Q_{ij}|}$
$P_{ij}$: proposition number j of sentence number i in the original text.
$Q_{ij}$: proposition number j of sentence number i in the participant's text.
The operations $\cap$ and $\cup$ denote intersection (finding the terms common to the two propositions) and union (finding the terms belonging to either of the two propositions) respectively. This measure will be calculated for each sentence in all the documents, and the overall accuracy will be the micro-average of these values:
$A_{overall} = \frac{\sum_{k=1}^{N}\sum_{i=1}^{M_k}A_{ki}}{\sum_{k=1}^{N}M_k}$
where $M_k$ is the total number of sentences in document number k, and N is the total number of documents in the test collection.
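The per-sentence measure and its micro-average can be sketched as follows. Treating "terms" as whitespace-separated tokens and aligning gold and system propositions by position are assumptions made here for illustration; the official evaluation script may handle tokenisation and alignment differently:

```python
def jaccard(p, q):
    """Term-level Jaccard similarity between two propositions."""
    p, q = set(p.split()), set(q.split())
    return len(p & q) / len(p | q) if p | q else 0.0

def sentence_accuracy(gold_props, system_props):
    """A_i for one sentence: the sum of Jaccard scores over its propositions.

    Positional alignment of gold and system propositions is an assumption.
    """
    return sum(jaccard(p, q) for p, q in zip(gold_props, system_props))

def overall_accuracy(documents):
    """Micro-average of A_i over all sentences of all documents.

    `documents` is a list of documents; each document is a list of
    sentences; each sentence is a (gold_propositions, system_propositions)
    pair of proposition lists.
    """
    scores = [sentence_accuracy(g, s) for doc in documents for g, s in doc]
    return sum(scores) / len(scores)
```

For example, a single-sentence document whose system proposition exactly matches the gold proposition scores 1.0.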
• For Task 2(b), the evaluation criterion will be the F-measure. The categories will be assigned different weights depending on the proportions in which they occur.

## Important Dates

• Test data release: 20th September
• Run submission: 20th October
• Relevance judgments release: 15th November
• Working notes due: 20th November

## Co-ordinating Committee

• Mandar Mitra, ISI Kolkata
• Prasenjit Majumder, DA-IICT Gandhinagar
• Kripabandhu Ghosh, ISI Kolkata
• Parth Mehta, DA-IICT Gandhinagar
• Abhik Majumdar, NLUO Odisha

After joining the group, participants can simply mail their doubts to legaltrack@googlegroups.com

In case of any problem with the mailing list you can contact:

Parth Mehta, DA-IICT
parth.mehta126@gmail.com

Kripabandhu Ghosh, ISI Kolkata
kripa.ghosh@gmail.com

