Thanks for your interest in the Information Retrieval from Microblogs during Disasters (IRMiDis) track of FIRE 2017.


==== DESCRIPTION OF THE TRACK ====

The FIRE 2017 Information Retrieval from Microblogs during Disasters (IRMiDis) track focused on retrieval and matching of needs and availabilities of resources from microblogs posted on Twitter during a disaster event - the Nepal earthquake in April 2015.

Task-I: The first task was to retrieve tweets that inform about needs and availabilities of resources; these tweets are called need-tweets and availability-tweets. 

Task-II: The second task was to match need-tweets with appropriate availability-tweets. 

At the start of the track, 20,000 chronologically earlier tweets related to the event were released. Also given was a sample of need-tweets and availability-tweets in these 20K tweets (training set). The participating teams used the training set to formulate their methodologies. 

Later, 46K chronologically later tweets were released (test set). The methodologies were evaluated based on their performance over the test set.


==== CONTENTS OF THIS FOLDER ====

This folder contains, apart from this README.txt file, the following sub-folders for the two aforesaid tasks:
Task-I-Need-Availability-Retrieval and Task-II-Need-Availability-Matching

(*) The Task-I-Need-Availability-Retrieval sub-folder contains two sub-folders, namely training-set-directory and test-set-directory.

(*) training-set-directory contains three files: 
(1) NepalQuake-training-20K-tweetids.txt - contains 20K tweetids, i.e., identifiers of tweets / microblogs posted on Twitter during the Nepal earthquake in April 2015 (chronologically earlier tweets). 
(2) NepalQuake-training-availability-tweetids.txt - the tweetids of some of the availability-tweets among the 20K tweets in the above file. 
(3) NepalQuake-training-need-tweetids.txt - the tweetids of some of the need-tweets among the 20K tweets in the above file.


(*) test-set-directory contains three files: 
(1) NepalQuake-test-46K-tweetids.txt - contains 46K tweetids, i.e., identifiers of tweets / microblogs posted on Twitter during the Nepal earthquake in April 2015 (chronologically later tweets). 
(2) NepalQuake-test-need-tweetids.txt - the tweetids of the need-tweets among the 46K tweets in the test set. 
(3) NepalQuake-test-availability-tweetids.txt - the tweetids of the availability-tweets among the 46K tweets in the test set.

All the above files contain one tweet-id per line. 
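
As an illustration of the one-tweet-id-per-line layout, the following Python sketch shows one possible way to read such a file. The function name load_tweet_ids is hypothetical and not part of the dataset; the sketch assumes the files are UTF-8 text and keeps the ids as strings so that long ids are not altered.

```python
def load_tweet_ids(path):
    """Read a file with one tweet-id per line and return the ids as a set of strings.

    Assumption: the file is plain UTF-8 text; blank lines are ignored.
    """
    ids = set()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            tweet_id = line.strip()
            if tweet_id:  # skip empty lines
                ids.add(tweet_id)
    return ids
```

For example, load_tweet_ids("NepalQuake-test-need-tweetids.txt") would return the set of need-tweet ids of the test set, which can then be compared against the ids retrieved by a method.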


(*) Task-II-Need-Availability-Matching sub-folder contains two files: 
(1) NepalQuake-Matching-tweetids-training-set.txt - contains some examples of the correct matchings of need-tweets and availability-tweets corresponding to the training set (chronologically earlier 20K tweets). 
(2) NepalQuake-Matching-tweetids-test-set.txt - contains the correct matchings of need-tweets and availability-tweets corresponding to the test set (chronologically later 46K tweets).

For the above two files, the format of each line is 
<Need-tweet-id>:<Availability-tweet-id1>, <Availability-tweet-id2>,…,<Availability-tweet-idN>
where the availability-tweets mentioned on a line are all correct matchings for the need-tweet whose id is mentioned at the beginning of the same line. 
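
As an illustration of the line format described above, the following Python sketch shows one possible way to parse the matching files. The names parse_matching_line and load_matchings are hypothetical and not part of the dataset. The sketch assumes that whitespace around the commas is optional and that the files are UTF-8 text; the ids are kept as strings.

```python
def parse_matching_line(line):
    """Split '<need-id>:<avail-id1>, <avail-id2>,...' into (need_id, [avail_ids]).

    Assumption: whitespace around ids is not significant.
    """
    need_part, _, avail_part = line.strip().partition(":")
    need_id = need_part.strip()
    avail_ids = [tok.strip() for tok in avail_part.split(",") if tok.strip()]
    return need_id, avail_ids


def load_matchings(path):
    """Return a dict mapping each need-tweet id to its list of matching availability-tweet ids."""
    matchings = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if not line.strip():  # skip empty lines
                continue
            need_id, avail_ids = parse_matching_line(line)
            # If a need-tweet id appears on several lines, merge the lists.
            matchings.setdefault(need_id, []).extend(avail_ids)
    return matchings
```

For example, load_matchings("NepalQuake-Matching-tweetids-test-set.txt") would return the gold standard matchings of the test set as a dictionary keyed by need-tweet id.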


==== PAPER TO CITE ====

Further details about the track can be obtained from the following paper, which must be cited if you use this dataset:

Moumita Basu, Saptarshi Ghosh, Kripabandhu Ghosh, Monojit Choudhury. Overview of the FIRE 2017 track: Information Retrieval from Microblogs during Disasters (IRMiDis). Working notes of FIRE 2017 - Annual Meeting of the Forum for Information Retrieval Evaluation, Bangalore, India, December 2017, CEUR workshop proceedings, Volume 2036, pp. 28-33. 


==== REQUEST TO CONTRIBUTE TOWARDS THE DATA COLLECTION ====

We employed human annotators to identify the relevant tweets / matchings. Additionally, we pooled the top results submitted by the teams who participated in the FIRE 2017 IRMiDis data challenge, and checked for relevant tweets. In spite of our efforts, it is possible that some relevant tweets / matchings could not be identified. 

If you find relevant tweets or matching pairs that are not included in the gold standard, please inform us. We will verify the tweets you indicate and include them in the gold standard if found suitable. We will accordingly acknowledge your contribution in the dataset.


