Finding Pages on the Unarchived Web
Jaap Kamps, University of Amsterdam, The Netherlands.

Web archives preserve the fast-changing Web, yet are highly incomplete due to crawling restrictions, crawling depth and frequency, or restrictive selection policies—most of the Web is unarchived and therefore lost to posterity. We propose an approach to recover significant parts of the unarchived Web by reconstructing descriptions of these pages based on links and anchors in the set of crawled pages, and we experiment with this approach on the Dutch Web archive. Our main findings are threefold. First, the crawled Web contains evidence of a remarkable number of unarchived pages and websites, potentially dramatically increasing the coverage of the Web archive. Second, the link and anchor descriptions have a highly skewed distribution: popular pages such as home pages have more terms, but the richness tapers off quickly. Third, the succinct representation is generally rich enough to uniquely identify pages on the unarchived Web: in a known-item search setting we can retrieve these pages within the first ranks on average.
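The core reconstruction step can be pictured as aggregating the anchor text of all crawled links that point to the same (unarchived) target URL. The following is a minimal illustrative sketch, not the paper's actual pipeline; the input shape and names are hypothetical:

```python
from collections import defaultdict

def build_anchor_representations(crawled_pages):
    """Aggregate anchor text by link target URL.

    crawled_pages: iterable of (source_url, links) pairs, where links is
    a list of (target_url, anchor_text) tuples (hypothetical input shape).
    Returns a dict mapping each target URL to a bag-of-words description
    assembled from the anchors of its incoming links.
    """
    representations = defaultdict(list)
    for source_url, links in crawled_pages:
        for target_url, anchor_text in links:
            if anchor_text:  # skip empty anchors (e.g. image links)
                representations[target_url].append(anchor_text)
    return {url: " ".join(texts) for url, texts in representations.items()}

# Two crawled pages both link to a page that was never crawled itself;
# their anchors yield a succinct textual representation of that page.
crawled = [
    ("http://a.example/", [("http://gone.example/", "Dutch web archive")]),
    ("http://b.example/", [("http://gone.example/", "archive project")]),
]
reps = build_anchor_representations(crawled)
```

The resulting descriptions are short, which is why their usefulness depends on the skew the abstract describes: well-linked pages accumulate many anchors, while tail pages may receive only one or two terms.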

Medical Tasks as part of the CLEF campaign
Henning Müller, University Hospitals and University of Geneva, Switzerland.

Both ImageCLEF and CLEFeHealth have been run as part of the Cross-Language Evaluation Forum for several years and highlight the importance of medical information retrieval and multilingual access to it. Medical data sets have constraints that differ from other data, particularly when patient data are concerned. The tasks also need to be developed in collaboration with health professionals to be useful. The presentation will give an overview of the tasks run in the past and some of the results, including the aspects specific to medical data.

Examining the Limits of Crowdsourcing for Relevance Assessment
Paul Clough, The University of Sheffield, UK.

Evaluation is instrumental in the development and management of effective information retrieval systems and in ensuring high levels of user satisfaction. Using crowdsourcing as part of this process has been shown to be viable. What is less well understood are the limits of crowdsourcing for evaluation, particularly for domain-specific search. I will present results comparing relevance assessments gathered using crowdsourcing with those gathered from a domain expert for evaluating different search engines in a large government archive. While crowdsourced judgments rank the tested search engines in the same order as expert judgments, crowdsourced workers appear unable to distinguish different levels of highly accurate search results in the way that expert assessors can. The nature of this limitation of crowdsourced workers in this experiment is examined, and the viability of crowdsourcing for evaluating search in specialist settings is discussed.

Assessing Performances across Evaluation Cycles: The Example of CLEF 15th Birthday
Nicola Ferro, University of Padova, Italy.

Since 2014 marks the 15th birthday of CLEF, we have conducted a longitudinal study to assess the impact of CLEF evaluation cycles on multilingual ad-hoc retrieval. Monolingual retrieval shows a positive trend, even if the performance increase is not always steady from year to year; bilingual retrieval demonstrates larger improvements in recent years, probably due to the better linguistic resources now available; and multilingual retrieval exhibits constant improvement and performance comparable to bilingual (and, sometimes, even monolingual) retrieval. We will also discuss the methodology adopted to compare results across evaluation cycles and test collections and highlight some of its limitations.