Automatic Classification of Queries by Expected Retrieval Performance - Avignon Université
Conference paper, Year: 2005

Automatic Classification of Queries by Expected Retrieval Performance


This paper presents a method for automatically predicting the average relevance of the document set that a retrieval system returns in response to a query. For a given retrieval system and document collection, prediction is framed as query classification. Two classes of queries are defined: easy and hard. The split point between the two classes is the median of the average precision over the query collection. The paper proposes several classifiers that select useful features from a set of candidates and use them to predict the class of a query. Classifiers are trained on the results of the systems involved in the TREC 8 campaign. Because the number of available queries is limited, training and testing are performed with leave-one-out and 10-fold cross-validation. Two types of classifiers, decision trees and support vector machines, provide particularly interesting results for a number of systems. A fairly high classification accuracy is obtained on the TREC 8 data (more than 80% correct predictions in some settings).
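The labelling and evaluation scheme described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names, the single numeric feature, the tie rule (a query exactly at the median counts as easy), and the toy values are all assumptions made for illustration, and a one-threshold "decision stump" stands in for the decision trees and support vector machines used in the paper.

```python
from statistics import median


def label_by_median(avg_precisions):
    """Label a query 'easy' if its average precision is at or above the
    median over all queries, otherwise 'hard' (tie rule is an assumption)."""
    split = median(avg_precisions)
    return ["easy" if ap >= split else "hard" for ap in avg_precisions]


def train_stump(xs, ys):
    """Fit a depth-1 decision tree on one numeric feature.

    Tries every midpoint between consecutive distinct feature values and
    both label orientations, and keeps the rule with the most correct
    training predictions.
    """
    values = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])] or values
    best = None
    for threshold in candidates:
        for above, below in (("easy", "hard"), ("hard", "easy")):
            correct = sum(
                (above if x > threshold else below) == y
                for x, y in zip(xs, ys)
            )
            if best is None or correct > best[0]:
                best = (correct, threshold, above, below)
    _, threshold, above, below = best
    return lambda x: above if x > threshold else below


def leave_one_out_accuracy(xs, ys):
    """Hold out each query in turn, train on the rest, test on the held-out one."""
    hits = 0
    for i in range(len(xs)):
        predict = train_stump(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        hits += predict(xs[i]) == ys[i]
    return hits / len(xs)


if __name__ == "__main__":
    # Toy values, invented for illustration only (not TREC 8 results).
    avg_precisions = [0.05, 0.10, 0.15, 0.20, 0.40, 0.55, 0.60, 0.70]
    # Hypothetical per-query feature, e.g. a term-specificity score.
    feature = [1.0, 1.5, 2.0, 2.5, 4.0, 4.5, 5.0, 5.5]

    labels = label_by_median(avg_precisions)
    print(labels)
    print(leave_one_out_accuracy(feature, labels))
```

Leave-one-out suits small query sets because each query is tested exactly once while all remaining queries are used for training. 10-fold cross-validation follows the same pattern with ten held-out groups instead of single queries.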
Main file: sigir2005-qp (1).pdf (460.88 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02171688, version 1 (03-07-2019)




Jens Grivolla, Pierre Jourlin, Renato de Mori. Automatic Classification of Queries by Expected Retrieval Performance. ACM SIGIR 2005 Workshop on Predicting Query Difficulty – Methods and Applications (2005), Aug 2005, Salvador, Brazil. ⟨hal-02171688⟩



