Automatic Classification of Queries by Expected Retrieval Performance

Abstract : This paper presents a method for automatically predicting the average relevance of the document set that a retrieval system returns in response to a query. For a given retrieval system and document collection, prediction is framed as query classification. Two classes of queries are defined: easy and hard. The split point between the two classes is the median value of the average precision over the query collection. The paper proposes several classifiers that select useful features from a set of candidates and use them to predict the class of a query. Classifiers are trained on the results of the systems that took part in the TREC 8 campaign. Because the number of available queries is limited, training and testing are performed with leave-one-out and 10-fold cross-validation. Two types of classifiers, namely decision trees and support vector machines, provide particularly interesting results for a number of systems. A fairly high classification accuracy is obtained on the TREC 8 data (more than 80% correct predictions in some settings).
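
The median split and leave-one-out evaluation described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the query ids, the average-precision values, and the single feature `mean_idf` are invented toy data, and the one-threshold decision stump is only a stand-in for the decision trees and support vector machines the paper evaluates. It is not the authors' implementation.

```python
from statistics import median


def label_queries(avg_precision):
    """Median split: 'easy' if a query's AP is above the median, else 'hard'."""
    split = median(avg_precision.values())
    return {q: ("easy" if ap > split else "hard") for q, ap in avg_precision.items()}


def train_stump(xs, ys):
    """Fit a one-feature threshold rule (a depth-one decision tree)."""
    values = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive feature values.
    thresholds = [(a + b) / 2 for a, b in zip(values, values[1:])] or values
    best = None
    for t in thresholds:
        for above in ("easy", "hard"):
            below = "hard" if above == "easy" else "easy"
            correct = sum((above if x > t else below) == y for x, y in zip(xs, ys))
            if best is None or correct > best[0]:
                best = (correct, t, above, below)
    _, t, above, below = best
    return lambda x: above if x > t else below


def leave_one_out_accuracy(xs, ys):
    """Hold out each query in turn, train on the rest, test on the held-out one."""
    hits = 0
    for i in range(len(xs)):
        model = train_stump(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        hits += model(xs[i]) == ys[i]
    return hits / len(xs)


# Hypothetical toy data: per-query average precision for one system,
# and one hypothetical query feature (not taken from the paper).
avg_precision = {"q1": 0.05, "q2": 0.10, "q3": 0.40, "q4": 0.55, "q5": 0.20, "q6": 0.70}
mean_idf = {"q1": 1.0, "q2": 1.5, "q3": 3.0, "q4": 3.5, "q5": 2.0, "q6": 4.0}

labels = label_queries(avg_precision)
queries = sorted(labels)
xs = [mean_idf[q] for q in queries]
ys = [labels[q] for q in queries]
accuracy = leave_one_out_accuracy(xs, ys)
print(labels, accuracy)
```

Because the split point is the median, the two classes are balanced by construction, so a classifier that always predicts one class scores about 50%; this is the baseline against which a reported accuracy should be read.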

https://hal-univ-avignon.archives-ouvertes.fr/hal-02171688
Contributor : Pierre Jourlin
Submitted on : Wednesday, July 3, 2019 - 10:25:03 AM
Last modification on : Tuesday, July 9, 2019 - 1:22:16 AM

File

sigir2005-qp (1).pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02171688, version 1

Citation

Jens Grivolla, Pierre Jourlin, Renato de Mori. Automatic Classification of Queries by Expected Retrieval Performance. ACM SIGIR 2005 Workshop on Predicting Query Difficulty – Methods and Applications, Aug 2005, Salvador, Brazil. ⟨hal-02171688⟩
