Automatic Classification of Queries by Expected Retrieval Performance

Abstract: This paper presents a method for automatically predicting the average relevance of the document set that a retrieval system returns in response to a query. For a given retrieval system and document collection, prediction is framed as query classification. Two classes of queries are defined: easy and hard. The split point between the two classes is the median average precision over the query collection. The paper proposes several classifiers that select useful features from a set of candidates and use them to predict the class of a query. The classifiers are trained on the results of the systems that took part in the TREC 8 campaign. Because few queries are available, training and testing are performed with leave-one-out and 10-fold cross-validation. Two types of classifiers, decision trees and support vector machines, give particularly interesting results for a number of systems. Fairly high classification accuracy is obtained on the TREC 8 data (more than 80% correct predictions in some settings).
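The median split and the leave-one-out protocol described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the average-precision values and the single feature score are invented toy data, the function names are hypothetical, and a one-feature threshold rule (a decision stump) stands in for the decision trees and support vector machines the paper actually evaluates.

```python
from statistics import median


def label_queries(avg_precision):
    """Label each query 'easy' or 'hard' by a median split on average precision."""
    split = median(avg_precision.values())
    # Assumption: a query exactly at the median is counted as 'hard'.
    return {q: ("easy" if ap > split else "hard") for q, ap in avg_precision.items()}


def train_stump(xs, ys):
    """Fit a one-feature threshold rule that minimises training errors."""
    values = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])] or values
    best = None
    for t in candidates:
        for above in ("easy", "hard"):
            below = "hard" if above == "easy" else "easy"
            errors = sum((above if x > t else below) != y for x, y in zip(xs, ys))
            if best is None or errors < best[0]:
                best = (errors, t, above, below)
    _, t, above, below = best
    return lambda x: above if x > t else below


def leave_one_out_accuracy(xs, ys):
    """Hold out each query in turn, train on the rest, and score the held-out one."""
    correct = 0
    for i in range(len(xs)):
        predict = train_stump(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        correct += predict(xs[i]) == ys[i]
    return correct / len(xs)


# Toy data (invented): per-query average precision and one hypothetical feature.
avg_precision = {"q1": 0.05, "q2": 0.10, "q3": 0.20, "q4": 0.35, "q5": 0.50, "q6": 0.65}
feature = {"q1": 1.0, "q2": 1.5, "q3": 2.0, "q4": 3.0, "q5": 3.5, "q6": 4.0}

labels = label_queries(avg_precision)
queries = sorted(avg_precision)
accuracy = leave_one_out_accuracy([feature[q] for q in queries],
                                  [labels[q] for q in queries])
print(labels)
print(accuracy)
```

Leave-one-out suits a small query set because every query serves as a test case exactly once while the classifier is still trained on all the others.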
Cited literature: 22 references

https://hal-univ-avignon.archives-ouvertes.fr/hal-02171688
Contributor: Pierre Jourlin
Submitted on: Wednesday, 3 July 2019 - 10:25:03
Last modified on: Tuesday, 9 July 2019 - 01:22:16

File

sigir2005-qp (1).pdf
Files produced by the author(s)

Identifiers

  • HAL Id : hal-02171688, version 1

Citation

Jens Grivolla, Pierre Jourlin, Renato de Mori. Automatic Classification of Queries by Expected Retrieval Performance. ACM SIGIR 2005 Workshop on Predicting Query Difficulty – Methods and Applications, Aug 2005, Salvador, Brazil. ⟨hal-02171688⟩
