Spoken document representations for probabilistic retrieval

Abstract : This paper presents developments in query expansion and document representation for our spoken document retrieval system and shows how various retrieval techniques affect performance for different sets of transcriptions derived from a common speech source. Modifications of the document representation are used, which combine several techniques for query expansion, knowledge-based on the one hand and statistics-based on the other. Taken together, these techniques can improve Average Precision by over 19% relative to a system similar to the one we presented at TREC-7. These new experiments have also confirmed that the degradation of Average Precision due to a word error rate (WER) of 25% is quite small (3.7% relative) and can be reduced to almost zero (0.2% relative). The overall improvement of the retrieval system can also be observed for seven different sets of transcriptions from different recognition engines, with WERs ranging from 24.8% to 61.5%. We hope to repeat these experiments when larger document collections become available, in order to evaluate the scalability of these techniques.

https://hal-univ-avignon.archives-ouvertes.fr/hal-02152860
Contributor : Pierre Jourlin
Submitted on : Thursday, June 13, 2019 - 11:29:47 AM
Last modification on : Friday, June 14, 2019 - 1:56:01 AM

File

jourlin.SpeechCom00.pdf
Publisher files allowed on an open archive

Identifiers

  • HAL Id : hal-02152860, version 1

Citation

Pierre Jourlin, Sue Johnson, Karen Spärck Jones, Philip C. Woodland. Spoken document representations for probabilistic retrieval. Speech Communication, Elsevier : North-Holland, 2000. ⟨hal-02152860⟩
