Conference papers

Mention Replacement for Adapting a Named Entity Recognition Corpus to a Target Domain

Abstract : Named entity recognition is a well-studied natural language processing task that is useful in a number of applications. Deep-learning models have recently become able to solve this task with good performance. However, the datasets used to train and evaluate those models cover only a few domains (newswire, web). Since a model trained on one domain generally performs worse on another, performance is lower for less-covered domains. To address this issue, this article proposes a data augmentation technique for adapting a named entity recognition corpus from a source domain to a target domain in which the encountered names can be different. We apply this technique to fantasy novels and show that it can yield performance gains in that context.
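
As a rough illustration of what mention replacement can look like, the sketch below swaps each entity mention in a BIO-tagged sentence for a name of the same type drawn from a target-domain name list. It is not the authors' implementation: the function name `replace_mentions`, the BIO tagging scheme, the `gazetteer` structure, and the example names are all assumptions made for this example.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical structure: entity type -> list of tokenized target-domain names.
Gazetteer = Dict[str, List[List[str]]]


def replace_mentions(
    tokens: List[str],
    tags: List[str],
    gazetteer: Gazetteer,
    rng: random.Random,
) -> Tuple[List[str], List[str]]:
    """Return a copy of a BIO-tagged sentence in which every entity mention
    is replaced by a randomly chosen name of the same type (assumed scheme)."""
    new_tokens: List[str] = []
    new_tags: List[str] = []
    i = 0
    while i < len(tokens):
        tag = tags[i]
        if tag.startswith("B-"):
            ent_type = tag[2:]
            # Find the end of the mention: all following I- tags of the same type.
            j = i + 1
            while j < len(tokens) and tags[j] == f"I-{ent_type}":
                j += 1
            candidates = gazetteer.get(ent_type)
            if candidates:
                replacement = rng.choice(candidates)
                new_tokens.extend(replacement)
                # Re-tag the replacement so the labels stay aligned with the tokens.
                new_tags.extend(
                    [f"B-{ent_type}"] + [f"I-{ent_type}"] * (len(replacement) - 1)
                )
            else:
                # No target-domain name of this type: keep the original mention.
                new_tokens.extend(tokens[i:j])
                new_tags.extend(tags[i:j])
            i = j
        else:
            new_tokens.append(tokens[i])
            new_tags.append(tag)
            i += 1
    return new_tokens, new_tags


# Illustrative data only: a newswire-style sentence and made-up fantasy names.
tokens = ["Barack", "Obama", "visited", "Paris", "."]
tags = ["B-PER", "I-PER", "O", "B-LOC", "O"]
gazetteer = {"PER": [["Aldric"]], "LOC": [["Stone", "Hollow"]]}

aug_tokens, aug_tags = replace_mentions(tokens, tags, gazetteer, random.Random(0))
```

Because a replacement name may have a different number of tokens than the original mention, the tags are regenerated rather than copied, so every augmented sentence remains a valid training example.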

https://hal.archives-ouvertes.fr/hal-03651510
Contributor : Arthur Amalvy
Submitted on : Wednesday, June 22, 2022 - 4:18:25 PM
Last modification on : Friday, August 5, 2022 - 2:54:52 PM

Identifiers

  • HAL Id : hal-03651510, version 3

Citation

Arthur Amalvy, Vincent Labatut, Richard Dufour. Remplacement de mentions pour l'adaptation d'un corpus de reconnaissance d'entités nommées à un domaine cible. 29ème Conférence sur le Traitement Automatique des Langues Naturelles (TALN), Jun 2022, Avignon, France. pp.198-205. ⟨hal-03651510v3⟩
