Conference paper, Year: 2022

BERT meets d'Artagnan: Data Augmentation for Robust Character Detection in Novels

Arthur Amalvy
  • Role: Author
  • PersonId: 753566
  • IdHAL: aamalvy
Vincent Labatut
Richard Dufour

Abstract

Character detection is a task of interest in digital humanities that requires solving multiple natural language processing subtasks, such as named entity recognition (NER). While recent deep learning-based models can solve the NER task accurately, most datasets do not cover the literary domain, which leads to lower performance and domain-specific issues on literary texts. In this work, we investigate the use of a BERT model for literary NER and observe that it leads to fewer errors than previously surveyed models. We further propose a simple data augmentation scheme to adapt the classic newswire corpus CoNLL-2003 to the literary domain; a model trained on the augmented dataset fixes some of these errors and achieves higher recall.
Main file
COMHUM_2022__BERT_meets_d_Artagnan___Data_Augmentation_for_Robust_Character_Detection_in_Novels.pdf (78.6 KB)
comhum2022_pres.pdf (831.41 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-03617722, version 1 (23-03-2022)
hal-03617722, version 2 (22-06-2022)

Identifiers

  • HAL Id: hal-03617722, version 2

Cite

Arthur Amalvy, Vincent Labatut, Richard Dufour. BERT meets d'Artagnan: Data Augmentation for Robust Character Detection in Novels. Workshop on Computational Methods in the Humanities (COMHUM), Jun 2022, Lausanne, Switzerland. ⟨hal-03617722v2⟩
177 views
145 downloads