LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech

Conference paper. Year: 2021



Solène Evain

Role: Author
PersonId : 737268
IdHAL : solene-evain
ORCID : 0000-0003-1766-8894

Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole

Ha Nguyen

Role: Author

Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole

Laboratoire Informatique d'Avignon

Hang Le

Role: Author

Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole

Marcely Zanon Boito

Role: Author
PersonId : 752406
IdHAL : mzboito

Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole

Salima Mdhaffar

Role: Author

Sina Alisamir

Role: Author

Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole

Ziyi Tong

Role: Author

Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole

Natalia Tomashenko

Role: Author
PersonId : 17002
IdHAL : natalia-tomashenko
IdRef : 223393304

Laboratoire Informatique d'Avignon

Marco Dinarelli

Role: Author
PersonId : 12699
IdHAL : marco-dinarelli
IdRef : 22461939X

Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole

Titouan Parcollet

Role: Author
PersonId : 174514
IdHAL : titouan-parcollet
ORCID : 0000-0003-0672-1346

Laboratoire Informatique d'Avignon

Alexandre Allauzen

Role: Author
PersonId : 171266
IdHAL : alexandre-allauzen
IdRef : 078187621

Laboratoire d'analyse et modélisation de systèmes pour l'aide à la décision

Yannick Estève

Role: Author
PersonId : 11645
IdHAL : yannick-esteve
ORCID : 0000-0002-3656-8883
IdRef : 070531668

Laboratoire Informatique d'Avignon

Benjamin Lecouteux

Role: Author
PersonId : 7847
IdHAL : benjamin-lecouteux
ORCID : 0000-0003-3000-6190
IdRef : 135355060

Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole

François Portet

Role: Author
PersonId : 1069
IdHAL : francois-portet
ORCID : 0000-0003-2542-0661
IdRef : 098179160

Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole

Solange Rossato

Role: Author
PersonId : 746390
IdHAL : solange-rossato

Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole

Fabien Ringeval

Role: Author
PersonId : 13134
IdHAL : fabien-ringeval
ORCID : 0000-0002-9213-4529
IdRef : 154573078

Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole

Didier Schwab

Role: Author
PersonId : 4261
IdHAL : didier-schwab
ORCID : 0000-0002-2462-8148
IdRef : 069192359

Université Grenoble Alpes

Laboratoire d'Informatique de Grenoble

Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole

Laurent Besacier

Role: Author

Naver Labs Europe [Meylan]

Abstract

Self-Supervised Learning (SSL) using large amounts of unlabeled data has been successfully explored for image and natural language processing. Recent works have also investigated SSL from speech, and they were notably successful at improving performance on downstream tasks such as automatic speech recognition (ASR). While these works suggest it is possible to reduce dependence on labeled data for building efficient speech systems, their evaluation was mostly carried out on ASR, using multiple heterogeneous experimental settings (most of them for English). This calls into question the objective comparison of SSL approaches and the evaluation of their impact on building speech systems. In this paper, we propose LeBenchmark: a reproducible framework for assessing SSL from speech. It includes not only ASR (high- and low-resource) tasks but also spoken language understanding, speech translation, and emotion recognition. We also focus on speech technologies in a language other than English: French. SSL models of different sizes are trained on carefully sourced and documented datasets. Experiments show that SSL is beneficial for most but not all tasks, which confirms the need for exhaustive and reliable benchmarks to evaluate its real impact. LeBenchmark is shared with the scientific community for reproducible research in SSL from speech.

Keywords

Self-Supervised Representation Learning, ASR, SLU, Speech Translation, Automatic Emotion Recognition

Domains

Artificial Intelligence [cs.AI]

Main file

FLOWBERT_IS2021.pdf (163.57 KB)

Origin: Files produced by the author(s)

Contributor: Didier Schwab

https://hal.science/hal-03317730

Submitted on: Saturday, 7 August 2021, 09:00:15

Last modified on: Friday, 19 April 2024, 16:18:54

Dates and versions

hal-03317730 , version 1 (07-08-2021)

hal-03317730 , version 2 (05-11-2021)

hal-03317730 , version 3 (25-11-2021)

Identifiers

HAL Id : hal-03317730 , version 1

Cite

Solène Evain, Ha Nguyen, Hang Le, Marcely Zanon Boito, Salima Mdhaffar, et al. LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech. INTERSPEECH 2021: Conference of the International Speech Communication Association, Aug 2021, Brno, Czech Republic. ⟨hal-03317730v1⟩
