Using Monolingual Data in Neural Machine Translation: a Systematic Study - Laboratoire d'Informatique pour la Mécanique et les Sciences de l'Ingénieur
Conference paper. Year: 2018

Using Monolingual Data in Neural Machine Translation: a Systematic Study

Franck Burlot
  • Role: Author
  • PersonId: 1021079
François Yvon

Abstract

Neural Machine Translation (MT) has radically changed the way systems are developed. A major difference with the previous generation (Phrase-Based MT) is the way monolingual target data, which often abounds, is used in these two paradigms. While Phrase-Based MT can seamlessly integrate very large language models trained on billions of sentences, the best option for Neural MT developers seems to be the generation of artificial parallel data through back-translation, a technique that fails to fully take advantage of existing datasets. In this paper, we conduct a systematic study of back-translation, comparing alternative uses of monolingual data, as well as multiple data generation procedures. Our findings confirm that back-translation is very effective and give new explanations as to why this is the case. We also introduce new data simulation techniques that are almost as effective, yet much cheaper to implement.
Main file
WMT015.pdf (565.06 KB)
Origin: Publisher files allowed on an open archive

Dates and versions

hal-01910235, version 1 (02-12-2018)

Identifiers

  • HAL Id: hal-01910235, version 1

Cite

Franck Burlot, François Yvon. Using Monolingual Data in Neural Machine Translation: a Systematic Study. Conference on Machine Translation, Oct 2018, Brussels, Belgium. ⟨hal-01910235⟩
102 views
241 downloads
