Sotabase
Career
Research Scientist, Meta AI
Publications (28)
Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets
Transactions of the Association for Computational Linguistics · 2021
316 cited
Tagged Back-Translation
Conference on Machine Translation · 2019
228 cited
Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling
arXiv.org · 2019
214 cited
MADLAD-400: A Multilingual And Document-Level Large Audited Dataset
Neural Information Processing Systems · 2023
204 cited
BLEU Might Be Guilty but References Are Not Innocent
Conference on Empirical Methods in Natural Language Processing · 2020
159 cited
Investigating Multilingual NMT Representations at Scale
Conference on Empirical Methods in Natural Language Processing · 2019
131 cited
Building Machine Translation Systems for the Next Thousand Languages
arXiv.org · 2022
111 cited
Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus
International Conference on Computational Linguistics · 2020
102 cited
APE at Scale and Its Implications on MT Evaluation Biases
Conference on Machine Translation · 2019
69 cited
Dynamically Composing Domain-Data Selection with Clean-Data Selection by “Co-Curricular Learning” for Neural Machine Translation
Annual Meeting of the Association for Computational Linguistics · 2019
63 cited
Translationese as a Language in “Multilingual” NMT
Annual Meeting of the Association for Computational Linguistics · 2019
50 cited
XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages
Conference on Empirical Methods in Natural Language Processing · 2023
48 cited
Learning a Multi-Domain Curriculum for Neural Machine Translation
Annual Meeting of the Association for Computational Linguistics · 2019
40 cited
Multilingual Machine Translation with Open Large Language Models at Practical Scale: An Empirical Study
North American Chapter of the Association for Computational Linguistics · 2025
36 cited
Towards the Next 1000 Languages in Multilingual Machine Translation: Exploring the Synergy Between Supervised and Self-Supervised Learning
arXiv.org · 2022
30 cited
Writing System and Speaker Metadata for 2,800+ Language Varieties
International Conference on Language Resources and Evaluation · 2022
30 cited
GATITOS: Using a New Multilingual Lexicon for Low-resource Machine Translation
Conference on Empirical Methods in Natural Language Processing · 2023
18 cited
Bilex Rx: Lexical Data Augmentation for Massively Multilingual Machine Translation
arXiv.org · 2023
12 cited
Text Repair Model for Neural Machine Translation
arXiv.org · 2019
8 cited
Exploring Adversarial Learning on Neural Network Models for Text Classification
2015
7 cited
Isaac Caswell | Researcher Profile | Sotabase