Chenglei Si | Researcher Profile | Sotabase

Career

· Research Intern, FutureHouse, 2025–

· Research Intern, Microsoft, 2022–

· PhD Student, Stanford University, 2018–

· Research Assistant, University of Maryland, 2018–

Publications (27)

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

arXiv.org · 2022

Cited by 2,792

Prompting GPT-3 To Be Reliable

International Conference on Learning Representations · 2022

Cited by 345

The Prompt Report: A Systematic Survey of Prompting Techniques

arXiv.org · 2024

Cited by 243

Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP

arXiv.org · 2021

Cited by 199

CharBERT: Character-aware Pre-trained Language Model

International Conference on Computational Linguistics · 2020

Cited by 123

The Prompt Report: A Systematic Survey of Prompt Engineering Techniques

2024

Better Robustness by More Coverage: Adversarial and Mixup Data Augmentation for Robust Finetuning

Findings · 2020

Measuring Inductive Biases of In-Context Learning with Underspecified Demonstrations

Annual Meeting of the Association for Computational Linguistics · 2023

What does BERT Learn from Multiple-Choice Reading Comprehension Datasets?

arXiv.org · 2019

Towards Bidirectional Human-AI Alignment: A Systematic Review for Clarifications, Framework, and Future Directions

arXiv.org · 2024

Getting MoRE out of Mixture of Language Model Reasoning Experts

Conference on Empirical Methods in Natural Language Processing · 2023

Benchmarking Robustness of Machine Reading Comprehension Models

Findings · 2020

Re-Examining Calibration: The Case of Question Answering

Conference on Empirical Methods in Natural Language Processing · 2022

Better Robustness by More Coverage: Adversarial Training with Mixup Augmentation for Robust Fine-tuning

arXiv.org · 2020

What’s in a Name? Answer Equivalence For Open-Domain Question Answering

Conference on Empirical Methods in Natural Language Processing · 2021

Configurable Foundation Models: Building LLMs from a Modular Perspective

arXiv.org · 2024

Sentiment Aware Neural Machine Translation

Conference on Empirical Methods in Natural Language Processing · 2019

Sub-Character Tokenization for Chinese Pretrained Language Models

Transactions of the Association for Computational Linguistics · 2021

Contextual Experience Replay for Self-Improvement of Language Agents

Annual Meeting of the Association for Computational Linguistics · 2025

Dataset Mention Extraction and Classification

Proceedings of the Workshop on Extracting Structured Knowledge from Scientific Publications · 2019
