Sotabase
Career
· Researcher, Mila - Université de Montréal
· Researcher, University of Illinois at Urbana-Champaign
Publications (16)
Generative Verifiers: Reward Modeling as Next-Token Prediction
International Conference on Learning Representations · 2024
360 cited
V-STaR: Training Verifiers for Self-Taught Reasoners
arXiv.org · 2024
195 cited
Understanding by Understanding Not: Modeling Negation in Language Models
North American Chapter of the Association for Computational Linguistics · 2021
103 cited
Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling
International Conference on Learning Representations · 2024
67 cited
The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization
arXiv.org · 2024
54 cited
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
International Conference on Learning Representations · 2024
42 cited
Joint Prompt Optimization of Stacked LLMs using Variational Inference
Neural Information Processing Systems · 2023
39 cited
On the Compositional Generalization Gap of In-Context Learning
BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP · 2022
30 cited
Not All LLM Reasoners Are Created Equal
arXiv.org · 2024
24 cited
When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning
arXiv.org · 2025
24 cited
Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers
arXiv.org · 2025
13 cited
Shape of Thought: When Distribution Matters More than Correctness in Reasoning Tasks
arXiv.org · 2025
4 cited
Multi-Turn Puzzles: Evaluating Interactive Reasoning and Strategic Dialogue in LLMs
arXiv.org · 2025
3 cited
Accurate, yet Inconsistent? Consistency Analysis on Language Models
2021
CoLD: Counterfactually-Guided Length Debiasing for Process Reward Models in Mathematical Reasoning
Arian Hosseini | Researcher Profile | Sotabase