Arian Hosseini | Researcher Profile | Sotabase

Career

· Researcher, Mila - Université de Montréal

· Researcher, University of Illinois at Urbana-Champaign

Publications (16)

Generative Verifiers: Reward Modeling as Next-Token Prediction

International Conference on Learning Representations · 2024

360 citations

V-STaR: Training Verifiers for Self-Taught Reasoners

arXiv.org · 2024

195 citations

Understanding by Understanding Not: Modeling Negation in Language Models

North American Chapter of the Association for Computational Linguistics · 2021

103 citations

Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling

International Conference on Learning Representations · 2024

The N+ Implementation Details of RLHF with PPO: A Case Study on TL;DR Summarization

arXiv.org · 2024

Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models

International Conference on Learning Representations · 2024

Joint Prompt Optimization of Stacked LLMs using Variational Inference

Neural Information Processing Systems · 2023

On the Compositional Generalization Gap of In-Context Learning

BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP · 2022

Not All LLM Reasoners Are Created Equal

arXiv.org · 2024

When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning

arXiv.org · 2025

Putting the Value Back in RL: Better Test-Time Scaling by Unifying LLM Reasoners With Verifiers

arXiv.org · 2025

Shape of Thought: When Distribution Matters More than Correctness in Reasoning Tasks

arXiv.org · 2025

Multi-Turn Puzzles: Evaluating Interactive Reasoning and Strategic Dialogue in LLMs

arXiv.org · 2025

Accurate, yet Inconsistent? Consistency Analysis on Language Models

2021

CoLD: Counterfactually-Guided Length Debiasing for Process Reward Models in Mathematical Reasoning
