Akbir Khan | Researcher Profile | Sotabase

Career

· Research Scientist, Anthropic

Publications (18)

Debating with More Persuasive LLMs Leads to More Truthful Answers

International Conference on Machine Learning · 2024

208 citations

Alignment faking in large language models

arXiv.org · 2024

148 citations

Multi-Agent Risks from Advanced AI

arXiv.org · 2025

BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games

International Conference on Learning Representations · 2024

The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs

2022

JaxMARL: Multi-Agent RL Environments and Algorithms in JAX

Adaptive Agents and Multi-Agent Systems · 2023

JaxMARL: Multi-Agent RL Environments in JAX

arXiv.org · 2023

MAESTRO: Open-Ended Environment Design for Multi-Agent Reinforcement Learning

International Conference on Learning Representations · 2023

Auditing language models for hidden objectives

arXiv.org · 2025

Scaling Opponent Shaping to High Dimensional Games

Adaptive Agents and Multi-Agent Systems · 2023

Considering Race a Problem of Transfer Learning

2019 IEEE Winter Applications of Computer Vision Workshops (WACVW) · 2018

Leading the Pack: N-player Opponent Shaping

arXiv.org · 2023

Programming by Backprop: LLMs Acquire Reusable Algorithmic Abstractions During Code Training

arXiv.org · 2025

The Concordia Contest: Advancing the Cooperative Intelligence of Language Model Agents

InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models with Human Feedback

Conference on Empirical Methods in Natural Language Processing · 2025

Latent Racial Bias -- Evaluating Racism in Police Stop-and-Searches

2020

Multi-dimensional Affect in Poetry (POCA) Dataset: Acquisition, Annotation and Baseline Results

Affective Computing and Intelligent Interaction · 2021
