Sotabase
Career
Research Scientist, Anthropic
Publications (18)
Debating with More Persuasive LLMs Leads to More Truthful Answers
International Conference on Machine Learning · 2024
208 citations
Alignment faking in large language models
arXiv.org · 2024
148 citations
Multi-Agent Risks from Advanced AI
arXiv.org · 2025
88 citations
BALROG: Benchmarking Agentic LLM and VLM Reasoning On Games
International Conference on Learning Representations · 2024
72 citations
The Goldilocks of Pragmatic Understanding: Fine-Tuning Strategy Matters for Implicature Resolution by LLMs
2022
60 citations
JaxMARL: Multi-Agent RL Environments and Algorithms in JAX
Autonomous Agents and Multiagent Systems · 2023
54 citations
JaxMARL: Multi-Agent RL Environments in JAX
arXiv.org · 2023
38 citations
MAESTRO: Open-Ended Environment Design for Multi-Agent Reinforcement Learning
International Conference on Learning Representations · 2023
36 citations
Auditing language models for hidden objectives
arXiv.org · 2025
26 citations
Scaling Opponent Shaping to High Dimensional Games
Autonomous Agents and Multiagent Systems · 2023
13 citations
Considering Race a Problem of Transfer Learning
2019 IEEE Winter Applications of Computer Vision Workshops (WACVW) · 2018
9 citations
Leading the Pack: N-player Opponent Shaping
arXiv.org · 2023
3 citations
Programming by Backprop: LLMs Acquire Reusable Algorithmic Abstractions During Code Training
arXiv.org · 2025
2 citations
The Concordia Contest: Advancing the Cooperative Intelligence of Language Model Agents
2 citations
InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models with Human Feedback
Conference on Empirical Methods in Natural Language Processing · 2025
Latent Racial Bias -- Evaluating Racism in Police Stop-and-Searches
2020
Multi-dimensional Affect in Poetry (POCA) Dataset: Acquisition, Annotation and Baseline Results
Affective Computing and Intelligent Interaction · 2021
Akbir Khan | Researcher Profile | Sotabase