Rohin Shah | Researcher Profile | Sotabase

Career

· Research Scientist, DeepMind, 2020–

· PhD in Artificial Intelligence, UC Berkeley, 2014–2020

Publications (31)

On the Utility of Learning about Humans for Human-AI Coordination

Neural Information Processing Systems · 2019

488 citations

Chlorophyll: synthesis-aided compiler for low-power spatial architectures

ACM-SIGPLAN Symposium on Programming Language Design and Implementation · 2014

155 citations

Does Circuit Analysis Interpretability Scale? Evidence from Multiple Choice Capabilities in Chinchilla

arXiv.org · 2023

142 citations

Goal Misgeneralization: Why Correct Specifications Aren't Enough For Correct Goals

arXiv.org · 2022

113 citations

Evaluating Frontier Models for Dangerous Capabilities

arXiv.org · 2024

103 citations

Optimal Policies Tend To Seek Power

Neural Information Processing Systems · 2019

Explaining grokking through circuit efficiency

arXiv.org · 2023

On the Feasibility of Learning, Rather than Assuming, Human Biases for Reward Inference

International Conference on Machine Learning · 2019

Preferences Implicit in the State of the World

International Conference on Learning Representations · 2019

Evaluating the Robustness of Collaborative Agents

Adaptive Agents and Multi-Agent Systems · 2021

The MAGICAL Benchmark for Robust Imitation

Neural Information Processing Systems · 2020

Benefits of Assistance over Reward Learning

2020

Chlorophyll: Synthesis-Aided Compiler for Low-Power Spatial Architectures by Phitchaya Mangpo Phothilimthana

2015

The MineRL BASALT Competition on Learning from Human Feedback

arXiv.org · 2021

An Empirical Investigation of Representation Learning for Imitation

NeurIPS Datasets and Benchmarks · 2022

IEEE/ACM International Conference on Human-Robot Interaction · 2023

Active Inverse Reward Design

arXiv.org · 2018

Choice Set Misspecification in Reward Inference

AISafety@IJCAI · 2021

Towards Solving Fuzzy Tasks with Human Feedback: A Retrospective of the MineRL BASALT 2022 Competition

Neural Information Processing Systems · 2023

Retrospective on the 2021 MineRL BASALT Competition on Learning from Human Feedback

Neural Information Processing Systems · 2022
