Sotabase
Career
· Associate Professor, Massachusetts Institute of Technology · 2025–
· Assistant Professor, Massachusetts Institute of Technology · 2021–
Publications
Towards A Rigorous Science of Interpretable Machine Learning · 2017 · 4,603 cited
Sanity Checks for Saliency Maps · Neural Information Processing Systems · 2018 · 2,245 cited
Concept Bottleneck Models · International Conference on Machine Learning · 2020 · 1,089 cited
Examples are not enough, learn to criticize! Criticism for Interpretability · Neural Information Processing Systems · 2016 · 826 cited
Towards Automatic Concept-based Explanations · Neural Information Processing Systems · 2019 · 762 cited
Visualizing and Measuring the Geometry of BERT · Neural Information Processing Systems · 2019 · 454 cited
Human-Centered Tools for Coping with Imperfect Algorithms During Medical Decision-Making · International Conference on Human Factors in Computing Systems · 2019 · 436 cited
The Bayesian Case Model: A Generative Approach for Case-Based Reasoning and Prototype Classification · Neural Information Processing Systems · 2014 · 333 cited
How do Humans Understand Explanations from Machine Learning Systems? An Evaluation of the Human-Interpretability of Explanation · arXiv.org · 2018 · 260 cited
Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models · Neural Information Processing Systems · 2023 · 235 cited
Multiple relative pose graphs for robust cooperative mapping · IEEE International Conference on Robotics and Automation · 2010 · 226 cited
Towards Realistic Individual Recourse and Actionable Explanations in Black-Box Decision Making Systems · arXiv.org · 2019 · 206 cited
Model evaluation for extreme risks · arXiv.org · 2023 · 198 cited
Considerations for Evaluation and Generalization in Interpretable Machine Learning · 2018 · 176 cited
An Evaluation of the Human-Interpretability of Explanation · arXiv.org · 2019 · 175 cited
A Roadmap for a Rigorous Science of Interpretability · arXiv.org · 2017 · 153 cited
Human Evaluation of Models Built for Interpretability · AAAI Conference on Human Computation & Crowdsourcing · 2019 · 141 cited
Getting aligned on representational alignment · Trans. Mach. Learn. Res. · 2023 · 140 cited
Human-in-the-Loop Interpretability Prior · Neural Information Processing Systems · 2018 · 126 cited
Impossibility theorems for feature attribution · Proceedings of the National Academy of Sciences of the United States of America · 2022 · 121 cited