Sotabase
Career
· Associate Professor, Massachusetts Institute of Technology · 2025–
· Assistant Professor, Massachusetts Institute of Technology · 2021–
Publications
Towards A Rigorous Science of Interpretable Machine Learning · 2017 · 4,603 cited
Sanity Checks for Saliency Maps · Neural Information Processing Systems · 2018 · 2,245 cited
Concept Bottleneck Models · International Conference on Machine Learning · 2020 · 1,089 cited
Examples are not enough, learn to criticize! Criticism for Interpretability · Neural Information Processing Systems · 2016 · 826 cited
Towards Automatic Concept-based Explanations · Neural Information Processing Systems · 2019 · 762 cited
Visualizing and Measuring the Geometry of BERT · Neural Information Processing Systems · 2019 · 454 cited
Human-Centered Tools for Coping with Imperfect Algorithms During Medical Decision-Making · International Conference on Human Factors in Computing Systems · 2019 · 436 cited
The Bayesian Case Model: A Generative Approach for Case-Based Reasoning and Prototype Classification · Neural Information Processing Systems · 2014 · 333 cited
How do Humans Understand Explanations from Machine Learning Systems? An Evaluation of the Human-Interpretability of Explanation · arXiv.org · 2018 · 260 cited
Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models · Neural Information Processing Systems · 2023 · 235 cited
Multiple relative pose graphs for robust cooperative mapping · IEEE International Conference on Robotics and Automation · 2010 · 226 cited
Towards Realistic Individual Recourse and Actionable Explanations in Black-Box Decision Making Systems · arXiv.org · 2019 · 206 cited
Model evaluation for extreme risks · arXiv.org · 2023 · 198 cited
Considerations for Evaluation and Generalization in Interpretable Machine Learning · 2018 · 176 cited
An Evaluation of the Human-Interpretability of Explanation · arXiv.org · 2019 · 175 cited
A Roadmap for a Rigorous Science of Interpretability · arXiv.org · 2017 · 153 cited
Human Evaluation of Models Built for Interpretability · AAAI Conference on Human Computation & Crowdsourcing · 2019 · 141 cited
Getting aligned on representational alignment · Trans. Mach. Learn. Res. · 2023 · 140 cited
Human-in-the-Loop Interpretability Prior · Neural Information Processing Systems · 2018 · 126 cited
Impossibility theorems for feature attribution · Proceedings of the National Academy of Sciences of the United States of America · 2022 · 121 cited