Sotabase
Career
· Technical Staff, METR, 2024–
· PhD Student, UC Berkeley
Publications (13)
Clusterability in Neural Networks
arXiv.org · 2021 · cited 34 times
Self-Modification of Policy and Utility Function in Rational Agents
Artificial General Intelligence · 2016 · cited 32 times
Detecting Modularity in Deep Neural Networks
arXiv.org · 2021 · cited 16 times
Quantifying Local Specialization in Deep Neural Networks
2021 · cited 12 times
Neural Networks are Surprisingly Modular
arXiv.org · 2020 · cited 11 times
Loss Bounds and Time Complexity for Speed Priors
International Conference on Artificial Intelligence and Statistics · 2016 · cited 9 times
Pruned Neural Networks are Surprisingly Modular
2020 · cited 8 times
Constrained belief updates explain geometric structures in transformer representations
International Conference on Machine Learning · 2025 · cited 6 times
Exploring Hierarchy-Aware Inverse Reinforcement Learning
arXiv.org · 2018 · cited 5 times
On the Impossibility of Supersized Machines
arXiv.org · 2017 · cited 4 times
Graphical Clusterability and Local Specialization in Deep Neural Networks
2022
Model Manipulation Attacks Enable More Rigorous Evaluations of LLM Capabilities
Daniel Filan | Researcher Profile | Sotabase