Sotabase
Career
· Research Scientist, Anthropic (2023–)
· PhD in Computer Science, Stanford University (2018–2023)
· Research Intern, Google (2017–2019)
· Anthropic
· Bachelor of Science, Stanford University
Publications (68)
On the Opportunities and Risks of Foundation Models
arXiv.org · 2021 · 5,795 citations

Towards Measuring the Representation of Subjective Global Opinions in Language Models
arXiv.org · 2023 · 346 citations

Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models
arXiv.org · 2021 · 310 citations

Studying Large Language Model Generalization with Influence Functions
arXiv.org · 2023 · 276 citations

Many-shot Jailbreaking
Neural Information Processing Systems · 2024 · 214 citations

Collective Constitutional AI: Aligning a Language Model with Public Input
Conference on Fairness, Accountability and Transparency · 2024 · 138 citations

Evaluating and Mitigating Discrimination in Language Model Decisions
arXiv.org · 2023 · 105 citations

Being Optimistic to Be Conservative: Quickly Learning a CVaR Policy
AAAI Conference on Artificial Intelligence · 2019 · 85 citations

Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models
arXiv.org · 2024 · 84 citations

Which Economic Tasks are Performed with AI? Evidence from Millions of Claude Conversations
arXiv.org · 2025 · 80 citations

Eliciting Human Preferences with Language Models
arXiv.org · 2023 · 78 citations

Viewmaker Networks: Learning Views for Unsupervised Representation Learning
International Conference on Learning Representations · 2020 · 70 citations

Drone.io: A Gestural and Visual Interface for Human-Drone Interaction
IEEE/ACM International Conference on Human-Robot Interaction · 2019 · 61 citations

Language Through a Prism: A Spectral Approach for Multiscale Language Representations
Neural Information Processing Systems · 2020 · 57 citations

Clio: Privacy-Preserving Insights into Real-World AI Use
arXiv.org · 2024 · 55 citations

PERSONA: A Reproducible Testbed for Pluralistic Alignment
International Conference on Computational Linguistics · 2024 · 50 citations

Investigating Transferability in Pretrained Language Models
Findings · 2020 · 48 citations

Active Learning Helps Pretrained Models Learn the Intended Task
Neural Information Processing Systems · 2022 · 46 citations

Codebook Features: Sparse and Discrete Interpretability for Neural Networks
International Conference on Machine Learning · 2023 · 41 citations

Distributionally-Aware Exploration for CVaR Bandits
2019 · 37 citations
Alex Tamkin · Researcher Profile · Sotabase