Sotabase
Career
· Research Scientist, Anthropic (2023–)
· PhD in Computer Science, Stanford University (2018–2023)
· Research Intern, Google (2017–2019)
· Anthropic
· Bachelor of Science, Stanford University
Publications (68)
On the Opportunities and Risks of Foundation Models
arXiv.org · 2021 · 5,795 citations

Towards Measuring the Representation of Subjective Global Opinions in Language Models
arXiv.org · 2023 · 346 citations

Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models
arXiv.org · 2021 · 310 citations

Studying Large Language Model Generalization with Influence Functions
arXiv.org · 2023 · 276 citations

Many-shot Jailbreaking
Neural Information Processing Systems · 2024 · 214 citations

Collective Constitutional AI: Aligning a Language Model with Public Input
Conference on Fairness, Accountability and Transparency · 2024 · 138 citations

Evaluating and Mitigating Discrimination in Language Model Decisions
arXiv.org · 2023 · 105 citations

Being Optimistic to Be Conservative: Quickly Learning a CVaR Policy
AAAI Conference on Artificial Intelligence · 2019 · 85 citations

Sycophancy to Subterfuge: Investigating Reward-Tampering in Large Language Models
arXiv.org · 2024 · 84 citations

Which Economic Tasks are Performed with AI? Evidence from Millions of Claude Conversations
arXiv.org · 2025 · 80 citations

Eliciting Human Preferences with Language Models
arXiv.org · 2023 · 78 citations

Viewmaker Networks: Learning Views for Unsupervised Representation Learning
International Conference on Learning Representations · 2020 · 70 citations

Drone.io: A Gestural and Visual Interface for Human-Drone Interaction
IEEE/ACM International Conference on Human-Robot Interaction · 2019 · 61 citations

Language Through a Prism: A Spectral Approach for Multiscale Language Representations
Neural Information Processing Systems · 2020 · 57 citations

Clio: Privacy-Preserving Insights into Real-World AI Use
arXiv.org · 2024 · 55 citations

PERSONA: A Reproducible Testbed for Pluralistic Alignment
International Conference on Computational Linguistics · 2024 · 50 citations

Investigating Transferability in Pretrained Language Models
Findings · 2020 · 48 citations

Active Learning Helps Pretrained Models Learn the Intended Task
Neural Information Processing Systems · 2022 · 46 citations

Codebook Features: Sparse and Discrete Interpretability for Neural Networks
International Conference on Machine Learning · 2023 · 41 citations

Distributionally-Aware Exploration for CVaR Bandits
2019 · 37 citations
Alex Tamkin · Researcher Profile · Sotabase