Sotabase
Career
Postdoctoral Researcher, UC Berkeley · 2024–
Publications (65)
Humanity's Last Exam
Robotics · 2025 · 284 citations
Provable Robust Watermarking for AI-Generated Text
International Conference on Learning Representations · 2023 · 277 citations
Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews
International Conference on Machine Learning · 2024 · 182 citations
Mapping the Increasing Use of LLMs in Scientific Papers
arXiv.org · 2024 · 129 citations
Invisible Image Watermarks Are Provably Removable Using Generative AI
Neural Information Processing Systems · 2023 · 114 citations
Learning to Reason without External Rewards
arXiv.org · 2025 · 114 citations
Protecting Language Generation Models via Invisible Watermarking
International Conference on Machine Learning · 2023 · 111 citations
Pride and Prejudice: LLM Amplifies Self-Bias in Self-Refinement
Annual Meeting of the Association for Computational Linguistics · 2024 · 98 citations
Weak-to-Strong Jailbreaking on Large Language Models
International Conference on Machine Learning · 2024 · 93 citations
Scalable Best-of-N Selection for Large Language Models via Self-Certainty
arXiv.org · 2025 · 84 citations
MarkLLM: An Open-Source Toolkit for LLM Watermarking
Conference on Empirical Methods in Natural Language Processing · 2024 · 78 citations
A Survey on Detection of LLMs-Generated Content
Conference on Empirical Methods in Natural Language Processing · 2023 · 77 citations
The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1
arXiv.org · 2025 · 77 citations
DE-COP: Detecting Copyrighted Content in Language Models Training Data
International Conference on Machine Learning · 2024 · 72 citations
SoK: Watermarking for AI-Generated Content
IEEE Symposium on Security and Privacy · 2024 · 55 citations
An undetectable watermark for generative image models
IACR Cryptology ePrint Archive · 2024 · 50 citations
Reward Shaping to Mitigate Reward Hacking in RLHF
arXiv.org · 2025 · 49 citations
Pre-trained Language Models Can be Fully Zero-Shot Learners
Annual Meeting of the Association for Computational Linguistics · 2022 · 39 citations
PromptArmor: Simple yet Effective Prompt Injection Defenses
arXiv.org · 2025 · 38 citations
CodeHalu: Investigating Code Hallucinations in LLMs via Execution-based Verification
AAAI Conference on Artificial Intelligence · 2024 · 30 citations
Xuandong Zhao | Researcher Profile | Sotabase