Sotabase
Career
Research (ML Systems), Stanford University · 2024–
Publications (3)
SpecInfer: Accelerating Large Language Model Serving with Tree-based Speculative Inference and Verification
International Conference on Architectural Support for Programming Languages and Operating Systems · 2023 · cited 270 times
SpecInfer: Accelerating Generative LLM Serving with Speculative Inference and Token Tree Verification
arXiv.org · 2023 · cited 139 times
SpecInfer: Accelerating Generative Large Language Model Serving with Speculative Inference and Token Tree Verification
2023 · cited 11 times
Rae Wong | Researcher Profile | Sotabase