Sotabase
Career
· Researcher/Developer, Kotoba Technologies, 2023–
· Researcher/Ph.D. Student, University of Washington, 2023–
· Research Intern, EPFL, 2022–
· Undergraduate Student, University of Tokyo, 2020–
Publications (12)
NanoFlow: Towards Optimal Large Language Model Serving Throughput
arXiv.org · 2024 · 76 cited
Fiddler: CPU-GPU Orchestration for Fast Inference of Mixture-of-Experts Models
International Conference on Learning Representations · 2024 · 47 cited
Manticore: Hardware-Accelerated RTL Simulation with Static Bulk-Synchronous Parallelism
International Conference on Architectural Support for Programming Languages and Operating Systems · 2023 · 16 cited
TeleRAG: Efficient Retrieval-Augmented Generation Inference with Lookahead Retrieval
arXiv.org · 2025 · 8 cited
LiteASR: Efficient Automatic Speech Recognition with Low-Rank Approximation
Conference on Empirical Methods in Natural Language Processing · 2025 · 5 cited
A 475 MHz Manycore FPGA Accelerator for RTL Simulation
Symposium on Field Programmable Gate Arrays · 2024 · 2 cited
ConsumerBench: Benchmarking Generative AI Applications on End-User Devices
arXiv.org · 2025 · 1 cited
Accelerating Decision Tree Ensemble with Guided Branch Approximation
HEART · 2022
AgentFlux: Decoupled Fine-Tuning & Inference for On-Device Agentic Systems
2025
CiraaS: cloud computing with programmable logic
SIGCOMM Posters and Demos · 2022
VoxServe: Streaming-Centric Serving System for Speech Language Models
2026