Sotabase

Career

· ML Research Engineer, Cerebras
· Doctoral Assistant, Machine Learning and Optimization Laboratory, EPFL (École Polytechnique Fédérale de Lausanne)
· MS Graduate, Stanford University
· Autopilot Engineer, Tesla

Publications

· Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations (2024), cited 104 times
· Online Normalization for Training Neural Networks (2019), cited 59 times
· Pipelined Backpropagation at Scale: Training Large Models without Batches (2020), cited 35 times
· Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks (2023), cited 33 times
· Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training (2024), cited 15 times
· Multiplication-Free Transformer Training via Piecewise Affine Operations (2023), cited 8 times
· Weight Decay may matter more than muP for Learning Rate Transfer in Practice (2025), cited 7 times
· Adaptive Braking for Mitigating Gradient Delay (2020), cited 4 times
· Training Dynamics of the Cooldown Stage in Warmup-Stable-Decay Learning Rate Scheduler (2025), cited 4 times
· Analyzing & Eliminating Learning Rate Warmup in GPT Pre-Training, cited 2 times
· Ghost Noise for Regularizing Deep Neural Networks (2023), cited 2 times
· Memory Efficient Mixed-Precision Optimizers (2023), cited 2 times
· Rotational Optimizers: Simple & Robust DNN Training (2023)