- Training a small-scale Mixture-of-Experts model from scratch on three distinct domains (code, math, natural language) to study expert specialization dynamics.
- Analyzing expert routing patterns and load-balancing behavior with and without an auxiliary load-balancing loss, demonstrating how domain specialists emerge during training (a minimal routing sketch follows this list).
- Creating visualizations and artifacts, including routing heatmaps of token-to-expert assignments and expert utilization across domains (see the plotting sketch after this list).
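
The routing and auxiliary-loss behavior described above can be illustrated with a short PyTorch sketch of a top-1 router with a Switch-Transformer-style load-balancing term. The module name `TopKRouter`, the dimensions, the expert count, and the aux-loss weight are illustrative assumptions, not the project's actual code.

```python
# Minimal top-1 routing layer with an auxiliary load-balancing loss
# (Switch-Transformer style). All hyperparameters below are placeholders.
import torch
import torch.nn as nn
import torch.nn.functional as F


class TopKRouter(nn.Module):
    def __init__(self, d_model: int = 256, num_experts: int = 8, aux_weight: float = 0.01):
        super().__init__()
        self.gate = nn.Linear(d_model, num_experts, bias=False)
        self.num_experts = num_experts
        self.aux_weight = aux_weight

    def forward(self, x: torch.Tensor):
        # x: (num_tokens, d_model) -> router logits over experts
        logits = self.gate(x)                      # (tokens, experts)
        probs = F.softmax(logits, dim=-1)
        expert_idx = probs.argmax(dim=-1)          # top-1 expert per token

        # Load-balancing loss: fraction of tokens routed to each expert (f_i)
        # times the mean router probability for that expert (P_i), scaled by
        # num_experts so a perfectly uniform router gives a value of 1.
        f = F.one_hot(expert_idx, self.num_experts).float().mean(dim=0)
        P = probs.mean(dim=0)
        aux_loss = self.aux_weight * self.num_experts * torch.sum(f * P)
        return expert_idx, probs, aux_loss


if __name__ == "__main__":
    router = TopKRouter()
    tokens = torch.randn(1024, 256)                # fake batch of token embeddings
    idx, probs, aux = router(tokens)
    print(idx.shape, aux.item())
```

Dropping the `aux_loss` term from the training objective is one way to compare routing with and without load balancing, as in the second bullet above.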
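A routing heatmap like the one mentioned in the third bullet can be built by counting, per domain, how many tokens each expert receives. The function name `routing_heatmap`, the domain labels, and the per-domain normalization below are assumptions for illustration, not the project's actual plotting code.

```python
# Sketch of a domain-by-expert routing heatmap from per-token assignments.
import numpy as np
import matplotlib.pyplot as plt


def routing_heatmap(expert_ids: np.ndarray, domain_ids: np.ndarray,
                    num_experts: int, domains=("code", "math", "natural language")):
    # expert_ids, domain_ids: one integer per token
    counts = np.zeros((len(domains), num_experts))
    for d in range(len(domains)):
        mask = domain_ids == d
        counts[d] = np.bincount(expert_ids[mask], minlength=num_experts)
    counts /= counts.sum(axis=1, keepdims=True)    # per-domain utilization shares

    fig, ax = plt.subplots()
    im = ax.imshow(counts, aspect="auto", cmap="viridis")
    ax.set_xticks(range(num_experts))
    ax.set_xticklabels([f"E{i}" for i in range(num_experts)])
    ax.set_yticks(range(len(domains)))
    ax.set_yticklabels(domains)
    ax.set_xlabel("expert")
    fig.colorbar(im, label="share of domain tokens")
    return fig


# Example with random assignments (8 experts, 3 domains, 3000 tokens):
fig = routing_heatmap(np.random.randint(0, 8, 3000), np.random.randint(0, 3, 3000), 8)
fig.savefig("routing_heatmap.png")
```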
Software Engineer and independent ML Researcher building models from first principles: LLM training, fine-tuning, and parameter-efficient adaptation. Seeking R&D roles in AI research.