Sumit Pokharel

Software Engineer and independent ML Researcher building models from first principles: LLM training, fine-tuning, and parameter-efficient adaptation. Seeking R&D roles in AI research.

Projects

  • Training a small-scale Mixture-of-Experts model from scratch on three distinct domains (code, math, natural language) to study expert specialization dynamics.
  • Analyzing expert routing patterns and load balancing behavior with and without auxiliary loss, demonstrating how domain specialists emerge during training.
  • Creating visualizations and artifacts including routing heatmaps showing token-to-expert assignments and expert utilization across domains.
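    The routing analysis above can be illustrated with a minimal NumPy sketch of top-k expert routing plus a Switch-style auxiliary load-balancing loss; function and variable names here are illustrative, not the project's actual code.

    ```python
    import numpy as np

    def topk_route(logits, k=2):
        """Route each token to its top-k experts (illustrative sketch).

        logits: (tokens, experts) router scores.
        Returns expert indices, renormalized gate weights, and a
        Switch-Transformer-style auxiliary load-balancing loss.
        """
        n_tokens, n_experts = logits.shape
        # softmax over experts
        probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
        probs /= probs.sum(axis=-1, keepdims=True)

        idx = np.argsort(-probs, axis=-1)[:, :k]       # top-k experts per token
        gates = np.take_along_axis(probs, idx, axis=-1)
        gates /= gates.sum(axis=-1, keepdims=True)     # renormalize over chosen experts

        # Auxiliary loss: fraction of tokens dispatched to each expert times
        # the mean router probability for that expert, scaled by n_experts.
        dispatch = np.array([np.mean(np.any(idx == e, axis=-1))
                             for e in range(n_experts)])
        importance = probs.mean(axis=0)
        aux_loss = n_experts * np.sum(dispatch * importance)
        return idx, gates, aux_loss
    ```

    Training with versus without `aux_loss` added to the objective is what exposes the load-balancing behavior the project studies: without it, routing tends to collapse onto a few experts.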
  • Built a complete LSTM encoder-decoder architecture following Sutskever et al. (2014), implementing attention-free sequence-to-sequence learning from first principles.
  • Trained end-to-end using CUDA-accelerated pipeline with custom data loaders, gradient clipping, and beam search decoding for inference.
  • Released pretrained weights for German-to-English translation.
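    The beam search decoding used at inference can be sketched as follows; `step_fn` is a hypothetical stand-in for the trained decoder (it maps a partial token sequence to next-token log-probabilities), not the project's actual interface.

    ```python
    import numpy as np

    def beam_search(step_fn, start_token, eos_token, beam_size=3, max_len=10):
        """Minimal beam search: keep the beam_size highest-scoring partial
        hypotheses, expanding each until EOS or max_len."""
        beams = [([start_token], 0.0)]     # (token sequence, cumulative log-prob)
        finished = []
        for _ in range(max_len):
            candidates = []
            for seq, score in beams:
                logp = step_fn(seq)                      # (vocab,) log-probs
                for tok in np.argsort(-logp)[:beam_size]:
                    candidates.append((seq + [int(tok)], score + float(logp[tok])))
            candidates.sort(key=lambda c: c[1], reverse=True)
            beams = []
            for seq, score in candidates[:beam_size]:
                (finished if seq[-1] == eos_token else beams).append((seq, score))
            if not beams:                  # all surviving hypotheses ended
                break
        finished.extend(beams)
        return max(finished, key=lambda c: c[1])[0]
    ```

    In practice the per-hypothesis scores would also be length-normalized; that refinement is omitted here for brevity.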
  • Implemented landmark architectures from scratch: the original Transformer (Vaswani et al., 2017), GPT-2, LLaMA 2/3, and Mistral 7B, each with faithful attention mechanisms and positional encodings.
  • Built rotary positional embeddings (RoPE), grouped-query attention (GQA), sliding window attention, and KV-cache optimization to understand modern efficiency techniques.
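    The rotary positional embeddings (RoPE) mentioned above can be sketched in a few lines of NumPy; this follows the standard split-half formulation and is a simplified illustration, not the project's implementation.

    ```python
    import numpy as np

    def rope(x, base=10000.0):
        """Apply rotary positional embeddings to x of shape (seq, dim).

        Channel pairs are rotated by a position-dependent angle, so that
        query-key dot products depend only on relative position.
        """
        seq, dim = x.shape
        half = dim // 2
        freqs = base ** (-np.arange(half) / half)      # per-pair rotation frequency
        angles = np.outer(np.arange(seq), freqs)       # (seq, half)
        cos, sin = np.cos(angles), np.sin(angles)
        x1, x2 = x[:, :half], x[:, half:]
        # 2-D rotation applied to each (x1_i, x2_i) channel pair
        return np.concatenate([x1 * cos - x2 * sin,
                               x1 * sin + x2 * cos], axis=-1)
    ```

    The defining property — attention scores between rotated queries and keys depend only on their relative offset — is what makes RoPE compatible with techniques like sliding window attention.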
  • Built reverse-mode automatic differentiation engine from scratch using NumPy, replicating PyTorch's dynamic computation graph and .backward() semantics.
  • Implemented gradient accumulation, chain rule traversal, and operator overloading for seamless tensor operations.
  • Trains fully-connected neural networks end-to-end with ReLU activations, with gradient correctness validated against PyTorch outputs.
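    The core mechanics of such an engine — operator overloading to build a dynamic graph, then a topologically ordered backward pass — can be sketched with a scalar-valued node; this is a minimal illustration in the spirit of the project, not its NumPy tensor implementation.

    ```python
    class Value:
        """Minimal scalar reverse-mode autograd node."""
        def __init__(self, data, parents=()):
            self.data, self.grad = data, 0.0
            self._parents, self._grad_fn = parents, None

        def __add__(self, other):
            other = other if isinstance(other, Value) else Value(other)
            out = Value(self.data + other.data, (self, other))
            def grad_fn():
                self.grad += out.grad            # d(a+b)/da = 1
                other.grad += out.grad
            out._grad_fn = grad_fn
            return out

        def __mul__(self, other):
            other = other if isinstance(other, Value) else Value(other)
            out = Value(self.data * other.data, (self, other))
            def grad_fn():
                self.grad += other.data * out.grad   # d(ab)/da = b
                other.grad += self.data * out.grad
            out._grad_fn = grad_fn
            return out

        def relu(self):
            out = Value(max(0.0, self.data), (self,))
            def grad_fn():
                self.grad += (self.data > 0) * out.grad
            out._grad_fn = grad_fn
            return out

        def backward(self):
            # topological order: a node's grad is complete before it propagates
            topo, seen = [], set()
            def build(v):
                if v not in seen:
                    seen.add(v)
                    for p in v._parents:
                        build(p)
                    topo.append(v)
            build(self)
            self.grad = 1.0
            for v in reversed(topo):
                if v._grad_fn:
                    v._grad_fn()
    ```

    Note that gradients accumulate with `+=`, which is what makes reused nodes (e.g. `x * x`) differentiate correctly — the same semantics PyTorch's `.backward()` provides.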

Professional Experience

Rakuten Group, Inc., Software Engineer
Tokyo, Japan
  • Led frontend development of the contactless-delivery (okihai) and credit card installment systems, and architected 20+ high-performance React components for the Ichiba checkout revamp serving 50M+ users.
  • Built an internal CLI tool using the Claude API to auto-generate component boilerplate from design specs, reducing initial setup time and standardizing patterns across the checkout component library.
  • Built a component dependency graph, powered by the Claude API, that indexes the checkout system's 200+ components as searchable nodes: developers query in natural language to locate code, trace prop flows, and surface usage patterns.
Best Path Research, Machine Learning Intern
Tokyo, Japan
  • Delivered a Python pipeline for digital-text-to-handwritten-image conversion with hex-code matching, enabling retrieval from 1M+ handwritten character images.
  • Built a receipt distortion correction system using SAM for region segmentation followed by DocTr-style control-point regression for dewarping; trained an encoder-decoder on 10K+ synthetically warped images with homography augmentations.
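    The homography augmentation step above can be sketched by jittering the four image corners and solving the 8-DOF direct linear transform (DLT) system; the jitter fraction and function names are illustrative assumptions, not the project's actual parameters.

    ```python
    import numpy as np

    def random_homography(h, w, jitter=0.1, rng=None):
        """Sample a homography by perturbing the four image corners and
        solving the resulting 8-equation DLT system."""
        if rng is None:
            rng = np.random.default_rng()
        src = np.array([[0, 0], [w, 0], [w, h], [0, h]], float)
        dst = src + rng.uniform(-jitter, jitter, size=(4, 2)) * [w, h]
        # Each point correspondence contributes two linear equations in the
        # eight unknown homography entries (h33 fixed to 1).
        A, b = [], []
        for (x, y), (u, v) in zip(src, dst):
            A.append([x, y, 1, 0, 0, 0, -u * x, -u * y]); b.append(u)
            A.append([0, 0, 0, x, y, 1, -v * x, -v * y]); b.append(v)
        hvec = np.linalg.solve(np.array(A), np.array(b))
        return np.append(hvec, 1.0).reshape(3, 3)

    def warp_points(H, pts):
        """Apply homography H to (n, 2) points via homogeneous coordinates."""
        p = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
        return p[:, :2] / p[:, 2:3]
    ```

    Warping training images with such sampled homographies (and recording the inverse mapping as the regression target) is a common way to synthesize supervision for dewarping models.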

Technical Skills

Languages & Frameworks: Python, PyTorch, NumPy, Tinygrad, TypeScript, JavaScript

Architectures: Transformer, LLaMA 2/3, Mistral, Mixtral, GPT-2/oss, DeepSeek-V2/V3, MoE, Seq2Seq (LSTM)

Techniques & Tools: LoRA, QLoRA, GRPO, Reverse-Mode Autograd, Git, Tmux, Vim

Languages

English (fluent) · Nepali (native) · Japanese (proficient, JLPT N2 certified)

Education

Ritsumeikan Asia Pacific University, Bachelor's Degree in Business Administration
Beppu, Japan

CGPA 3.65