AI Pre-training Lead, Unbox AI | June '21 - Present
- Scaled BehaviorGPT pre-training to billions of parameters and trillions of tokens on hundreds of H100s, and studied the properties that emerged with increasing scale.
- Led a cross-functional team to build a scalable codebase for training large models, and partnered with Stanford PhDs to bring research to production; grew into leadership roles that required inspiring and understanding people to create more value.
- Experimented with various loss functions and characterised the model's overfitting patterns; iteratively optimised training data and refined learning objectives to align pre-training with desired production use cases.
- Derived the underlying mathematics and translated it into efficient Triton/PyTorch kernels, achieving linear scaling across nodes and a 5x speedup over the baseline PyTorch implementation.
- Implemented an efficient data pipeline in Rust, enabling training on datasets of hundreds of gigabytes and improving throughput by over 80%.
- BehaviorGPT-powered search and recommendations receive 70M+ visits monthly and boost sales by 20% for a leading e-commerce company.
- Work on this project under the guidance of Prof. Gunnar Carlsson and Mr. Rickard Brüel Gabrielsson.