About me

  • I graduated from the Indian Institute of Technology Madras with a Dual Degree (M.Tech. Data Science + B.Tech. Mechanical Engineering).
  • My research interests lie in large models of text, vision, and human behaviour. I love using technology for the good of society, and I am particularly interested in what neural networks learn and in understanding their overfitting patterns.
  • I work on pre-training models at UnboxAI and lead the AI and data team. My daily job is to train and scale our BehaviorGPT model on hundreds of H100s.
  • I am also one of the world's youngest Google Developer Experts (GDE) in Machine Learning (JAX).
  • During my undergraduate studies, I had the privilege of contributing the BigBird model to HuggingFace Transformers and building a library for training speech-to-text models.
  • My projects received over 200 stars and 50 forks on GitHub, and my open-source work was featured in the TensorFlow newsletter (Google).
Experience

Work Experience

AI Pre-training Lead, Unbox AI June'21 - Present

  • Scaled BehaviorGPT pre-training to billions of parameters and trillions of tokens on hundreds of H100s and studied the emergent properties with increasing scale.
  • Led a cross-functional team to build a scalable codebase for training large models and worked with Stanford PhDs to bring research to production. This role has pushed me to take on leadership, inspire and understand people, and ultimately create more value.
  • Experimented with various loss functions and learned about the model's overfitting patterns. Iteratively optimised training data and carefully refined learning objectives to align model pre-training with desired production use cases.
  • Solved complex mathematical equations and translated them into efficient Triton/PyTorch code, achieving linear scaling with the number of nodes and a 5x speedup over plain PyTorch (a generic Triton sketch follows this list).
  • Implemented an efficient data pipeline in Rust, enabling training on datasets of hundreds of gigabytes and improving efficiency by over 80%.
  • BehaviorGPT-powered search & recommendations receive 70M+ visits monthly and boost sales by 20% for a leading e-commerce company.
  • I am working on this project under the guidance of Prof. Gunnar Carlsson and Mr. Rickard Brüel Gabrielsson.
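
The kernel work above is internal to BehaviorGPT, but the general technique can be illustrated with a minimal Triton kernel: a fused multiply-add that reads and writes each element in a single memory pass. Everything in the sketch (function names, the fused operation, the block size) is illustrative, not the production code.

    # Minimal illustrative Triton kernel: fused multiply-add over a flat tensor.
    # This is a generic sketch of custom GPU kernels in Triton, not the actual
    # BehaviorGPT kernels, whose details are not public.
    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def fused_mul_add_kernel(x_ptr, y_ptr, out_ptr, alpha, n_elements, BLOCK_SIZE: tl.constexpr):
        pid = tl.program_id(axis=0)                       # one program per block of elements
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements                       # guard the tail block
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, alpha * x + y, mask=mask)  # fused in one pass

    def fused_mul_add(x: torch.Tensor, y: torch.Tensor, alpha: float) -> torch.Tensor:
        # x and y must be contiguous CUDA tensors of the same shape.
        out = torch.empty_like(x)
        n = x.numel()
        grid = (triton.cdiv(n, 1024),)
        fused_mul_add_kernel[grid](x, y, out, alpha, n, BLOCK_SIZE=1024)
        return out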

Open-Source Work

HuggingFace (Transformers, Accelerate, Hub) Feb'21 - June'21

  • Contributed BigBird in PyTorch (PR 10183, PR 10991) and JAX (PR 11967), including training scripts (PR 12233), to the HuggingFace Transformers library.
  • Integrated Microsoft’s DeepSpeed with HuggingFace Accelerate, supporting distributed training strategies such as ZeRO-3 & ZeRO-Offload (PR 82).
  • Contributed ModelHubMixin to enable uploading PyTorch models to the Hub (PR 11); ModelHubMixin was presented at PyTorch Ecosystem Days (see the sketch after this list).
  • Received a $3,000 fellowship from HuggingFace and was featured in their newsletter for my open-source work.
  • As part of this project, I had the opportunity to interact with Mr. Patrick von Platen, Dr. Thomas Wolf, and Dr. Manzil Zaheer.
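
For context, this is roughly what the mixin enables with today's huggingface_hub API; the model class and repository name below are placeholders, not part of the original contribution.

    # Illustrative use of the Hub mixin (current huggingface_hub API).
    # Class and repository names are placeholders; assumes you are logged in to the Hub.
    import torch.nn as nn
    from huggingface_hub import PyTorchModelHubMixin

    class TinyClassifier(nn.Module, PyTorchModelHubMixin):
        def __init__(self, hidden_size: int = 128, num_labels: int = 2):
            super().__init__()
            self.layer = nn.Linear(hidden_size, num_labels)

        def forward(self, x):
            return self.layer(x)

    model = TinyClassifier()
    model.push_to_hub("username/tiny-classifier")               # upload weights to the Hub
    reloaded = TinyClassifier.from_pretrained("username/tiny-classifier")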

Google Summer of Code, TensorFlow May'21 - Aug'21

  • Implemented Meta’s Wav2Vec2 and built a library for training speech-to-text models on TPUs (GitHub), which earned over 89 stars and 29 forks (a minimal usage sketch follows this list).
  • Trained the model on 300 GB of LibriSpeech data using TPU v3-8 and advanced data streaming, achieving a 3% word error rate on the test set.
  • Created a tutorial showcasing Wav2Vec2 fine-tuning, available as part of the official TensorFlow repository (PR 788).
  • This project was done under the guidance of Mr. Sayak Paul, Mr. Morgan Roff and Mr. Jaeyoun Kim. The full report for my work can be found here.
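
As a rough illustration of the speech-to-text setup, the sketch below runs Wav2Vec2 CTC inference with HuggingFace Transformers in PyTorch. The GSoC project itself is a separate TensorFlow/TPU library, so this is a stand-in for the idea, not its actual API.

    # Illustrative Wav2Vec2 speech-to-text with HuggingFace Transformers (PyTorch);
    # the GSoC library is a separate TensorFlow/TPU implementation.
    import torch
    from transformers import Wav2Vec2Processor, Wav2Vec2ForCTC

    processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-base-960h")
    model = Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-base-960h")

    def transcribe(speech):
        # `speech` is a 1-D float array of 16 kHz audio samples (e.g. loaded with soundfile).
        inputs = processor(speech, sampling_rate=16_000, return_tensors="pt")
        with torch.no_grad():
            logits = model(inputs.input_values).logits    # (batch, time, vocab)
        predicted_ids = torch.argmax(logits, dim=-1)      # greedy CTC decoding
        return processor.batch_decode(predicted_ids)[0]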
Projects

Research Projects

Long Document Question Answering

  • Fine-tuned BigBird on the 100 GB Natural Questions dataset using TPU v3-8 (GitHub); a minimal inference sketch follows this list.
  • Achieved an Exact Match score of 55%, surpassing the BigBird paper’s reported score of 53% on the test data.
  • Created tutorials to showcase how to evaluate the model for question-answering (link) and summarisation tasks (link).
  • Authored a blog post explaining BigBird's attention, released as a part of HuggingFace's official blog (link).
  • This work was carried out under the invaluable guidance of Dr. Thomas Wolf and Mr. Patrick von Platen from HuggingFace. I also had the privilege of discussing BigBird training with Dr. Manzil Zaheer, author of the BigBird paper.
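
A minimal extractive-QA sketch with HuggingFace Transformers, for illustration only: the checkpoint name below is the public BigBird base model, standing in for the Natural Questions fine-tuned weights hosted in the repository linked above.

    # Illustrative extractive QA with BigBird via HuggingFace Transformers.
    # "google/bigbird-roberta-base" is a placeholder for the fine-tuned checkpoint.
    import torch
    from transformers import BigBirdTokenizer, BigBirdForQuestionAnswering

    tokenizer = BigBirdTokenizer.from_pretrained("google/bigbird-roberta-base")
    model = BigBirdForQuestionAnswering.from_pretrained("google/bigbird-roberta-base")

    question, context = "Who wrote the paper?", "..."      # the long document goes in `context`
    inputs = tokenizer(question, context, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    start = torch.argmax(outputs.start_logits)              # most likely answer span
    end = torch.argmax(outputs.end_logits) + 1
    answer = tokenizer.decode(inputs.input_ids[0][start:end])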

BioBigBird: Leveraging Entire Articles for Biomedical Language Understanding

  • Pre-trained the BigBird model on 42M articles (~20B tokens) from PubMed using TPU v3-8 and achieved a BLURB score of 86.46 (GitHub).
  • Conducted experiments to identify high-quality data for pre-training and developed pipelines to extract high-quality text from 300 GB of raw data.
  • Worked on this project as part of my final year thesis (ID5490, ID5491, ID5492) under the guidance of Prof. Nirav Bhatt and received a 9.76/10 GPA.
  • Presented my work as a poster at the Annual RBCDSAI Research Showcase 2023 (webpage, poster, video).

Optimizing Adapters for Neural Machine Translation

  • Implemented adapters on top of the Transformers library and fine-tuned mBART for machine translation using adapters (GitHub); a minimal adapter sketch follows this list.
  • Achieved BLEU scores of 25.3 on the HIN->ENG and 18.1 on the GUJ->ENG datasets, and reduced the deployment memory footprint by over 76% without any significant loss in translation quality.
  • Published a blog post explaining the effectiveness of adapters in multilingual translation systems as a part of OffNote Labs' official blog (link).
  • This work was done under the guidance of Dr. Nishant Sinha from OffNote Labs.
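
The core idea behind adapters fits in a few lines: a small bottleneck module inserted into each Transformer layer and trained while the base model stays frozen. A minimal PyTorch sketch with illustrative dimensions, not the exact configuration used in the project:

    # Minimal bottleneck adapter in PyTorch; dimensions are illustrative.
    import torch
    import torch.nn as nn

    class Adapter(nn.Module):
        def __init__(self, hidden_size: int = 1024, bottleneck: int = 64):
            super().__init__()
            self.down = nn.Linear(hidden_size, bottleneck)   # project down
            self.up = nn.Linear(bottleneck, hidden_size)     # project back up
            self.act = nn.ReLU()

        def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
            # Residual connection keeps the frozen base model's representation intact.
            return hidden_states + self.up(self.act(self.down(hidden_states)))

Because only the adapter parameters (a small fraction of the model) are trained and stored per language pair while one frozen mBART backbone is shared, the deployment footprint shrinks substantially.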

Academic Projects

Implemented NeurIPS paper: Incorporating BERT into parallel sequence decoding with Adapters

  • Successfully implemented the NeurIPS 2020 paper - 'Incorporating BERT into parallel sequence decoding with Adapters' (GitHub).
  • This work was part of a course project for Deep Learning (CS6910) under the guidance of Prof. Anurag Mittal, and I received a 9/10 grade in the course.

Hackathons

Scalathon: Build an Automatic Headline and Sentiment Generator

  • Objective: Process digital content such as emails, articles, reports, videos, and tweets. The task is broken down into:
    - Headline generation for articles that follow the mobile technology theme.
    - Theme identification for tweets and articles; if the identified theme is mobile technology, assign a sentiment to the brand described.
  • mBART, a Transformer-based encoder-decoder model from Meta pre-trained with a multilingual denoising objective, is used for headline generation.
  • Non-English tweets are translated to English with mBART. Mobile brand entities are identified with Named Entity Recognition (NER) against a curated list of mobile brands, and Aspect-Based Sentiment Analysis (ABSA) assigns a sentiment to each aspect in the text (a rough pipeline sketch follows this list).
  • Our solution secured the Gold medal at the 9th Inter IIT Tech Meet, organized by IIT Guwahati.
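
A rough sketch of this pipeline using off-the-shelf HuggingFace pipelines; the actual submission used its own models, brand list, and a proper ABSA component, so every model name and constant below is a placeholder.

    # Rough sketch of the translate -> NER -> sentiment pipeline; all names are placeholders.
    from transformers import pipeline

    translator = pipeline("translation", model="facebook/mbart-large-50-many-to-many-mmt")
    ner = pipeline("ner", aggregation_strategy="simple")
    sentiment = pipeline("sentiment-analysis")

    MOBILE_BRANDS = {"samsung", "apple", "xiaomi", "oneplus"}   # placeholder brand list

    def analyse(tweet: str, is_english: bool) -> dict:
        # Translate non-English tweets to English first (source language hard-coded here).
        text = tweet if is_english else translator(
            tweet, src_lang="hi_IN", tgt_lang="en_XX")[0]["translation_text"]
        entities = [e["word"] for e in ner(text) if e["entity_group"] == "ORG"]
        brands = [e for e in entities if e.lower() in MOBILE_BRANDS]
        # One sentence-level sentiment per detected brand stands in for full ABSA.
        return {brand: sentiment(text)[0]["label"] for brand in brands}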
Personal

Public Speaking

I gave a talk on Abhishek Thakur's YouTube Channel (link) and at OffNote Labs (link).

Mentoring

Mentored for Google Summer of Code 2022 project titled ‘Developing NLP Examples Using Hugging Face Transformers’ (link).

Community Service

I joined the National Service Scheme, taught mathematics at a government school, and took part in collection drives on campus.

Leadership

I was the strategist of the Analytics Club at IIT Madras for 2020-21 and contributed to several club activities, including the weekly-sessions initiative. I was also part of the E-Cell at IIT Madras and took part in several activities to promote entrepreneurship on campus.

Get in Touch

Contact

+91 9992526231

Robert Bosch Centre for Data Science and AI,
Indian Institute of Technology Madras,
Chennai, Tamil Nadu

Department of Mechanical Engineering,
Indian Institute of Technology Madras,
Chennai, Tamil Nadu