
Final Report

Objectives

Acknowledgement

I would like to thank Google Summer of Code and TensorFlow for giving me this opportunity. I am grateful to my mentors Sayak Paul, Morgan Roff & Jaeyoun Kim for their continuous support and guidance. I would also like to thank TPU Research Cloud for providing me with high-performance TPUs, which allowed me to train the models at scale.

Milestones achieved

For details on running my training scripts, please refer to the README of my repository.

Pull Requests / Commits

Here is the list of commits/PRs I made during GSoC’21:

| Description | Repository | Link |
| --- | --- | --- |
| Implement & train Wav2Vec2 model in TensorFlow | vasudevgupta7/gsoc-wav2vec2 | Commits |
| Export fine-tuned Wav2Vec2 model to TFHub | tensorflow/tfhub.dev | #68 |
| Export pre-trained Wav2Vec2 model to TFHub | tensorflow/tfhub.dev | #65 |
| Add notebook for demonstrating Wav2Vec2 fine-tuning | tensorflow/hub | #788 |

Notebooks

The following table summarizes the notebooks I made during my GSoC tenure:

| Notebook | Description |
| --- | --- |
| Open In Colab | This notebook gives you a template to fine-tune a pre-trained Wav2Vec2 SavedModel |
| Open In Colab | This notebook demonstrates conversion of the TF Wav2Vec2 model to ONNX and compares the latency of the ONNX-exported model & the TF model on CPU |
| Open In Colab | This notebook demonstrates Wav2Vec2 evaluation (without any padding) on LibriSpeech data |
| Open In Colab | This notebook demonstrates Wav2Vec2 SavedModel evaluation (with constant padding up to length 246000) on LibriSpeech data |
| Open In Colab | This notebook shows a small demo of how to use Wav2Vec2 for inference on the ASR task |
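The constant-padding evaluation mentioned above relies on bringing every waveform to a fixed length so the SavedModel can run with a static input shape. A minimal sketch of that preprocessing step is below; the function name `pad_waveform` is illustrative and not from the actual repository:

```python
# Illustrative sketch: pad (or truncate) a waveform to a fixed number of
# samples, as done for the constant-padding evaluation (246000 samples,
# roughly 15.4 s of 16 kHz audio). Names here are hypothetical.
def pad_waveform(samples, target_len=246000, pad_value=0.0):
    """Return `samples` padded with `pad_value` (or truncated) to `target_len`."""
    if len(samples) >= target_len:
        return samples[:target_len]
    return samples + [pad_value] * (target_len - len(samples))

audio = [0.1, -0.2, 0.05]                      # toy 3-sample "waveform"
print(pad_waveform(audio, target_len=8))       # [0.1, -0.2, 0.05, 0.0, 0.0, 0.0, 0.0, 0.0]
```

Padding to a constant length trades some extra compute on short clips for a single compiled input signature, which is what makes the padded-vs-unpadded WER comparison in the Results section meaningful.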

Results

| Checkpoint | WER (no padding) | WER (constant padding to 246000) |
| --- | --- | --- |
| vasudevgupta/gsoc-wav2vec2-960h | 3.3% | 6% |
| vasudevgupta/finetuned-wav2vec2-960h | 5.6% | 6.7% |
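The WER (word error rate) figures above are the standard ASR metric: word-level edit distance between the model's transcription and the reference, divided by the reference length. A self-contained sketch of the computation (the evaluation notebooks may use a library for this; this pure-Python version is just to show the metric):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate = (substitutions + deletions + insertions) / len(reference),
    computed via word-level Levenshtein distance with dynamic programming."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i                                  # delete all of ref[:i]
    for j in range(len(hyp) + 1):
        dp[0][j] = j                                  # insert all of hyp[:j]
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution / match
    return dp[len(ref)][len(hyp)] / len(ref)

print(wer("hello world", "hello word"))  # 0.5 (1 substitution over 2 reference words)
```

So a WER of 3.3% means roughly 3 word-level errors per 100 reference words on the LibriSpeech evaluation set.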

Latency Comparison

| Description | Latency |
| --- | --- |
| ONNX exported model | 0.84 secs |
| JIT-compiled model | 2.85 secs |

Note: The above numbers were obtained by benchmarking on a Colab CPU. Please refer to this notebook to reproduce them.

Parting thoughts

The last 2-3 months were full of learning and coding. GSoC helped me get into the speech domain and motivated me to explore more of the TensorFlow ecosystem. I am thankful to my mentors for their continuous & timely feedback, and I look forward to contributing more to the TensorFlow community and other awesome open-source projects out there.

Future Plans (after GSoC’21)