Exploring Techniques for Improving Automatic Speech Recognition from Pre-trained Models
Date: 8th May 2023
Time: 03:00 PM
Venue: CSD 308
PAST EVENT
Details
Automatic Speech Recognition (ASR) systems degrade when the target domain differs from the source domain on which the model was trained. Moreover, in-domain data is scarce in many scenarios (e.g., telephonic narrow-band speech), and collecting large amounts of labeled data to train supervised models is costly. We explore methods that use pretrained models to improve ASR accuracy in low-resource domains. Self-supervised learning (SSL) models do not require labeled data and are highly popular in ASR research. We explore an ensemble of complementary features from different pretrained SSL models, which outperforms individual SSL models on both in-domain and out-of-domain data. We also propose a domain adaptation technique for a well-trained ASR conformer model that uses acoustic information from the encoder layers. The well-trained ASR model is fine-tuned with some layers updated and others frozen, to gain insight into how this affects ASR performance. The performance of these pretrained-model-based methods is compared against baseline models, and relative improvements and conclusions are drawn from experiments on several datasets.
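The layer-freezing fine-tuning described above can be sketched in PyTorch. This is a minimal illustration, not the speaker's actual setup: the toy model, layer counts, and the `freeze_lower_encoder_layers` helper are all hypothetical stand-ins for a real conformer ASR model.

```python
import torch
import torch.nn as nn

# Illustrative sketch: a toy encoder-decoder standing in for a conformer ASR
# model. We freeze the lower encoder layers and fine-tune the remaining
# layers plus the decoder, mirroring the freeze-vs-update experiments.
class ToyASRModel(nn.Module):
    def __init__(self, dim=32, num_encoder_layers=4, vocab_size=10):
        super().__init__()
        self.encoder = nn.ModuleList(
            nn.Linear(dim, dim) for _ in range(num_encoder_layers)
        )
        self.decoder = nn.Linear(dim, vocab_size)

    def forward(self, x):
        for layer in self.encoder:
            x = torch.relu(layer(x))
        return self.decoder(x)

def freeze_lower_encoder_layers(model, num_frozen):
    """Freeze the first `num_frozen` encoder layers; the rest stay trainable."""
    for layer in model.encoder[:num_frozen]:
        for p in layer.parameters():
            p.requires_grad = False

model = ToyASRModel()
freeze_lower_encoder_layers(model, num_frozen=2)

# Only parameters with requires_grad=True go to the optimizer, so
# fine-tuning updates just the upper encoder layers and the decoder.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-4)
```

Varying `num_frozen` between zero and the full encoder depth is one way to probe how much of the pretrained acoustic representation should be preserved for a new domain.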
Speakers
Ms. Sukhadia Vrunda Nileshkumar (EE20S008)
Electrical Engineering