Lip-syncing Efforts for Transcreating Lecture Videos in Indian Languages

Date: 27th Jan 2022

Time: 11:00 AM

Venue: Google Meet

PAST EVENT

Details

Abstract:
Educational content is extensively available on the Internet and has grown significantly owing to the pandemic. While a large number of lectures are available in Chinese and in European languages, resources in India, especially technical lectures, are available primarily in English. Transcreation is a concept borrowed from the field of translation: it is the process of conveying a message from one language to another without loss of information. This work focuses on the transcreation of lecture videos. The videos are first subjected to automatic speech recognition (ASR) to produce English transcripts with timestamps. These are then translated into several Indian languages using machine translation (MT). The audio track is regenerated in the Indian languages using text-to-speech (TTS) synthesis and added back to the source video through lip-syncing. While many state-of-the-art systems are available for ASR, MT, and TTS, there have been hardly any efforts to reproduce the video itself in Indian languages. The fundamental problem is that Indian-language translations are generally longer than the English source. In this work, we have created an isochronous lip-syncing system to transcreate the lecture videos. Isochronous lip-syncing requires the translated dialogue to fit exactly within the intervals in which the speaker's lips are moving.
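
To make the isochrony constraint concrete, the sketch below computes a per-segment video re-sampling factor from the SRT timestamps and the duration of the synthesized Indian-language audio. This is a minimal illustration, not the speaker's implementation; the Segment structure and field names are assumptions for the example.

# Illustrative sketch (assumed structure, not the thesis code): how much each
# video segment must be slowed down or sped up so that it spans the duration
# of the translated, synthesized audio for that segment.

from dataclasses import dataclass

@dataclass
class Segment:
    start: float         # segment start time in seconds, from the SRT file
    end: float           # segment end time in seconds, from the SRT file
    tts_duration: float  # duration in seconds of the synthesized audio

def stretch_factors(segments):
    """Return one video re-sampling factor per segment.

    A factor > 1 means the video must be slowed down (the translated audio
    is longer, the common case for Indian-language translations of English);
    a factor < 1 means it must be sped up.
    """
    factors = []
    for seg in segments:
        src = seg.end - seg.start
        if src <= 0:
            raise ValueError("SRT segment has non-positive duration")
        factors.append(seg.tts_duration / src)
    return factors

# Example: a 4.0 s English segment whose Hindi TTS output is 5.0 s long
# needs the video slowed by a factor of 1.25 over that interval.
print(stretch_factors([Segment(0.0, 4.0, 5.0)]))  # [1.25]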

Lecture videos typically contain unpracticed speech: the lecturer may repeat sentences, correct statements, switch between the board and slides, and even stutter during delivery. These add to the challenges of the transcreation process. In this thesis, we have attempted four lip-syncing systems, discussed briefly: two baselines and two preliminary systems. The preliminary systems use a simple video re-sampling technique based on timestamps from the SubRip Text (SRT) file and a silence-detection algorithm. The work then proposes two novel lip-syncing systems: one using hidden Markov model-Gaussian mixture model (HMM-GMM) based word alignments, and one using HMM-GMM alignments corrected with group-delay-based segmentation. We prefer re-sampling the video, since re-sampling the audio would be unpleasant to the listener. Another novelty of the work is the use of audio energy at the detected boundaries to avoid abrupt lip movement. Further, for the Indian languages, we have explored a phrasing model based on word-terminus syllables, which further improves the lip-syncing system. Signal processing, used in tandem with machine learning, enables the production of usable transcreated videos.
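
The boundary-energy idea can be sketched as follows: before re-sampling at a candidate word-alignment boundary, the short-time audio energy around that boundary is checked, and the cut is accepted only where the signal is near-silent, so the re-timed video does not produce abrupt lip movement. The window size and threshold below are illustrative assumptions, not the thesis settings.

# Illustrative sketch (assumed parameters): accept a candidate boundary only
# if the short-time energy of the audio around it is below a silence threshold.

import numpy as np

def is_safe_boundary(audio: np.ndarray, sr: int, t: float,
                     window_ms: float = 50.0, threshold: float = 1e-4) -> bool:
    """audio: mono waveform with samples in [-1, 1]; sr: sample rate in Hz;
    t: candidate boundary time in seconds."""
    half = int(sr * window_ms / 1000 / 2)
    center = int(t * sr)
    frame = audio[max(0, center - half): center + half]
    if frame.size == 0:
        return False
    energy = float(np.mean(frame ** 2))  # mean-square short-time energy
    return energy < threshold

def filter_boundaries(audio, sr, candidates):
    # Keep only boundaries where the speaker is silent, so re-sampling there
    # does not cut through active lip movement.
    return [t for t in candidates if is_safe_boundary(audio, sr, t)]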

Speakers

Mr. Mano Ranjith Kumar M (CS19S032)

Computer Science & Engineering