Indic TTS
Data Statistics
A corpus of Indian languages covering 22 major languages of India. It comprises more than 10,000 spoken sentences/utterances each in the native language and in English, recorded by both male and female native speakers. Speech waveform files are available in .wav format along with the corresponding text. We hope that these recordings will be useful for researchers and speech technologists working on synthesis and recognition. The statistics given below include multiple speakers and genders for each language. Detailed statistics are available here.
Consolidated Statistics
Language      English (Hrs)   HTS+STRAIGHT
Assamese      23.35           27.39
Bengali       16.46           20.07
Bodo          9.98            9.78
Gujarati      20.13           31.69
Hindi         31.87           41.86
Kannada       19.93           22.45
Malayalam     16.66           20.89
Manipuri      20.57           20.75
Marathi       18.73           22.36
Odia          21.3            19.18
Punjabi       21              27
Rajasthani    19.67           20.38
Tamil         45.61           53.59
Telugu        21              36.71
Total Duration 680.36 hours
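As a quick consistency check, the per-language figures can be summed and compared against the stated total. The sketch below is illustrative only: the dictionary name and the variable names are hypothetical labels, not official field names from the corpus, and the values are simply the two numeric columns of the table as read from this page.

```python
# Per-language durations in hours, taken from the two numeric columns of the
# consolidated table. The names used here are illustrative assumptions.
DURATIONS = {
    "Assamese": (23.35, 27.39),
    "Bengali": (16.46, 20.07),
    "Bodo": (9.98, 9.78),
    "Gujarati": (20.13, 31.69),
    "Hindi": (31.87, 41.86),
    "Kannada": (19.93, 22.45),
    "Malayalam": (16.66, 20.89),
    "Manipuri": (20.57, 20.75),
    "Marathi": (18.73, 22.36),
    "Odia": (21.3, 19.18),
    "Punjabi": (21, 27),
    "Rajasthani": (19.67, 20.38),
    "Tamil": (45.61, 53.59),
    "Telugu": (21, 36.71),
}

# Sum each column separately, then combine and round to two decimals
# to match the precision of the published total.
first_column = sum(a for a, _ in DURATIONS.values())
second_column = sum(b for _, b in DURATIONS.values())
total_hours = round(first_column + second_column, 2)

print(f"{len(DURATIONS)} languages")
print(f"column sums: {first_column:.2f} + {second_column:.2f}")
print(f"total: {total_hours:.2f} hours")
```

Summing both columns this way gives 680.36 hours, which agrees with the total duration stated above.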
For detailed statistics, click here.
Indic TTS: enhancing text-to-speech (TTS) synthesis for Indian languages, optimizing quality, and integrating compact TTS into disability aids and diverse applications.
© Copyright 2023, Speech Technology Consortium,
Bhashini, MeiTY and by Hema A Murthy & S Umesh,
Department Of Computer Science and Engineering and Electrical Engineering, IIT MADRAS. All Rights Reserved
Maintained by NetPhenix IT Solutions.