A special corpus of Indian languages covering 22 major languages of India. It comprises 10,000+ spoken sentences/utterances each of Mono and English, recorded by both male and female native speakers. Speech waveform files are available in .wav format along with the corresponding text. We hope that these recordings will be useful for researchers and speech technologists working on synthesis and recognition. You can request zip archives of the entire database here. The statistics of the available datasets are given here.
Please note that approvals/requests are processed only twice a week: Tuesday (15:00 IST) and Friday (15:00 IST). Requests sent after 15:00 IST on these days will be processed in the next batch.
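To estimate when a request is likely to be picked up, the schedule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `next_batch` is hypothetical, and the sketch assumes that a request sent at exactly 15:00 IST still makes that day's batch, which the schedule does not spell out.

```python
from datetime import datetime, time, timedelta, timezone

# IST is UTC+05:30 with no daylight saving, so a fixed offset is enough.
IST = timezone(timedelta(hours=5, minutes=30), "IST")
BATCH_WEEKDAYS = {1, 4}  # Monday is 0, so Tuesday is 1 and Friday is 4
CUTOFF = time(15, 0)


def next_batch(sent_at: datetime) -> datetime:
    """Return the 15:00 IST batch expected to pick up a request sent at sent_at.

    sent_at must be timezone-aware. Assumption: a request sent at exactly
    15:00 IST is still included in that day's batch.
    """
    if sent_at.tzinfo is None:
        raise ValueError("sent_at must be timezone-aware")
    local = sent_at.astimezone(IST)
    # Walk forward day by day until a Tuesday or Friday cutoff is not yet past.
    for offset in range(7):
        day = local.date() + timedelta(days=offset)
        batch = datetime.combine(day, CUTOFF, tzinfo=IST)
        if day.weekday() in BATCH_WEEKDAYS and batch >= local:
            return batch
    raise AssertionError("a batch day always occurs within a week")


# A request sent on Tuesday 2 January 2024 at 15:01 IST misses that day's batch.
print(next_batch(datetime(2024, 1, 2, 15, 1, tzinfo=IST)))  # 2024-01-05 15:00:00+05:30
```

The function only reports when processing is scheduled to start; it says nothing about how long approval itself takes.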
In case of any errors or requests, please write to us at smtiitm@gmail.com. Please use the subject line: Indic Database Download Request. In the body of the email, please include:
1. Language
2. Gender
3. Type [Mono/English]
Example 1: Language : 'Hindi' and Type : 'English' means that it contains English sentences spoken by a person whose native language is Hindi.
Example 2: Language : 'Hindi' and Type : 'Mono' means that it contains Hindi sentences spoken by a person whose native language is Hindi.
By requesting the data, you are confirming that you have read and agreed to be bound by the License For Use of Indic TTS.
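The request email described above can be drafted programmatically. The following is a minimal sketch using Python's standard `email.message.EmailMessage`; the helper name `build_request` and the sender address are hypothetical, the exact body layout is an assumption (only the three fields are specified), and the sketch builds the message without sending it.

```python
from email.message import EmailMessage

REQUEST_ADDRESS = "smtiitm@gmail.com"
SUBJECT = "Indic Database Download Request"


def build_request(sender: str, language: str, gender: str, data_type: str) -> EmailMessage:
    """Build (but do not send) a download request with the three required fields."""
    if gender not in ("Male", "Female"):
        raise ValueError("gender must be 'Male' or 'Female'")
    if data_type not in ("Mono", "English"):
        raise ValueError("data_type must be 'Mono' or 'English'")
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = REQUEST_ADDRESS
    msg["Subject"] = SUBJECT
    # Body layout is an assumption; it simply lists the three requested fields.
    msg.set_content(
        f"1. Language: {language}\n"
        f"2. Gender: {gender}\n"
        f"3. Type: {data_type}\n"
    )
    return msg


# Hypothetical sender; requests Hindi sentences read by a female native Hindi speaker.
request = build_request("researcher@example.org", "Hindi", "Female", "Mono")
print(request)
```

Sending the message is left to whichever mail client or SMTP setup the requester already uses.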
Indic TTS : Enhancing text-to-speech (TTS) synthesis for Indian languages, optimizing quality and integrating compact TTS into disability aids and diverse applications.