To create our speech personas, we select and record professional voice talents. Once a voice talent has been selected, she or he works with our voice development team for several weeks. The recordings use a diverse script designed to cover all the sound patterns of the language in development, and the team closely monitors the recording process to check for consistency in pronunciation, accentuation, and style.

In the second phase of TTS voice creation, a rich mark-up is added to the speech recordings. In the resulting speech database, each utterance is segmented into individual units such as phones, syllables, and words. Each word, phoneme, and stress mark is annotated, along with several other aspects of the speech.

The technical team then optimizes these annotations, applying a powerful combination of artificial intelligence and machine learning technologies to large amounts of data. Our state-of-the-art methodologies are augmented by the linguistic expertise of our team. Through a system of high-quality feedback and a thorough quality assurance process carried out by native-speaker experts, imperfections are continuously corrected.

In parallel, ReadSpeaker is also working on the future of text to speech by developing techniques based on deep learning. This approach uses an iterative learning process that minimizes objectively measurable differences between the acoustic features the model predicts and the acoustic features observed in the training set. As a result, new ReadSpeaker TTS voices with even more lifelike, expressive speech and customizable intonation can be developed faster than ever.
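To make "iteratively minimizing the difference between predicted and observed acoustic features" concrete, here is a minimal toy sketch. It is not ReadSpeaker's model: it assumes a plain linear model that maps hypothetical input feature vectors (standing in for linguistic features) to hypothetical acoustic feature vectors, and it uses gradient descent on the mean squared error (MSE) as the measurable difference. The names `predict`, `train`, `TRUE_W`, and `TRUE_B` and the synthetic data are all illustrative assumptions. Real deep-learning TTS systems use neural networks and much richer acoustic features, but the training loop follows the same pattern: predict, measure the error, adjust, repeat.

```python
import random


def mse(pred, target):
    """Mean squared error between two lists of equal-length feature vectors."""
    total, count = 0.0, 0
    for p_vec, t_vec in zip(pred, target):
        for p, t in zip(p_vec, t_vec):
            total += (p - t) ** 2
            count += 1
    return total / count


def predict(W, b, X):
    """Toy linear 'acoustic model': one output vector per input feature vector."""
    return [[sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
             for row, b_i in zip(W, b)] for x in X]


def train(X, Y, n_out, lr=0.1, epochs=500, seed=0):
    """Gradient descent on MSE; returns weights, bias and the loss history."""
    rng = random.Random(seed)
    n_in = len(X[0])
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    b = [0.0] * n_out
    history = []
    # d(MSE)/d(prediction) = 2 * (p - t) / (number of predicted values)
    scale = 2.0 / (len(X) * n_out)
    for _ in range(epochs):
        pred = predict(W, b, X)
        history.append(mse(pred, Y))  # loss before this update
        grad_W = [[0.0] * n_in for _ in range(n_out)]
        grad_b = [0.0] * n_out
        for x, p_vec, t_vec in zip(X, pred, Y):
            for i in range(n_out):
                err = (p_vec[i] - t_vec[i]) * scale
                grad_b[i] += err
                for j in range(n_in):
                    grad_W[i][j] += err * x[j]
        # Step against the gradient to reduce the predicted/observed difference.
        for i in range(n_out):
            b[i] -= lr * grad_b[i]
            for j in range(n_in):
                W[i][j] -= lr * grad_W[i][j]
    history.append(mse(predict(W, b, X), Y))  # loss after the final update
    return W, b, history


# Hypothetical toy data: 3 input features -> 2 "acoustic" output features.
data_rng = random.Random(42)
TRUE_W = [[0.5, -1.0, 2.0], [1.5, 0.0, -0.5]]
TRUE_B = [0.2, -0.3]
X = [[data_rng.uniform(-1, 1) for _ in range(3)] for _ in range(100)]
Y = predict(TRUE_W, TRUE_B, X)  # stands in for the "observed" features

W, b, history = train(X, Y, n_out=2)
print(f"loss: {history[0]:.4f} -> {history[-1]:.2e}")
```

Because the toy targets are generated by a known linear mapping, the training loop should drive the loss steadily toward zero and recover weights close to `TRUE_W`; with real recordings the loss would instead level off at whatever the model is able to capture.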