Why are TTS voices still so bad?

May 24, 2024

I regularly receive feedback that the TTS voices are bad compared to other TTS voices available on the market.

This is a recent example: 

The text to speech sounded like it was from the 90's and it cut off the last word of almost every sentence.

This is even with the supposedly improved new voices that were added recently. 

 

Will this improve with the introduction of AI?

3 Replies
Andrew Hanley

I must admit I've never had this problem. While the TTS is not at the level of human realism, it's not awful for AI either (imho), and I've never experienced "cutting off the last word of almost every sentence".

This last issue sounds more like a problem with either the course creation or something on the end user's machine.

I once did have something similar happen with a course (using professional VO narration) where the start and end of the audio were cut off. The cause turned out to be Bluetooth headphones!

I had never heard of this before, but now I introduce a 0.5s silence at the start and end of all audio clips, and I've never had a repeat of that customer issue.

Maybe it's something similar for you?
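
For anyone who wants to batch-apply that padding outside an audio editor, here is a minimal sketch using only Python's standard `wave` module. It assumes uncompressed PCM WAV input. The function name `pad_with_silence` and the file names are hypothetical, and this is not a Storyline feature, just one way to prepare clips before importing them.

```python
import wave


def pad_with_silence(src_path, dst_path, pad_seconds=0.5):
    """Write a copy of a PCM WAV file with silence added at both ends.

    Returns the number of silent frames added to each end.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(src.getnframes())

    # One frame holds one sample for every channel.
    frame_size = params.sampwidth * params.nchannels
    pad_frames = int(params.framerate * pad_seconds)

    # 8-bit WAV samples are unsigned (silence is 0x80);
    # wider samples are signed (silence is 0x00).
    silent_byte = b"\x80" if params.sampwidth == 1 else b"\x00"
    silence = silent_byte * (pad_frames * frame_size)

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        # The frame count in the header is corrected when the file is closed.
        dst.writeframes(silence + frames + silence)

    return pad_frames
```

Usage would look something like `pad_with_silence("narration.wav", "narration_padded.wav")`. MP3 or other compressed formats would need converting to WAV first.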

Jürgen Schoenemeyer

Storyline uses Amazon Polly (Standard and Neural voices)

https://docs.aws.amazon.com/polly/latest/dg/available-voices.html

The new "Generative voices" are available only for English (3 voices).

examples:

https://aws.amazon.com/blogs/aws/a-new-generative-engine-and-three-voices-are-now-generally-available-on-amazon-polly/

Amy and Matthew are much better. I don't like Ruth as much, but perhaps the example is poorly chosen.
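
To compare the engines yourself, here is a rough sketch of calling Polly directly with the `boto3` SDK. This is an assumption about how one might test the voices outside Storyline, not something Storyline exposes. It needs `boto3` installed and AWS credentials configured, and the generative engine may not be offered in every AWS region. The helper names `build_polly_request` and `synthesize_to_file` are hypothetical.

```python
def build_polly_request(text, voice_id="Matthew", engine="generative"):
    """Build keyword arguments for Polly's synthesize_speech call.

    The engine can be swapped for "standard" or "neural" to compare
    voice quality, provided the chosen voice supports that engine.
    """
    return {
        "Text": text,
        "VoiceId": voice_id,
        "Engine": engine,
        "OutputFormat": "mp3",
    }


def synthesize_to_file(text, out_path, **kwargs):
    """Synthesize text with Amazon Polly and save the audio as an MP3 file."""
    import boto3  # third-party; imported here so the helper above works without it

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**build_polly_request(text, **kwargs))
    with open(out_path, "wb") as f:
        f.write(response["AudioStream"].read())
```

For example, `synthesize_to_file("Welcome to the course.", "matthew_generative.mp3")` and the same call with `engine="neural"` would give two files to listen to side by side.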