Why are TTS voices still so bad?

May 24, 2024

By Jurgen Lepla

I regularly receive feedback that the TTS voices are bad compared to other TTS voices available on the market.

This is a recent example:

The text-to-speech sounded like it was from the '90s, and it cut off the last word of almost every sentence.

This is even with the supposedly improved voices that were added recently.

Will this improve with the introduction of AI?

5 Replies

Andrew Hanley

05/24/24 at 10:16 am (UTC)

I must admit I've never had this problem. While the TTS is not at human-realism level, it's not awful for AI either (IMHO), and I've never experienced "cutting off the last word of almost every sentence."

That last issue sounds more like a problem with either the course creation or something on the end user's machine.

I once had something similar happen with a course (using professional VO narration) where the start and end of the audio were cut off. The cause turned out to be Bluetooth headphones!

I had never heard of this before, but now I add 0.5 s of silence at the start and end of all audio clips, and I've never had a repeat of that customer issue.

Maybe it's something similar for you?
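Andrew's workaround can be applied to a batch of clips before they are imported. Below is a minimal sketch, assuming uncompressed PCM WAV files and Python's standard `wave` module; `pad_wav_with_silence` is a hypothetical helper, not a Storyline feature. The padding it writes is exact digital silence (all-zero samples for 16-bit audio).

```python
import wave


def pad_wav_with_silence(src_path: str, dst_path: str, pad_seconds: float = 0.5) -> None:
    """Copy src_path to dst_path with pad_seconds of silence at the start and end.

    Hypothetical helper; assumes an uncompressed PCM WAV file, which is what
    the standard-library wave module can read and write.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        frames = src.readframes(params.nframes)

    # A frame holds one sample per channel, so the silence length is counted in frames.
    pad_frames = int(round(pad_seconds * params.framerate))
    frame_bytes = params.sampwidth * params.nchannels

    # 8-bit WAV samples are unsigned (silence = 0x80); wider PCM is signed (silence = 0x00).
    silence_byte = b"\x80" if params.sampwidth == 1 else b"\x00"
    silence = silence_byte * (pad_frames * frame_bytes)

    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)  # same channels, sample width and rate as the input
        dst.writeframes(silence + frames + silence)  # header frame count is patched on close
```

For example, `pad_wav_with_silence("clip_01.wav", "clip_01_padded.wav")` (hypothetical file names) writes a copy that is 1 s longer than the original.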

Jurgen Lepla
Author

05/24/24 at 10:50 am (UTC)

Possibly, but even so, this was just one of the remarks.

Other people say the voices are flat and lack intonation, the emphasis is often wrong, etc.

Jürgen Schoenemeyer

05/24/24 at 6:24 pm (UTC)

Storyline uses Amazon Polly (Standard and Neural voices).

https://docs.aws.amazon.com/polly/latest/dg/available-voices.html

The new "Generative" voices are available only for English (3 voices).

Examples:

https://aws.amazon.com/blogs/aws/a-new-generative-engine-and-three-voices-are-now-generally-available-on-amazon-polly/

Amy and Matthew are much better. I don't like Ruth as much, but perhaps the example is poorly chosen.

06/17/24 at 9:35 am (UTC)

Quoting Andrew Hanley:

"I had never heard of this before, but now I add 0.5 s of silence at the start and end of all audio clips, and I've never had a repeat of that customer issue.

Maybe it's something similar for you?"

That is exactly the problem with the speech. There is ABSOLUTE silence (no data) between words.

You can use a soundtrack of quiet noise (a hiss at around -30 dB or lower). I play it for the duration of the course.
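The low-level hiss track can also be generated rather than recorded. This is a minimal sketch, assuming "-30 dB" means an RMS level of -30 dBFS (relative to digital full scale) and a mono 16-bit WAV; `write_hiss_track` is a hypothetical helper, and the noise is plain Gaussian white noise from Python's `random` module. Converting decibels to a linear amplitude uses 10^(dB/20), so -30 dBFS is about 3.2 % of full scale.

```python
import math
import random
import struct
import wave


def write_hiss_track(path: str, seconds: float, level_dbfs: float = -30.0,
                     framerate: int = 44100, seed: int = 0) -> float:
    """Write mono 16-bit white noise at roughly level_dbfs RMS; return the measured level.

    Hypothetical helper; "-30 dB" is assumed to mean -30 dBFS RMS.
    """
    total = int(seconds * framerate)
    if total <= 0:
        raise ValueError("seconds * framerate must be at least one sample")

    rng = random.Random(seed)  # seeded so the track is reproducible
    full_scale = 32767
    target_rms = full_scale * 10 ** (level_dbfs / 20)  # dBFS -> linear amplitude

    sum_sq = 0
    written = 0
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(framerate)
        while written < total:
            n = min(framerate, total - written)  # write about one second per chunk
            # Gaussian noise whose standard deviation equals the target RMS, clipped to int16.
            chunk = [max(-32768, min(32767, round(rng.gauss(0.0, target_rms))))
                     for _ in range(n)]
            sum_sq += sum(s * s for s in chunk)
            w.writeframes(struct.pack(f"<{n}h", *chunk))  # little-endian signed 16-bit
            written += n

    return 20 * math.log10(math.sqrt(sum_sq / total) / full_scale)
```

For example, `write_hiss_track("hiss.wav", 60)` writes one minute of noise and returns its measured level, which should land very close to -30. Pure-Python generation is slow for very long tracks, so a shorter clip is quicker to produce.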

Soren J Birch

06/17/24 at 9:36 am (UTC)

Agreed. It is also staccato, and you can't adjust the pace. The quality of Storyline's text-to-speech is bottom of the barrel, and I now use a much better provider for this service. It's a shame, as inserting speech from the slide notes is a big time-saver.