Forum Discussion
Text to Speech software
As for articulation, the TTS examples I've heard over the past year or so, including the most recent ones, appear capable of only three levels of tone: the standard mid-tone, a slightly raised tone for emphasis, and a slightly lowered tone for the ends of sentences. In several places in the Speech-Over sample, the wrong syllable or word was emphasized. Even with these three levels of tone and the insertion of pauses, TTS is not even close to compelling speech. (In my personal opinion, although he's a very smart guy, neither is the speech given by Bill Gates. The pauses in his talk came only from his having to refer occasionally to the paper he was reading from; they were not purposely placed for impact, because what he was talking about at the time was not so enlightening as to warrant dramatic pauses.)
With regard to the proper placement of emphasis, I submit the following example. Read each sentence aloud, emphasizing the word in boldface, to hear how the meaning of the sentence changes from the previous one.
"**I** never said she ate your sandwich." (Somebody else said it)
"I **never** said she ate your sandwich." (I definitely did not say anything)
"I never **said** she ate your sandwich." (I implied it)
"I never said **she** ate your sandwich." (I said someone else did)
"I never said she **ate** your sandwich." (I said she did something else with the sandwich)
"I never said she ate **your** sandwich." (I said she ate someone else’s sandwich)
"I never said she ate your **sandwich**." (I said she ate something else)
"The boss wants it" is only enough motivation to make someone do something. But people who are motivated by fear of consequences are still reluctant and, thus, will not totally commit. Yes, they might force themselves to sit through the lessons, but we can hardly call this being interested and engaged. They might even receive a passing grade at the conclusion of each lesson. But the real test is whether, six months down the road, they can remember what they heard and apply it with satisfactory results.
Some comments from colleagues of mine who have heard TTS:
"If I had to listen to more than 60 seconds of this as employee training, I'd quit."
"A man spent years training his dog to walk on its hind legs. When he showed the trick to a friend, the observation was, 'Yes, very impressive... but tell me, why? It will only ever be a curiosity as a dog, and a pale and pointless imitation of a man.'"
"If people don’t benefit from the courses, those producers who are trying to skimp by using TTS will go out of business. The market will decide whether or not TTS is a viable option. I frankly don’t see a future for TTS for anything worth listening to."
Yes, the market will ultimately decide. But TTS still has a LONG way to go before it can be considered engaging enough to generate genuine enthusiasm in the listener. And genuine enthusiasm makes for great learning.
- SteveFlowers, 9 years ago, Community Member
Agree with Mike, here.
I find TTS tremendously helpful in generating scratch audio for stakeholder review before we send it off to have the pro voice it. Saves a ton of time. I really only want to have my narrator read it once.
There are some places where I might use TTS intentionally. Like if I wanted to personify a machine. Used strategically as a production element, there are some situations where the insertion of TTS as a spice and not as a substitute could be really successful.
Unfortunately, most TTS output is pretty awful. Some folks have a high tolerance for awful, or their focus is elsewhere, so they have extra grace for artificial production elements or "cheapness." To each their own :) I take Joel's point above well. Let folks know up front that it's bot-read and own it if you're going to use TTS as a substitute. At least then they can vote with their feet or use the mute option.
TTS is getting better. But most of it is still stuck in the last decade. There aren't that many voice vendors. Far fewer than there are tools that use the same licensed voices in their TTS generators. Some are better than others. Loquendo and Ivona were pushing things in the right direction but they both had a long way to go...
I'm in the "If I had to listen to more than 60 seconds of this as employee training, I'd ___________." camp... I'd actually just mute the darn thing, as I have with lots of stuff that isn't well selected or well produced. There are pro narrators (including great ones that get commissioned to execute bad scripts) that scratch my brain too. Or stuff that simply doesn't need to be narrated at all. These will get the same treatment as high doses of TTS on my machine. I have a volume control for a reason ;)
YMMV
- joelharband, 9 years ago, Community Member
Steve
Your observation:
"Used strategically as a production element, there are some situations where the insertion of TTS as a spice and not as a substitute could be really successful."
intrigued me. We are always looking for new applications for TTS. Could you expand a little on this idea, maybe with some examples?
Might this also be applicable to live presentations and not just to e-learning?
Thanks,
Joel
- joelharband, 9 years ago, Community Member
Mike (and Steve)
Thanks for submitting a specific test and challenge for articulation and emphasis in text to speech voices.
I agree that "out of the box" text to speech voices would not pass your test. However, when used with Speech-Over's rhetorical pauses, the TTS voices can emphasize the right words and capture the meaning of each sentence in your test.
I made a video to demonstrate this: a number of TTS voices from different vendors take your test using Speech-Over's rhetorical pauses, with good results.
The video is narrated by TTS voices Ryan and Heather from Acapela-Group where rhetorical pauses have been used in the narration as well. The pauses that were inserted are shown by vertical bars "|" on the screen text. This is for instructional purposes only; in fact, pauses are stored internally and do not need to appear on the screen.
Click on this link to see the video:
Mike's Challenge to Speech-Over Text to Speech Voices
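For anyone who wants to try the bar notation outside Speech-Over, here is a minimal Python sketch that turns bar-marked text, like the screen text in the video, into SSML, the W3C markup that many TTS engines accept. Treat it as an illustration under stated assumptions, not a description of Speech-Over itself: Speech-Over stores its pauses internally in a format not described here, and the function name `bars_to_ssml` and the 300 ms default pause are hypothetical.

```python
from xml.sax.saxutils import escape


def bars_to_ssml(text: str, pause_ms: int = 300) -> str:
    """Convert '|' pause markers in plain text into SSML <break> elements.

    Hypothetical helper for illustration; the pause length is an assumption.
    """
    # Split on the bar marker, trim whitespace, and escape XML special characters.
    segments = [escape(part.strip()) for part in text.split("|")]
    # Drop empty segments produced by leading, trailing, or doubled bars.
    segments = [segment for segment in segments if segment]
    # SSML expresses a pause as <break time="..."/>.
    pause = f' <break time="{pause_ms}ms"/> '
    return "<speak>" + pause.join(segments) + "</speak>"


if __name__ == "__main__":
    print(bars_to_ssml("I never said | she | ate your sandwich."))
```

Whether pauses placed around a word are enough to make a given voice stress it depends on the engine and the voice, so the output is best judged by ear.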
Again, I recommend using Speech-Over text to speech voices to save time and costs for applications where voice clarity and consistency together with correct diction, emphasis and phrasing are the main requirements - as in e-learning and training, where people are pre-motivated to learn the material.
For any kind of application, Speech-Over is perfect for generating "scratch audio" for review prior to final professional voice recording - a method that Steve recommends in his comment below.
Using Steve's terminology, I submit that Speech-Over has brought TTS into this decade - and the next.