Forum Discussion
Text to Speech software
As for articulation, the TTS examples I've heard over the past year or so, including the most recent ones, appear capable of only three levels of tone: the standard mid-tone, a slightly raised tone for emphasis, and a slightly lowered tone for the ends of sentences. In several places in the Speech-Over sample, the wrong syllable or word was emphasized. Even with these three levels of tone and the insertion of pauses, TTS is not even close to compelling speech. (In my personal opinion, although he's a very smart guy, neither is the speech given by Bill Gates. The pauses in his talk were due only to his having to refer occasionally to the paper he was reading from; they were not purposely placed for impact, because what he was talking about at that time was not so enlightening as to warrant dramatic pauses.)
With regard to the proper placement of emphasis, I submit the following example. Read each sentence aloud, emphasizing the word in boldface, to hear how the meaning changes from one sentence to the next.
"I never said she ate your sandwich." (Somebody else said it)
"I never said she ate your sandwich." (I definitely did not say anything)
"I never said she ate your sandwich." (I implied it)
"I never said she ate your sandwich." (I said someone else did)
"I never said she ate your sandwich." (I said she did something else with the sandwich)
"I never said she ate your sandwich." (I said she ate someone else’s sandwich)
"I never said she ate your sandwich." (I said she ate something else)
"The boss wants it" is motivation only enough to make someone do something. But people who are motivated by fear of consequences are still reluctant and, thus, will not totally commit. Yes, they might force themselves to sit through the lessons, but we can hardly call this being interested and engaged. They might even receive a passing grade at the conclusion of each lesson. But the real test is whether – six months down the road – they can remember and apply what they heard to satisfactory result.
Some comments from colleagues of mine who have heard TTS:
"If I had to listen to more than 60 seconds of this as employee training, I'd quit."
"A man spent years training his dog to walk on its hind legs. When he showed the trick to a friend, the observation was, 'Yes, very impressive... but tell me, why? It will only ever be a curiosity as a dog, and a pale and pointless imitation of a man.'"
"If people don’t benefit from the courses, those producers who are trying to skimp by using TTS will go out of business. The market will decide whether or not TTS is a viable option. I frankly don’t see a future for TTS for anything worth listening to."
Yes, the market will ultimately decide. But TTS still has a LONG way to go before it can be considered engaging enough to generate genuine enthusiasm in the listener. And genuine enthusiasm makes for great learning.
Mike (and Steve)
Thanks for submitting a specific test and challenge for articulation and emphasis in text to speech voices.
I agree that "out of the box" text to speech voices would not pass your test. However, when used with Speech-Over's rhetorical pauses, the TTS voices can emphasize the right words and capture the meaning of each sentence in your test.
I demonstrate this in a video I made, in which a number of TTS voices from different vendors take your test using Speech-Over's rhetorical pauses, with good results.
The video is narrated by TTS voices Ryan and Heather from Acapela-Group where rhetorical pauses have been used in the narration as well. The pauses that were inserted are shown by vertical bars "|" on the screen text. This is for instructional purposes only; in fact, pauses are stored internally and do not need to appear on the screen.
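To illustrate the bar notation, here is a minimal sketch that turns text marked with "|" into SSML, using the standard `<break>` element for each pause. This is not Speech-Over's internal format, which is not described here; the function name `bars_to_ssml` and the 300 ms default pause length are assumptions made for the example.

```python
from xml.sax.saxutils import escape


def bars_to_ssml(text, pause_ms=300):
    """Convert '|' pause markers into SSML <break> elements.

    Hypothetical helper; the pause length is an assumed default.
    """
    # Split on the bar, trim whitespace, and drop empty segments.
    segments = [escape(s.strip()) for s in text.split("|")]
    segments = [s for s in segments if s]
    brk = f' <break time="{pause_ms}ms"/> '
    return "<speak>" + brk.join(segments) + "</speak>"


print(bars_to_ssml("I never | said | she ate your sandwich."))
```

Setting a pause on either side of a word, as in this example, isolates it so that it stands out to the listener without any change in pitch.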
Click on this link to see the video:
Mike's Challenge to Speech-Over Text to Speech Voices
Again, I recommend using Speech-Over text to speech voices to save time and costs for applications where voice clarity and consistency together with correct diction, emphasis and phrasing are the main requirements - as in e-learning and training, where people are pre-motivated to learn the material.
For any kind of application, Speech-Over is perfect for generating "scratch audio" for review prior to final professional voice recording - a method that Steve recommends in his comment below.
Using Steve's terminology, I submit that Speech-Over has brought TTS into this decade - and the next.