Forum Discussion
Text to Speech software
Can anyone recommend a good "Text to Speech" software that you have used with Articulate? We are looking for a software that sounds as close to a real voice as possible.
Thanks, Joy
- JoyWeyerCommunity Member
Thank you all for adding your responses.
- MichaelCaseCommunity Member
Hello Joy,
I have found that no matter what TTS software or voice I use, there are far too many oddities in pronunciation to use one for narration efficiently and cost-effectively. Besides, finding a professional narrator is easy, and depending on who you choose, it can be inexpensive as well.
Check out The Narrator Files. They price narration by the page, and they have exemplary voice talent.
Best!
Mike
- BrianAllenCommunity Member
Wondering if anyone here has used the TTS built into Microsoft Office to narrate for elearning? Wonder if there would be a way to record the narration and then use it in an elearning course...
@Steve, wondering if you're using the built-in Mac voices in a similar way?
- SteveFlowersCommunity Member
I am. I've switched over entirely from other TTS programs to Mac voices. I use a pretty neat trick to batch each file using terminal. It takes a little while to set up my transcript input files; I haven't automated that part yet.
Basically, when the script is approved, I generate a .txt file for each bit of audio (on the plus side, I have found a way to use this as a transcript feeder). Then I set up a batch file for terminal to automatically generate the outputs. The batch template lines look something like this:
say -v lee -f /Users/sflowers/Desktop/Dropbox/projectname/production/scratch_audio_scenarios/s1_c1.txt -o /Users/sflowers/Desktop/Dropbox/projectname/production/scratch_audio_scenarios/s1_c1.aiff
Copying and pasting this line into terminal will grab the text file and output an audio file in the voice I've selected. Copying and pasting multiple lines will do it multiple times. It only fails if there's a funny character or the text file is missing. Easy to pick up by the file size of the output .aiff. All in all pretty fast. And really easy to update. Just update the .txt file and copy / paste the batch line into terminal.
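For anyone who would rather not type each batch line by hand, here is a rough sketch of how lines like the one above could be generated from a folder of .txt files. It is not the script used in this thread. The function names (make_say_batch, find_small_outputs), the default voice, and the 4096-byte "suspiciously small" threshold are all assumptions made for illustration. Only the -v, -f and -o options of the macOS say command are taken from the post.

```shell
#!/bin/sh
# Sketch only: function names, default voice and size threshold are assumptions.

# Print one `say` command per .txt file in a folder.
# $1 = folder holding the transcript .txt files
# $2 = voice name (defaults to "lee", the voice used in the post)
make_say_batch() {
    dir=$1
    voice=${2:-lee}
    for txt in "$dir"/*.txt; do
        [ -e "$txt" ] || continue          # no .txt files found: skip the literal glob
        out="${txt%.txt}.aiff"             # same name, .aiff extension
        printf 'say -v %s -f "%s" -o "%s"\n' "$voice" "$txt" "$out"
    done
}

# List transcripts whose .aiff is missing or smaller than a threshold.
# $1 = folder, $2 = minimum size in bytes (4096 is an arbitrary guess)
find_small_outputs() {
    dir=$1
    min=${2:-4096}
    for txt in "$dir"/*.txt; do
        [ -e "$txt" ] || continue
        out="${txt%.txt}.aiff"
        if [ ! -f "$out" ]; then
            printf '%s\n' "$txt"
            continue
        fi
        size=$(wc -c < "$out")
        size=$((size))                     # strips padding some wc versions add
        [ "$size" -ge "$min" ] || printf '%s\n' "$txt"
    done
}
```

After loading the functions into a terminal session, make_say_batch followed by the folder path prints the lines to review and paste, and find_small_outputs with the same folder lists the transcripts worth re-checking afterwards. Paths that contain double quotes or dollar signs would need extra escaping, which this sketch does not handle.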
- BrianAllenCommunity Member
Sounds sweet, but I'm on a PC rather than a Mac... going to dig into this a bit and see if there is something equivalent that could be done
- DannyStefanic-7Community Member
Mike is right, it's a balancing act, and if you want the best quality, use professional narration for sure. If budget or rapid production tilts the balance in the other direction, then text to speech can serve as a reasonable alternative, for example where you need accessibility compliance in order to deploy.
I also hear from our ResponsiveVoice users that TTS (Text to Speech) is a very useful timesaver in the design of elearning prior to sending off the script for final narration.
We built an add-on for SL2. If you are interested in trying it, I'd love to hear community feedback on it.
- joelharbandCommunity Member
Our Speech-Over Professional text to speech software (www.speechover.com) quickly adds professional e-learning narration to PowerPoint-based e-learning and training, saving course development time and costs. Speech-Over works with Articulate Presenter as well as with iSpring and Camtasia.
Ordinary text-to-speech applications can only read the text on the PowerPoint slide; Speech-Over does a lot more: the user enters narration texts that describe and explain individual text bullet and graphic objects on the slide like a real presenter would. The texts are stored off-screen. When an object is animated in the slide show, Flash, or video, Speech-Over speaks its narration text in perfect sync. The result has the learning impact of a live presentation.
A text-based audio editor can adjust the diction, inflection and phrasing of the voices - eliminating the pronunciation problems that Mike refers to in his article above.
Speech-Over Professional includes two voices from Acapela-Group with a commercial license.
I invite you to try it.
- MikeHarrison-2aCommunity Member
Joel, I can certainly appreciate your intent and the time and effort you've put into Speech-Over. Absolutely. I've just listened to the samples in the website video and, my apologies, but while Speech-Over may well be a solution for those whose only concern is cost, this is still far from impactful or compelling speech.
The biggest misunderstanding in the hiring of voice talent is the belief that the SOUND of the voice is what matters most. It is not. Many people have "nice" voices. The most important facet of speech is the articulation: the way words flow naturally out of the speaker's mouth, with the correct amount of emphasis on only the words and syllables where it belongs... and not where it does not belong. In many cases, emphasis placed on the wrong words can change the very meaning of a statement. And, at the very least, naturally articulated speech instills confidence in the listener. They feel that the speaker knows intimately what he or she is talking about. Anything short of that and the listener is left with reservations about the authenticity of what is being presented.
So, while the "speakers" in the samples I heard had pleasant-sounding voices, not only was emphasis unnatural in many areas, but the rhythm of their words still sounded mechanical. There are places where syllables whiz by almost unintelligibly. Learners should not have to replay portions in order to understand them.
eLearning shares one very important basic goal with radio and TV commercials: to motivate people to action. Commercials are successful when they are responsible for sales going up. eLearning is successful when those taking the courses not only remember but are able to apply what they've learned. And those whose task it is to instruct others (especially at the corporate level, where employee performance is everything) better be engaging enough to make people want to listen, and should sound absolutely convincing.
The #1 quality sought in the casting of voices for practically all genres of voice-over is to find the person who is best able to connect with the subject matter so as to convince the listener that what they're hearing is the real deal. We want to have a voice that is pleasant to listen to, yes, but first and foremost we must trust implicitly that what we are hearing is genuine.
Again, my apologies, but my opinion is that only human speech can be regarded as genuine because there is a mixture of emotion and point of view behind it. TTS – software – is not innately capable of emotion or point of view. And even if diction, inflection and phrasing are adjustable, will someone be willing to spend the time and money to listen to and evaluate every word and then adjust these qualities as necessary in lessons of considerable length? By the time these adjustments are completed, total expenditure would probably equal the fee of a talented professional who would have a finished and more compelling product in less time.
I would suggest to any company preparing to enter into eLearning that they conduct a test with perhaps six minutes of typical training content. Enter the first three minutes of the text into any TTS application and give the remainder to a professional narrator. When each has completed the audio, randomly select a small group of employees to listen: first to the TTS portion of the lesson and then, without any break or discussion, to the professionally spoken portion of the lesson. Then ask the employees for their impressions, specifically with the intent of discovering which they would choose to listen to (especially for extended periods) and which they would trust more.
Because the success of any eLearning content hinges solely on what employees are able to remember and later apply, I have a very difficult time understanding why any company would even consider cutting costs in the very area that has the power to make employees and, ultimately, their company the best they can be.
Suggested reading: http://www.scilearn.com/blog/prosody-matters-reading-aloud-with-expression
- joelharbandCommunity Member
Mike,
Thanks for your reply to my comment about our product Speech-Over (www.speechover.com) for adding text to speech (TTS) narration to e-learning and training presentations to save time and costs.
I am happy you have set down your objections to text to speech in e-learning so clearly so we can address them one by one.
1. Articulation. A couple of years ago I would have agreed with you, but Speech-Over engineers have made a breakthrough in improving the articulation of TTS voices: by entering simple punctuation in the text, the voice can be made to articulate like the best public speakers. To hear what I am talking about, see the video https://www.youtube.com/watch?v=4fuD15hpUbg (which is the sample on our website).
2. Motivate people to action. I submit that corporate students come to their e-learning already motivated (the boss wants it!). The most important thing is to present the material clearly and consistently, including correct diction and articulation so that they can easily understand and retain the material. Our customers report that the retention of the material is the same with TTS as with human voice.
3. Authentic. Our customers have found that once the student begins to learn, they accept and trust the TTS voice just as they would a lecturer with a regional accent. One customer actually notified the learners at the beginning of the course that the voice would be TTS so they knew what to expect.
4. Time required to adjust the articulation. Here you make a good point. To improve the articulation as in #1 above, the Speech-Over user has to enter special pause punctuation (a vertical bar | ) in the text. And, the user has to know the rules of effective public speaking to know where to put the pauses (these rules are presented in a Speech-Over tutorial). I expect that e-learning text editors can do this quickly, but the time required does need to be taken into account.
I agree with your suggestion that companies that want to use TTS voices in developing e-learning and training should run tests to evaluate the savings and to compare retention of material with courses developed with human voices. Free Speech-Over trials are available for this purpose.
- SteveFlowersCommunity Member
All of the TTS I've seen to date carries an unmistakable synthetic quality. If I wanted to produce a synthetic character in a situation or scenario, or to have the application "think out loud", I might find it in my heart to use TTS :) Since TTS simply can't pretend to be non-synthetic, a clearly intentional use of a synthetic voice matched with a synthetic design choice could be successful.