Can anyone recommend good text-to-speech (TTS) software that you have used with Articulate? We are looking for software that sounds as close to a real voice as possible.
37 Replies
I use Ivona for temporary voices during review periods. It's relatively inexpensive for personal use, but commercial use is quite a bit more expensive ($700 last I checked). I'll be moving to the built-in Mac voices for my temp work, triggering voice-to-file output from Terminal. These are slightly lower quality than Ivona or Loquendo. Loquendo is quite good.
Good voices are not cheap. Most of the best ones offer a "pay as you go" program. If you're looking for a good-quality voice on the cheap for commercial purposes, I haven't found one yet. Other than scratch and review audio, I haven't found a narration use where I'd rather have a synthetic voice than a real one.
Steve,
Thank you for your information and for your response. I really appreciate your help! Happy Wednesday!
Hi Joy,
I use the free version of Natural Reader (http://www.naturalreaders.com/index.htm) as a secondary way to proofread my writing. Don't know anything about their pricing structure, but thought I'd at least throw it out there. Good luck!
That's a great way to use TTS, Rebecca. I find it's so important to hear the narrative to evaluate how well it works. I have used TTS programs this way as well, but I primarily use them to represent a draft of the assembled narrative.
Hi Steve,
Yes, it IS a good way to evaluate the narrative. And darn if I don't pick up a mistake that I didn't catch after several re-readings by "eye."
Thank you Rebecca and Steve for your thoughts on this.
Of course, there's always the option of a real person voiceover by moi (shameless plug for myself!)
However, I do agree that TTS is great for reviewing.
Nuance Dragon NaturallySpeaking: Nuance's text-to-speech technology reads on-screen text in human-sounding synthesized speech. (http://www.nuance.com/for-individuals/by-product/dragon-for-pc/home-version/index.htm)
CC and April,
Thank you for your help. I really do appreciate it.
We've used Ivona in the past as well. It actually sounds pretty good. Some voices are better than others. All in all not bad though.
I'll throw in iSpeech's text to speech. In my opinion, they have the most human-sounding voices available, even a tad better than Ivona (although their Polish stuff is amazing). iSpeech offers quite a bit for free or for a very low fee, so it might be just what you are looking for. They also offer some limited speech recognition, should you ever need that. Their text to speech is top notch, though. I used the service to turn my college reading into MP3s to listen to while cooking or doing other things when I couldn't read.
We use the text-to-speech voices in Captivate. They are pretty good; you just need to add pauses in the right places.
Thank you all for adding your responses.
Hello Joy,
I have found that no matter what TTS software or voice I use, there are far too many oddities in pronunciation to use one efficiently and cost-effectively for narration. Besides, finding a professional narrator is easy, and depending on who you choose, it can be inexpensive as well.
Check out The Narrator Files. They price narration by the page, and they have exemplary voice talent.
Best!
Mike
On a related note: use Google to generate foreign-language audio files.
Amazing suggestions from Articulate enthusiasts. That's why I love E-Learning Heroes!
Has anyone here used the TTS built into Microsoft Office to narrate elearning? I wonder if there's a way to record the narration and then use it in an elearning course...
@Steve, wondering if you're using the built-in Mac voices in a similar way?
I am. I've switched over entirely from other TTS programs to Mac voices, and I use a pretty neat trick to batch each file from Terminal. It takes a little while to set up my transcript input files; I haven't automated that part yet.
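Steve notes he hasn't automated setting up the per-clip transcript files. A minimal Python sketch of that step might look like this, assuming a hypothetical approved-script format in which each clip starts on a line containing only its clip ID in square brackets (for example `[s1_c1]`, matching the s1_c1 naming Steve uses). The marker format and the names `split_script` and `write_clip_files` are illustrative, not part of any tool mentioned in the thread.

```python
import re
from pathlib import Path

# Hypothetical marker format: a line holding only "[clip_id]" starts a new clip.
CLIP_HEADER = re.compile(r"^\[(?P<clip_id>[A-Za-z0-9_-]+)\]$")


def split_script(script_text: str) -> dict[str, str]:
    """Split an approved script into {clip_id: narration text}."""
    clips: dict[str, list[str]] = {}
    current = None
    for line in script_text.splitlines():
        match = CLIP_HEADER.match(line.strip())
        if match:
            current = match.group("clip_id")
            clips[current] = []
        elif current is not None:
            clips[current].append(line)
        # Lines before the first header (title, notes) are ignored.
    return {cid: "\n".join(lines).strip() for cid, lines in clips.items()}


def write_clip_files(script_text: str, out_dir: Path) -> list[Path]:
    """Write one UTF-8 .txt file per clip (e.g. s1_c1.txt) and return the paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for clip_id, text in split_script(script_text).items():
        path = out_dir / f"{clip_id}.txt"
        path.write_text(text + "\n", encoding="utf-8")
        written.append(path)
    return written
```

Because the clip ID becomes the file name, re-running the script after an edit simply overwrites the affected .txt files, which keeps the "update the .txt and re-run" workflow intact.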
Basically, when the script is approved, I generate a .txt file for each bit of audio (on the plus side, I've found a way to use this as a transcript feeder). Then I set up a batch file for Terminal to generate the outputs automatically. The batch template lines look something like this:
say -v lee -f /Users/sflowers/Desktop/Dropbox/projectname/production/scratch_audio_scenarios/s1_c1.txt -o /Users/sflowers/Desktop/Dropbox/projectname/production/scratch_audio_scenarios/s1_c1.aiff
Copying and pasting this line into Terminal will grab the text file and output an audio file in the voice I've selected. Copying and pasting multiple lines will do it multiple times. It only fails if there's a funny character or the text file is missing, and failures are easy to spot by the file size of the output .aiff. All in all, it's pretty fast and really easy to update: just update the .txt file and copy/paste the batch line into Terminal again.
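If copying and pasting lines gets tedious, the same batch can be generated and run from a script. This is a minimal Python sketch, assuming macOS with the `say` command available, one .txt file per clip in a single folder, and each .aiff written next to its .txt source as in the command above. The 10,000-byte threshold for flagging failed outputs is a hypothetical cutoff to tune against your own clips, and the function names are illustrative.

```python
import shlex
import subprocess
from pathlib import Path

VOICE = "lee"  # voice name as written in the example command above


def say_command(txt_path: Path, voice: str = VOICE) -> list[str]:
    """Arguments for: say -v VOICE -f clip.txt -o clip.aiff"""
    aiff_path = txt_path.with_suffix(".aiff")
    return ["say", "-v", voice, "-f", str(txt_path), "-o", str(aiff_path)]


def batch_lines(folder: Path, voice: str = VOICE) -> list[str]:
    """One pasteable Terminal line per .txt file; shlex.join quotes paths with spaces."""
    return [shlex.join(say_command(p, voice)) for p in sorted(folder.glob("*.txt"))]


def run_batch(folder: Path, voice: str = VOICE) -> list[str]:
    """Run each say call directly (macOS only); return clips whose call exited non-zero."""
    failed = []
    for txt_path in sorted(folder.glob("*.txt")):
        result = subprocess.run(say_command(txt_path, voice))
        if result.returncode != 0:
            failed.append(txt_path.stem)
    return failed


def suspicious_outputs(folder: Path, min_bytes: int = 10_000) -> list[str]:
    """Clips whose .aiff is missing or smaller than min_bytes (hypothetical cutoff)."""
    flagged = []
    for txt_path in sorted(folder.glob("*.txt")):
        aiff_path = txt_path.with_suffix(".aiff")
        if not aiff_path.exists() or aiff_path.stat().st_size < min_bytes:
            flagged.append(txt_path.stem)
    return flagged
```

Pasting the strings returned by `batch_lines` into Terminal reproduces the copy/paste workflow, `run_batch` skips the paste step, and `suspicious_outputs` automates the file-size check for clips that failed on a funny character or a missing file.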
Another cool trick: to grab the path of a file, drag a file from the folder you want to locate into the Terminal window. Copy and paste that into your text editor and you've got your path without having to rack your brain or fiddle with Finder.
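When a file is dropped onto the Terminal window, Terminal typically inserts its path with spaces and other special characters backslash-escaped, followed by a trailing space. That form pastes fine into a batch line, but it is not a plain file name. A minimal Python sketch, assuming that escaping behavior, uses the standard-library `shlex` module to turn a dragged path back into a plain one; `path_from_drag` is an illustrative name.

```python
import shlex
from pathlib import Path


def path_from_drag(dragged: str) -> Path:
    """Convert text dropped into Terminal (e.g. 'My\\ Project/a.txt ') to a plain Path."""
    # shlex.split undoes the shell-style escaping and drops the trailing space.
    parts = shlex.split(dragged)
    if len(parts) != 1:
        raise ValueError(f"expected exactly one path, got {len(parts)}")
    return Path(parts[0])
```

`path_from_drag(...).parent` gives the folder the file lives in, and `shlex.quote(str(path))` goes the other way, producing a form that is safe to paste back into Terminal.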
Sounds sweet, but I'm on a PC rather than a Mac. I'm going to dig into this a bit and see if there is something equivalent that could be done.
If you manage to get ahold of a Mac, download the voice "Samantha". It's a dead ringer for Siri.
The current best text-to-speech software is Text Speaker. It has customizable pronunciation, reads anything on your screen, and even has talking reminders. It is great for learning because it highlights the words as they are read. The bundled voices are well priced and sound very human, and voices are available in English, French, Italian, Spanish, German, and more. It easily converts blogs, email, e-books, and more to MP3, or reads them for instant listening.
LOL totally true, Steve!
It is not at all because I'm a voice-over/narrator that I am opposed to the use of text-to-speech technology. It is solely out of concern over results. The goal of instruction of any kind is to either simply share information or to change behavior/performance. And, just like TV and radio commercials, success hinges squarely on whether the message is able to not only grab but hold the attention of the listener/viewer/learner so that they will ABSORB what they saw and/or heard, and that they will also RETAIN that information so that they can later APPLY it. In the case of commercials, advertisers hope to motivate people to buy their product or service. In instruction, it is hoped that learners will be able to use what they learned to better their performance.
Thus, eLearning success cannot be measured by the quantity of material that was produced in X number of hours or that X number of dollars were saved. Success is measured in what people remember and are able to apply.
I wonder if any company that uses eLearning has done an analysis comparing the money spent on producing learning content against whether employee performance improved markedly. It would seem to me that if a company's goal were to maximize employee performance, the LAST place they'd consider cutting costs would be the tools and methods used to achieve that goal. When the goal is to get to the finish line faster, we don't use cheaper fuel.
It seems that proponents of text-to-speech don't understand that inflection, NATURAL inflection, is the key, and that synthesized speech, no matter how human-like the "voice" may sound, will never be able to place inflection where it is needed and leave it out where it is not.
Here's a practice exercise for everyone: spend some time listening to people engaged in conversation, people you know and even people you don't. When we speak among ourselves, we add inflection without thinking about it: placing importance on some words and less on others, a smile here, some compassion there, a sudden burst of excited whisper, the occasional dramatic pause to build anticipation or let a point sink in before moving on. We do these things automatically, and it turns our speech into music. It also makes those listening more inclined to keep listening. Until such time as the algorithms behind synthesized speech are able to "understand" and contextualize the words (which, to a computer, are just more ones and zeroes), synthesized speech will never be able to add the NATURAL engagement factor called inflection where it's supposed to be and, thus, will never reach the effectiveness of human speech.
If there is no connection, no grasp of the material that allows the proper inflection to be placed where it belongs, what is spoken is cold and completely non-engaging, just like some of the boring teachers we all had in school. If there is no natural engagement, there is no hope of holding a learner's attention. And without their focused attention, the effort and money spent on everything that went into the eLearning are wasted. I'll say it again:
eLearning success cannot be measured by the quantity of material that was produced in X number of hours or that X number of dollars were saved. eLearning success is measured ONLY in what people remember and are able to apply in order to make a difference.