Alternative Text-to-Speech (TTS) Solutions for eLearning
Jun 05, 2021
This is a long post about alternative Text-to-Speech solutions. When Articulate introduced the "Converting Text to Speech" feature, I thought it was a great tool. However, after using it a bit, I realized, like most of you, that it would take much more effort to create speech that sounds more natural.
Bottom line up front: I use the HEROVOICE TTS app to connect to the Amazon Polly, Google Cloud, and Microsoft Cognitive Services TTS services. If you would like more details, read my comments below. If you want to try HEROVOICE TTS, you can download it at https://www.microsoft.com/store/apps/9P7Q072TRWMJ.
In my search for an alternative text-to-speech (TTS) method, I discovered that Amazon Polly is the TTS engine used by A360 to provide the TTS conversion tool in Storyline (SL). Personally, I think Articulate did a good job of implementing Amazon Polly TTS in SL. The tool is very easy to use, very convenient, and doesn’t cost anything. However, like many things that are free, it has limitations. In this case, Articulate does not offer all the capabilities that are available with Amazon Polly, such as neural voices and specific voice styles like conversational and newscaster. Amazon Polly also supports the Speech Synthesis Markup Language (SSML) standard, which can be used to mark up the text to achieve human-like speech quality. This feature and many others are not offered with the free TTS built into SL.
To illustrate the differences between the standard Amazon Polly voices offered in SL and the neural voices, with and without the conversational voice style, offered by the Amazon Polly TTS engine, I’ve attached three mp3 audio files. The text I used is the first paragraph of this post. In all three cases, I selected the Matthew voice to record with. In the first example, I used the standard voice, which is what is available in SL. In the second example, I used the neural voice version of Matthew. In the third example, I applied the Amazon custom voice style, Conversational, to the Matthew neural voice. In all three cases, I did not use any other supported SSML to modify the speech. I think you’ll readily agree that the neural voice is better than the standard voice and that the conversational style neural voice sounds the best.
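For readers who call Amazon Polly directly, the three variants described above could be expressed roughly as follows. This is a minimal sketch, not how SL or any particular app builds its requests. The function name polly_request is hypothetical; the parameter names (Engine, VoiceId, OutputFormat, TextType, Text) are the ones used by Polly's SynthesizeSpeech operation, and the conversational style is requested with the amazon:domain SSML tag.

```python
from xml.sax.saxutils import escape


def polly_request(text, variant):
    """Build SynthesizeSpeech parameters for one of the three Matthew variants.

    polly_request is a hypothetical helper; it only builds a dict and does
    not contact AWS.
    """
    if variant not in ("standard", "neural", "conversational"):
        raise ValueError(f"unknown variant: {variant}")

    params = {
        "VoiceId": "Matthew",
        "OutputFormat": "mp3",
        # The standard engine is the baseline; the other two variants use neural.
        "Engine": "standard" if variant == "standard" else "neural",
    }

    if variant == "conversational":
        # The conversational style is applied through SSML, so the text
        # must be XML-escaped and wrapped in <speak>.
        params["TextType"] = "ssml"
        params["Text"] = (
            '<speak><amazon:domain name="conversational">'
            + escape(text)
            + "</amazon:domain></speak>"
        )
    else:
        params["TextType"] = "text"
        params["Text"] = text

    return params
```

Assuming boto3 is installed and an AWS account is configured, the resulting dictionary could be passed as boto3.client("polly").synthesize_speech(**polly_request(text, "neural")), and the MP3 data read from the AudioStream field of the response.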
To be fair to Articulate, the TTS tool it offers is excellent for testing purposes. However, if you want higher speech quality, you must be willing to pay for it. In this case, the cost of using Amazon Polly is very affordable. I recently completed a course with narration text that was about 20,000 characters. I used the neural voice to convert my text to speech. My cost was less than $5.00. Keep in mind that in the process of creating this course, I had to encode the text more than once because of the changes that are a normal part of eLearning course development.
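The arithmetic behind a figure like this can be sketched as follows. The rate of $16.00 per one million characters for neural voices is an assumption that does not come from this post, so check the current Amazon Polly pricing page before relying on it. The function names are hypothetical.

```python
# Assumed neural-voice rate in USD per one million characters.
# This figure is an assumption, not taken from the post; verify current pricing.
NEURAL_USD_PER_MILLION_CHARS = 16.00


def estimate_cost(characters, passes, usd_per_million=NEURAL_USD_PER_MILLION_CHARS):
    """Estimated cost of synthesizing `characters` of text `passes` times."""
    return characters * passes * usd_per_million / 1_000_000


def passes_within_budget(characters, budget_usd,
                         usd_per_million=NEURAL_USD_PER_MILLION_CHARS):
    """How many full re-encodes of the narration fit within a budget."""
    cost_per_pass = estimate_cost(characters, 1, usd_per_million)
    return int(budget_usd // cost_per_pass)
```

Under that assumed rate, one pass over 20,000 characters comes to about $0.32, so a $5.00 budget would cover roughly 15 complete re-encodes of the narration.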
During my search for an alternative TTS, I also tested the TTS engines available from Google, IBM, and Microsoft. All of these vendors have their own version of neural voices. Google calls its neural voices WaveNet. Regardless of which vendor you use, they typically provide a Graphical User Interface (GUI) and a Command Line Interface (CLI) to connect to their TTS services. Personally, I prefer the GUI over the CLI.
There are many companies on the Internet that use the vendor-provided Application Programming Interface (API) to create their own custom interface for connecting to the vendor’s TTS services. In fact, SL’s Converting Text to Speech tool is an example of a custom interface used to connect to the Amazon Polly TTS services. Some of these companies provide a web-based interface, while others provide an app that you need to download. In most of these cases, I have found that the companies monetize Amazon Polly and other TTS services by acting as the “middleman” and reselling the TTS services.
In my exhaustive search for an alternative TTS solution for eLearning use, I came across an app, HEROVOICE TTS, which I purchased from the Microsoft Store. This app is different in that it lets you connect to three TTS vendors: Amazon, Google, and Microsoft. In this case, the software developer, VABTM Software & Consulting, is NOT trying to resell services from any vendor. You’ll need to have your own Amazon Polly, Google Cloud Services, or Microsoft Cognitive Services account to use HEROVOICE TTS. All three vendors offer a free trial account that you can use with this app. The app developer provides instructions and free video tutorials on how to create Amazon, Google, and Microsoft accounts. I like that this app has an easy-to-use GUI. I’m able to connect to Amazon, Google, or Microsoft TTS and access all the features these services offer. One of the features I like best in HEROVOICE TTS is the ability to incorporate SSML to create high-quality encoded speech without having to know SSML. The program injects the SSML codes into my text, so I don’t need to type the SSML tags manually.
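To give a sense of what "injecting SSML" into plain text involves, here is a minimal sketch. It is not how HEROVOICE TTS works internally, which this post does not describe. The function name to_ssml and its defaults are hypothetical; the speak, p, break, and prosody elements are standard SSML.

```python
from xml.sax.saxutils import escape, quoteattr


def to_ssml(text, pause_ms=600, rate="medium"):
    """Wrap plain narration text in basic SSML (hypothetical helper).

    Paragraphs are taken to be separated by blank lines. Each one is
    XML-escaped and wrapped in <p>, a pause is inserted between
    paragraphs, and the whole narration gets one speaking rate.
    """
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    pause = f'<break time="{int(pause_ms)}ms"/>'
    body = pause.join(f"<p>{escape(p)}</p>" for p in paragraphs)
    return f"<speak><prosody rate={quoteattr(rate)}>{body}</prosody></speak>"
```

Escaping matters here because characters such as & and < in the narration would otherwise produce invalid SSML. Support for individual tags and attributes varies by vendor and by voice, so the output should be checked against the documentation of the service being used.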
Using the HEROVOICE TTS app, I’m also able to take advantage of the Lexicons feature that is supported by Amazon and Microsoft. The eLearning content I develop is for a government agency, which, like most agencies, tends to use lots of acronyms. The Lexicons feature saves me time by allowing me to direct the TTS engine to pronounce acronyms in a specific way. For example, if I have the acronyms POW or MIA in my text, I don’t want the TTS to say the word “pow” or the name Mia. Instead, I can use Lexicons to program the TTS engines to say the acronyms as individual letters, such as P-O-W and M-I-A. I should stress that it is Amazon and Microsoft that provide the capability to use Lexicons; HEROVOICE TTS makes it easy for me to connect to that feature. Another interesting fact I learned about HEROVOICE TTS is that the developer created this software specifically to support eLearning developers.
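Lexicons for these services are written in the W3C Pronunciation Lexicon Specification (PLS) format. The sketch below shows one way a lexicon for the POW and MIA example might be generated. The function name build_lexicon is hypothetical, and the choice of a space-separated alias ("P O W") to get letter-by-letter reading is an assumption that should be tested by listening to the result with the chosen voice.

```python
import xml.etree.ElementTree as ET

PLS_NS = "http://www.w3.org/2005/01/pronunciation-lexicon"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Serialize PLS elements without a prefix (xmlns="...").
ET.register_namespace("", PLS_NS)


def build_lexicon(acronyms, lang="en-US"):
    """Return a PLS lexicon that aliases each acronym to spaced letters.

    build_lexicon is a hypothetical helper; "POW" becomes the alias "P O W".
    """
    root = ET.Element(
        f"{{{PLS_NS}}}lexicon",
        {"version": "1.0", "alphabet": "ipa", f"{{{XML_NS}}}lang": lang},
    )
    for acronym in acronyms:
        lexeme = ET.SubElement(root, f"{{{PLS_NS}}}lexeme")
        ET.SubElement(lexeme, f"{{{PLS_NS}}}grapheme").text = acronym
        ET.SubElement(lexeme, f"{{{PLS_NS}}}alias").text = " ".join(acronym)
    declaration = '<?xml version="1.0" encoding="UTF-8"?>\n'
    return declaration + ET.tostring(root, encoding="unicode")
```

With Amazon Polly and boto3, a lexicon string like this would be uploaded with put_lexicon(Name=..., Content=...) and then referenced through the LexiconNames parameter of synthesize_speech. Microsoft's service uses the same PLS format but references the lexicon differently, so consult its documentation for the details.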
The bottom line is that there are many other TTS solutions available on the Internet. I have found HEROVOICE TTS to be the most cost-effective way for me to incorporate TTS services from multiple vendors into my eLearning courses. I can still use the SL TTS tool to create quick demos for testing purposes. Once my narrations are approved by all stakeholders, I can use the HEROVOICE TTS app to create the final high-quality speech from text.
If you’ve taken the time to read this far, I hope you find the information I’ve provided helpful. Let me know if you need any assistance in using this software to convert text to speech.