Alternative Text-to-Speech (TTS) Solutions for eLearning

Jun 05, 2021

Hello everyone,

This is a long post about alternative Text-to-Speech solutions. When Articulate introduced the "Converting Text to Speech" feature, I thought it was a great tool. However, after using it a bit, like most of you, I realized that it would take much more effort to create speech that sounds more natural.

Bottom Line Up Front: I use the HEROVOICE TTS app to connect to the Amazon Polly, Google Cloud, and Microsoft Cognitive Services TTS services. If you would like more details, read my comments below. If you want to try HEROVOICE TTS, you can download it at https://www.microsoft.com/store/apps/9P7Q072TRWMJ.

In my search for an alternative text-to-speech (TTS) method, I discovered that Amazon Polly is the TTS engine used by A360 to provide the TTS conversion tool in Storyline. Personally, I think Articulate did a good job implementing Amazon Polly TTS in SL. The tool is very easy to use, very convenient, and doesn’t cost anything. However, like many things that are free, it has limitations. In this case, Articulate did not offer all the capabilities that are available with Amazon Polly, for example neural voices and specific voice styles such as conversational and newscaster. Amazon Polly also supports the Speech Synthesis Markup Language (SSML) standard, which can be used to mark up the text to achieve human-like speech quality. This feature and many others are not offered with the free TTS built into SL.

To illustrate the differences between the standard Amazon Polly voices offered in SL and the neural voices (with and without the conversational voice style) offered by the Amazon Polly TTS engine, I’ve attached three mp3 audio files. The text I used is the first paragraph of this post. In all three cases, I recorded with the Matthew voice. In the first example, I used the standard voice, which is what is available in SL. In the second example, I used the neural version of Matthew. In the third example, I applied Amazon’s custom Conversational voice style to the Matthew neural voice. In all three cases, I did not use any other supported SSML to modify the speech. I think you’ll readily agree that the neural voice is better than the standard voice and that the conversational-style neural voice sounds the best.
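For anyone who wants to reproduce a comparison like this outside a GUI, here is a minimal sketch in Python. It assumes you have your own AWS account and credentials and the third-party `boto3` SDK installed. The function names and the three `style` labels are hypothetical and exist only for this example; `Engine`, `VoiceId`, `TextType`, and the `<amazon:domain name="conversational">` tag are Amazon Polly's own names. This is not how the attached files were produced, only one way the three variants could be requested.

```python
from xml.sax.saxutils import escape


def conversational_ssml(text: str) -> str:
    """Wrap plain text in Polly's conversational-style SSML tag."""
    return (
        "<speak>"
        '<amazon:domain name="conversational">'
        f"{escape(text)}"  # escape &, <, > so the SSML stays well-formed
        "</amazon:domain>"
        "</speak>"
    )


def polly_request(text: str, voice: str = "Matthew", style: str = "standard") -> dict:
    """Build keyword arguments for Polly's SynthesizeSpeech operation.

    style is "standard", "neural", or "conversational"
    (hypothetical labels used only in this sketch).
    """
    if style not in ("standard", "neural", "conversational"):
        raise ValueError(f"unknown style: {style}")
    params = {"OutputFormat": "mp3", "VoiceId": voice}
    if style == "conversational":
        # The conversational style is applied through SSML on the neural engine.
        params.update(Engine="neural", TextType="ssml", Text=conversational_ssml(text))
    else:
        params.update(Engine=style, TextType="text", Text=text)
    return params


def synthesize_to_file(params: dict, path: str) -> None:
    """Send the request to Amazon Polly and save the returned audio."""
    import boto3  # third-party AWS SDK, imported here so the helpers above run without it

    polly = boto3.client("polly")  # assumes AWS credentials and region are configured
    response = polly.synthesize_speech(**params)
    with open(path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())
```

Calling `synthesize_to_file(polly_request(text, style=s), f"matthew_{s}.mp3")` for each of the three styles would give three files to compare side by side.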

To be fair to Articulate, the TTS tool offered is an excellent TTS tool for testing purposes. However, if you want higher speech quality, you must be willing to pay for it. In this case, the cost of using Amazon Polly is very affordable. I recently completed a course with narration text of about 20,000 characters. I used the neural voice to convert my text to speech. My cost was less than $5.00. Keep in mind that in the process of creating this course, I had to encode the text more than once, due to the changes that are a normal part of eLearning course development.
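To put rough numbers on that, here is a small back-of-the-envelope sketch. The per-million-character rates are assumptions based on Amazon Polly's published pay-as-you-go pricing at the time of writing; check the current pricing page before relying on them. The function names are made up for this example.

```python
# Assumed Amazon Polly pay-as-you-go rates in USD per 1 million characters.
NEURAL_USD_PER_MILLION = 16.00
STANDARD_USD_PER_MILLION = 4.00


def tts_cost(characters: int, passes: int, usd_per_million: float) -> float:
    """Cost of converting `characters` of narration `passes` times."""
    return characters * passes * usd_per_million / 1_000_000


def passes_within_budget(characters: int, budget_usd: float, usd_per_million: float) -> int:
    """How many full re-encodes of the narration fit inside a budget."""
    cost_per_pass = characters * usd_per_million / 1_000_000
    return int(budget_usd // cost_per_pass)


if __name__ == "__main__":
    one_pass = tts_cost(20_000, 1, NEURAL_USD_PER_MILLION)
    print(f"One neural pass over 20,000 characters: ${one_pass:.2f}")
    print("Full passes that fit in $5.00:",
          passes_within_budget(20_000, 5.00, NEURAL_USD_PER_MILLION))
```

Under these assumed rates, a single neural pass over 20,000 characters costs well under a dollar, so a total below $5.00 leaves room for many rounds of revisions.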

During my search for alternative TTS, I also tested the TTS engines available from Google, IBM, and Microsoft. All vendors have their own version of neural voices. Google calls their neural voices WaveNet. Regardless of which vendor you use, they typically provide a Graphical User Interface (GUI) and a Command Line Interface (CLI) to connect to their TTS services. Personally, I prefer the GUI over the CLI.

There are many companies on the Internet that use the vendors’ Application Programming Interfaces (APIs) to create their own custom interface for connecting to the vendors’ TTS services. In fact, SL’s Converting Text to Speech tool is an example of a custom interface used to connect to the Amazon Polly TTS services. Some of these companies provide a web-based interface while others provide an app that you need to download. I have found that in most of these cases, these companies have found a way to monetize Amazon Polly and other TTS services by acting as the “middleman” to resell the TTS services.

In my exhaustive search for an alternative TTS solution for eLearning use, I came across an app, HEROVOICE TTS, which I purchased from the Microsoft Store. This app is different in that it makes it possible for you to connect to three TTS vendors: Amazon, Google, and Microsoft. In this case, the software developer, VABTM Software & Consulting, is NOT trying to resell services from any vendor. You’ll need to have your own Amazon Polly, Google Cloud Services, or Microsoft Cognitive Services account to use HEROVOICE TTS. All three vendors offer a free trial account that you can use with this app. The app developer provides instructions and free video tutorials on how to create Amazon, Google, and Microsoft accounts. I like that this app has an easy-to-use GUI. I’m able to connect to Amazon, Google, or Microsoft TTS and access all the features these services offer. One of the features I like best in HEROVOICE TTS is the ability to incorporate SSML to create high-quality encoded speech without having to know SSML. The program injects the SSML codes into my text, so I don’t even need to manually type in the SSML tags.

Using the HEROVOICE TTS app, I’m also able to take advantage of the Lexicons features that are supported by Amazon and Microsoft. The eLearning content I develop is for a government agency, which, like most agencies, tends to have lots of acronyms. The Lexicons feature saves me time by allowing me to direct the TTS engine to pronounce or say acronyms in a specific way. For example, if I have the acronyms POW or MIA in my text, I don’t want the TTS to say the word “pow” or the name Mia. Instead, I can use Lexicons to program the TTS engines to say the acronyms as individual letters, such as P-O-W and M-I-A. I should stress that it is Amazon and Microsoft that provide the capability to use Lexicons. HEROVOICE TTS makes it easy for me to connect to the Lexicons feature they offer. Another interesting fact I learned about HEROVOICE TTS is that the developer created this software specifically to support eLearning developers.

The bottom line is that there are many other TTS solutions available on the Internet. I have found HEROVOICE TTS to be the most cost-effective solution for me to incorporate TTS services from multiple vendors into my eLearning courses. I can still use the SL TTS solution to create quick demos for testing purposes. Once my narrations are approved by all stakeholders, I can use the HEROVOICE TTS app to create the final high-quality speech from text.
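For readers curious what a lexicon looks like under the hood, here is a minimal sketch that generates one in the W3C Pronunciation Lexicon Specification (PLS) format, which is the format Amazon Polly lexicons use. The function names are hypothetical, and the choice to spell each acronym out as space-separated letters in an `<alias>` is an assumption about what reads best; this is not what HEROVOICE TTS does internally.

```python
from xml.sax.saxutils import escape


def spell_out(acronym: str) -> str:
    """Turn 'POW' into 'P O W' so the engine reads the letters one by one."""
    return " ".join(acronym)


def build_lexicon(acronyms: list, lang: str = "en-US") -> str:
    """Return a PLS lexicon document with one <lexeme> per acronym."""
    lexemes = "\n".join(
        "  <lexeme>\n"
        f"    <grapheme>{escape(acronym)}</grapheme>\n"
        f"    <alias>{escape(spell_out(acronym))}</alias>\n"
        "  </lexeme>"
        for acronym in acronyms
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<lexicon version="1.0"\n'
        '    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"\n'
        f'    alphabet="ipa" xml:lang="{lang}">\n'
        f"{lexemes}\n"
        "</lexicon>\n"
    )


if __name__ == "__main__":
    # Print a lexicon covering the two acronyms from the example above.
    print(build_lexicon(["POW", "MIA"]))
```

The resulting file would be uploaded to your own Amazon Polly account and then referenced by name when you synthesize speech, so every occurrence of POW or MIA is handled without touching the narration text.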

If you’ve taken the time to read this far down, I hope that you will find the information I provided to be helpful. Let me know if you need any assistance in using this software to convert text to speech.

 Dan

Pinned Reply
Kelly Auner

Hi, everyone!

I have some great news to share. We just released another update for Storyline 360. In Update 83, we’ve included important fixes and new features!

One enhanced feature we’ve included:

Unlock new possibilities for text-to-speech audio. Use speech synthesis markup language (SSML) to adjust the speaking rate, modify pronunciation, emphasize words, add pauses, and more.

To take advantage of this update, launch the Articulate 360 desktop app on your computer, and click the Update button next to Storyline 360. You'll find our step-by-step instructions here!

14 Replies
Michael Marcos

Hi Mark, Daniel, 

While SSML support is not yet available in Storyline 360, I just wanted to chime in here to let you know that Update 80 is now using Amazon Polly's Neural Voices, and you can get that simply by updating to the most recent version of Storyline 360.

All the best, 
Michael Marcos
Customer Support Product Liaison

Khushi Singh

Hey Dan,

I have a question.

For those who have used AI text to speech, what differences have you noticed in terms of naturalness and ease of integration with eLearning content? Also, considering the anchor text 'AI text to speech,' are there specific AI-driven TTS tools that stood out for you in enhancing the overall quality of narration in your eLearning projects?

Daniel Bolia

Hi Khushi,

Regarding the AI text-to-speech site you referenced, and many others like it, they are basically packaging the AI text-to-speech technologies provided by Amazon, Microsoft, Google, or other vendors into a commercial product. Personally, I don’t see any reason to pay for the services these sites provide when Articulate provides the text-to-speech functionality for free.

Free is good, but there are limitations when using Amazon Polly via the Articulate interface. I have experimented and used many other options to handle text-to-speech and have gotten much better results from the stock/neural Amazon Polly. One of my favorites, and my go-to service, is HEROVOICE TTS because it supports multiple AI text-to-speech providers such as Amazon, MS, Google, and Eleven Labs. The app allows me to pick the AI text-to-speech engine I have an account with and want to use. Since my work pays for my MS account, I’ve used their AI text-to-speech engine quite a bit in the past. If you’re like me and must support an organization that has tons of acronyms, being able to use SSML is a godsend. However, for my most recent projects, I used the Eleven Labs engine (also supported by HEROVOICE TTS), which I find has many AI voices that are excellent and natural. Although Eleven Labs currently does not support SSML, I found that their AI engine can recognize when a word should be pronounced as individual letters instead of as a word. For example, when I use the acronym DOE, Eleven Labs knows to pronounce this as D-O-E instead of doe (as in “doe, a deer, a female deer”).

The HEROVOICE TTS app has built-in functions for novice and advanced users. The fact that HEROVOICE TTS was developed by an e-learning developer to address the limitations of Articulate Storyline is like having icing and a cherry with your dessert. The developer is actively engaged and is quick to add new features to the suite of HERO apps. If you take the time to check out HEROVOICE TTS, your time will be rewarded, since you’ll learn about other “HERO” related apps that I have found useful for my eLearning development work. If you’re interested and want to take your AI voice generation to the Thor level, check out the HEROVOICE TTS app and download it from the MS Store, or check out their YouTube channel.

Good luck with your projects,

dan

Paula Schmidt

I'm trying to use the SSML tags to have the US English, neural Danielle voice say the acronym SOGI (sojee) correctly (it always says it with a hard G, as So-Gee), but I can't seem to get the SSML tags correct. I've tried using the <say-as> SSML tag, but I still keep getting the "Storyline can't convert ..." error message. I've tried it in 32-bit and 64-bit.

This is my best guess for using the <say-as> tag,

<speak>
Testing the say as SSML feature in Storyline for saying <say-as="sojee">SOGI</say-as>
</speak>

Format of SSML Say-As tag in SL

Going by the coloring of the string, it appears to be correct, but no go.  I've tried it with no space between the word "saying" and the <say-as> tag, I've tried it using a space between "saying" and the <say-as> tag, I've tried it without the acronym SOGI in the string, I've tried putting the acronym SOGI before the <say-as> tag, but nothing seems to work.

Need additional guidance as to formatting SSML tags in SL.

Thor Melicher

Hello Paula,

The <say-as> tag would need a bit more information to determine what it's supposed to do. The best page for the additional attributes can be found here but I don't think it will fit your particular need as it's more literal (characters, how to read a date, etc.):
Supported SSML Tags - Say As

What you're looking for is phonetic pronunciation: Supported SSML Tags - Phonetics

That can be a bit confusing if you haven't done it before (ChatGPT and other AI clients to the rescue!) so there is a 3rd alternative and that's to spell it out yourself.

Try your alternative spelling as above to see if that works.
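To make the options concrete, here is a minimal Python sketch that builds three SSML variants and checks that each is well-formed XML before you paste it into a TTS tool. The helper names are hypothetical, the IPA string for "sojee" is a best guess rather than a verified transcription, and whether Storyline accepts `<sub>` and `<phoneme>` with a given voice is an assumption to test; both tags are part of Amazon Polly's SSML. The check also shows why `<say-as="sojee">` is rejected: a tag needs an attribute name before the `=`, so that string is not valid XML.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def is_well_formed(ssml: str) -> bool:
    """Return True if the SSML parses as XML (it may still use unsupported tags)."""
    try:
        ET.fromstring(ssml)
        return True
    except ET.ParseError:
        return False


def respell(sentence: str, word: str, spelling: str) -> str:
    """Option 1: replace the word with a sound-alike spelling in the text itself."""
    return f"<speak>{escape(sentence.replace(word, spelling))}</speak>"


def substitute(sentence: str, word: str, alias: str) -> str:
    """Option 2: keep the word in the text but speak an alias via <sub>."""
    tag = f"<sub alias={quoteattr(alias)}>{escape(word)}</sub>"
    return f"<speak>{escape(sentence).replace(escape(word), tag)}</speak>"


def phoneme(sentence: str, word: str, ipa: str) -> str:
    """Option 3: give an explicit IPA pronunciation via <phoneme>."""
    tag = f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(word)}</phoneme>'
    return f"<speak>{escape(sentence).replace(escape(word), tag)}</speak>"


if __name__ == "__main__":
    text = "Testing the say as SSML feature in Storyline for saying SOGI"
    print(respell(text, "SOGI", "sojee"))
    print(substitute(text, "SOGI", "sojee"))
    print(phoneme(text, "SOGI", "ˈsoʊdʒi"))  # IPA here is an unverified guess
    # The original attempt fails the well-formedness check:
    print(is_well_formed('<speak>saying <say-as="sojee">SOGI</say-as></speak>'))
```

The `<sub>` and `<phoneme>` variants leave the written acronym in the narration text, which may matter if captions are generated from the same text.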

Thor Melicher

I can totally appreciate that! Amazon Polly supports another alternative called Lexicons but you would need to:

1. Set up an account with Amazon Web Services (AWS)

2. Find an alternative for captioning

That's probably a bit much in this case, but if you have several acronyms or words that come up frequently, let me know and I can share with you privately what I and others do, which also gives you access to several more text-to-speech options.