New Neural Text to Speech Voices from Amazon Polly

Oct 15, 2019

Storyline 360 uses standard voices from Amazon Polly for its text-to-speech functionality. I imagine they use Amazon's API for this. This has been great for prototyping slides and getting the timing right. Occasionally it can be good for scenarios if you can't afford to hire voice talent. Recently, Amazon has developed a new version of their voices called Neural Voices that use better algorithms for synthesizing speech. I would love to see Articulate let us access these new voices directly from Storyline. For now, you can use them by going to Amazon Polly directly if you have an Amazon account, downloading the audio, and importing it manually into your timeline. I have created a quick demo to let you hear the difference.

https://360.articulate.com/review/content/a6e4ce5a-f982-42aa-8cd1-76e32b2204cd/review
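
For anyone who would rather script the download than use the Polly console, here is a minimal Python sketch. It assumes the boto3 library is installed and that AWS credentials and a region are already configured; the helper names, the "Joanna" voice, and the MP3 output are illustrative choices, not anything Storyline itself does.

```python
def build_polly_request(text, voice_id="Joanna", engine="neural"):
    """Return keyword arguments for Polly's SynthesizeSpeech call."""
    return {
        "Text": text,
        "VoiceId": voice_id,      # assumed voice; pick one that supports the engine
        "Engine": engine,         # "standard" or "neural"
        "OutputFormat": "mp3",
    }


def save_narration(text, out_path, **kwargs):
    """Synthesize text with Polly and write the MP3 to out_path."""
    # Imported here so the request builder above works without AWS set up.
    import boto3

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**build_polly_request(text, **kwargs))
    with open(out_path, "wb") as audio_file:
        audio_file.write(response["AudioStream"].read())
```

Calling `save_narration("Welcome to the course.", "slide_01.mp3")` would then produce a file you can import onto the Storyline timeline by hand.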

Pinned Reply

Kelly Auner
Staff

12/13/23 at 2:36 pm (UTC)

Hi, everyone!

I have some great news to share. We just released another update for Storyline 360. In Update 83, we’ve included important fixes and new features!

One enhanced feature we’ve included:

Unlock new possibilities for text-to-speech audio. Use speech synthesis markup language (SSML) to adjust the speaking rate, modify pronunciation, emphasize words, add pauses, and more.

To take advantage of this update, launch the Articulate 360 desktop app on your computer, and click the Update button next to Storyline 360. You'll find our step-by-step instructions here!
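
As a rough illustration of the kind of markup the update refers to, the Python sketch below assembles a small SSML snippet with a pause, a slower speaking rate, and emphasis. The helper names are made up for this example, and which tags Storyline accepts is not confirmed here, so treat the output as generic SSML rather than a Storyline-specific recipe.

```python
from xml.sax.saxutils import escape


def pause(ms):
    """An explicit pause of the given length in milliseconds."""
    return f'<break time="{ms}ms"/>'


def slow(text, rate="90%"):
    """Speak the text at a different rate (100% is the default speed)."""
    return f'<prosody rate="{rate}">{escape(text)}</prosody>'


def emphasize(text, level="strong"):
    """Stress the text; support for this tag varies by voice and provider."""
    return f'<emphasis level="{level}">{escape(text)}</emphasis>'


def speak(*parts):
    """Wrap already-built SSML parts in the root <speak> element."""
    return "<speak>" + "".join(parts) + "</speak>"


ssml = speak(
    escape("Welcome to the course."),
    pause(500),
    slow("Please read each step carefully."),
    " ",
    emphasize("Do not"),
    escape(" skip the quiz."),
)
print(ssml)
```

Plain text is passed through `escape` so characters such as `&` do not break the markup.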

29 Replies

Wendy Farmer
Hero

10/15/19 at 7:46 pm (UTC)

This is great, Rick. Add this as a feature request to Articulate; I've just submitted mine.

Thanks for sharing.

Trina Rimmer
Staff

10/29/19 at 7:53 pm (UTC)

Hi Rick. This is very cool! Thanks for sharing!

I concur with Wendy: it would be awesome of you to share this with a feature request if you haven't done so already. It just helps our team prioritize features, etc.

Rick Maranta
Author

10/29/19 at 7:57 pm (UTC)

Yup. No problem. Done.

(BTW, Love the new changes to triggers and actions workflow).

Gerry Wasiluk
Hero

10/30/19 at 3:52 pm (UTC)

+1.

Anthony Goss

10/30/19 at 5:12 pm (UTC)

I am currently using the Google Wavenet voices as they are far superior to the Amazon Polly voices. They sound more natural. I use a Google Chrome add-in to play the speech of the text on the screen and record the audio to use inside Storyline. It is definitely more work than the built-in text to speech, but the voices are far superior.

10/30/19 at 5:53 pm (UTC)

Hi Anthony. Cool. I have not used that. I will have to check that out. The extra work might be worth it.

Thanks!

Rick

Gerry McAteer

11/27/19 at 5:59 am (UTC)

Russ Sawchuk

11/29/19 at 9:53 pm (UTC)

We spend thousands of dollars each year on professional narrators for our major elearning projects. Our clients like the results, so using professionals is well worth our investment. However, for smaller learning activities, using professional narrators is neither practical nor cost-effective.

So I was excited to discover neural voices. Although still not perfect, these voices are now good enough for use in some projects. One of our current projects is targeted at learners for whom English is not their first language. Research we did indicates that these learners prefer to learn by "listening" rather than by "reading." As a result, I am using neural voices to narrate some of the lessons.

Here is a sample micro-learning lesson that we built using SL3 and neural voices.

Since I use Storyline 3 (and not 360), the text-to-speech feature is not currently available to us. Fortunately, I was able to find a very affordable cloud-based service that allows me to easily create narration using a variety of neural and other voices. Once the conversion is done, I simply download the audio files for insertion into Storyline and other programs.

For other Storyline 3 users who may be interested in trying out this service, you can find out more information here. (I am not a big fan of all of the hype and sales approach at this website / vendor. However, I have subscribed to the service and the bottom line is that it works!)

I hope that this information may be useful with your elearning projects. Thanks.

Russ

UPDATE: I just discovered that I can further improve the quality of the neural voice output by running the audio file through a program called Auphonic Leveler. I have used this software for years with my narrations. It does a great job in "leveling" the audio volume and cleaning up the noise. It does involve an extra step, but I believe it is well worth the effort.

Rick Maranta
Author

11/29/19 at 10:16 pm (UTC)

Thanks for sharing Russ. Really great. You can also register with your Amazon account for Amazon Polly and they have a page where you can enter in text and download clips the same way for free as well.

Thor Melicher

04/22/20 at 8:33 pm (UTC)

Although Neural voices are not supported inside of Storyline yet, I have created an application that can help streamline the process if you use Amazon Polly directly like Rick talks about in his post here.

1. You’ll need an Amazon Polly account (https://aws.amazon.com/polly/)
2. Save your scripts to be encoded as separate files (MS Word or Text)
3. Download HeroVoice TTS from the Microsoft Windows Store (fully functioning 15-day free trial)
4. Encode your script files using HeroVoice TTS
5. Load each audio file into Storyline

HeroVoice TTS supports many of the requests that I’ve seen on the Storyline forums including:

· Adjusting the overall speed of your files with one setting
· Adjusting the overall pause duration for commas
· Adding your own SSML tags to get more finely nuanced, natural-sounding results
· Neural voices
· Batch processing your script files
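
To give a feel for what "one speed setting" and "one comma pause setting" can look like in SSML, here is a small Python sketch. It is only an illustration of the idea under my own assumptions (plain `.txt` scripts, a `prosody` rate, and a `break` after every comma); it is not how HeroVoice TTS is implemented, and the function names are hypothetical.

```python
from pathlib import Path
from xml.sax.saxutils import escape


def script_to_ssml(script, rate="95%", comma_pause_ms=300):
    """Wrap a plain-text script in SSML with one speed and one comma pause."""
    body = escape(script.strip())
    # Naive rule: every comma gets a pause, including commas inside
    # numbers such as "1,000", so check scripts that contain figures.
    body = body.replace(",", f',<break time="{comma_pause_ms}ms"/>')
    return f'<speak><prosody rate="{rate}">{body}</prosody></speak>'


def batch_convert(script_dir, rate="95%", comma_pause_ms=300):
    """Return {file stem: SSML} for every .txt script in script_dir."""
    return {
        path.stem: script_to_ssml(
            path.read_text(encoding="utf-8"), rate, comma_pause_ms
        )
        for path in sorted(Path(script_dir).glob("*.txt"))
    }
```

Each SSML string could then be sent to the TTS provider of your choice, with the file stem reused as the name of the resulting audio file.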

Rick Maranta

04/23/20 at 12:00 pm (UTC)

Thor's application is a really useful tool. I've tried it out and if you want to get better quality Neural voices in your project, this is the easiest way to do it!

03/04/21 at 2:48 pm (UTC)

Hey Anthony, what add-in do you use?

Mitchell Anderson

03/06/21 at 7:56 am (UTC)

Some startups are also offering good neural voices; Murf is one of them, with a good selection.

I used their service because my clients like good voice-overs, but don't like the extra time it takes to record a real person and don't want to pay for voice talent.

Thor Melicher

03/06/21 at 6:14 pm (UTC)

Bob,

WaveNet for Chrome is the Chrome Plugin that Anthony was most likely referring to - you can get it here:

https://chrome.google.com/webstore/search/wavenet%20for%20chrome?hl=en-US

Thor Melicher

03/06/21 at 6:43 pm (UTC)

@Mitchell,

Looks like Murf is using a mix of providers? At the very least, their 'Toby-UK' voice is Amazon Polly's 'Brian-UK' voice.

03/06/21 at 7:26 pm (UTC)

I use WellSaid Labs. It is the best I have ever heard as far as computer-generated voices go.

Mitchell Anderson

03/06/21 at 7:35 pm (UTC)

@Thor, now that you mentioned it, seems like you are right. Though it seems they have many other good voices compared with Amazon Polly. Again might be a mix of providers.

@Anthony, yes, I tried WellSaid Labs as well before Murf, but the pricing is on the higher side for just 4-15 voice options, as against the 30+ English voices being offered by Murf.

03/06/21 at 9:54 pm (UTC)

WellSaid certainly is impressive. I would also try Descript. It is significantly cheaper. It has a number of voices included with Overdub, and you can create a voice based on your own or someone else's. Not only that, you get a complete audio editing toolset and audio transcription service. It's quite impressive. It may very well piggyback off the same technology as WellSaid, since the voices are of the same quality.

Thor Melicher

03/06/21 at 10:04 pm (UTC)

And to throw in another group that's similar to Descript (http://www.descript.com): Replica Studios (https://replicastudios.com/).

What's impressive with both companies is how little needs to be recorded vs. what Microsoft offered just a few years ago.

Personally, I haven't tried either one yet, but both look promising.

Brooke Ottley

05/09/22 at 11:59 pm (UTC)

I've recently secured a subscription to Microsoft Azure's text-to-speech service. After testing a variety of text-to-speech tools with Australian accents, this is the one we settled on. It has a huge variety of languages and accents, in masculine and feminine voices, and the pronunciation is impeccable. Even better than Google and Amazon's neural voices. I tested its pronunciation of some lengthy medication names (currently working on some health-related eLearning videos) and a pharmacist on our team confirmed the TTS got it right, the first time. I've been using it pretty heavily for the last month, and it's only costing us a dollar so far!

You can use IPA and SSML to correct pronunciation, and can even upload a custom lexicon if there are particular words used throughout your transcripts that require correction, e.g. names of local towns and, in our case, Aboriginal nations. Emphasis can kind of be created by using the web-based TTS tool to increase the volume and the speed/rate at which particular words are spoken. However, for some reason volume changes can only be applied to entire sentences. Here is a screenshot of how I've used the TTS tool to create emphasis on the words "will not". I have then done some basic audio trimming and stitched the sentence together in Storyline.
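
For readers who have not seen IPA-based correction before, this Python sketch wraps chosen words in SSML `phoneme` tags. It is a generic illustration under my own assumptions: the function name is hypothetical, the IPA string for "pecan" is just an example, and the result is a fragment that still has to be placed inside whatever `speak` and voice elements your provider requires.

```python
import re
from xml.sax.saxutils import escape, quoteattr


def apply_pronunciations(text, ipa_by_word):
    """Return an SSML fragment with each listed word wrapped in <phoneme>."""
    if not ipa_by_word:
        return escape(text)
    # Longest words first so a longer entry wins over a shorter overlapping one.
    words = sorted(ipa_by_word, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(w) for w in words) + r")\b")
    pieces, last = [], 0
    for match in pattern.finditer(text):
        word = match.group(1)
        pieces.append(escape(text[last:match.start()]))
        pieces.append(
            f'<phoneme alphabet="ipa" ph={quoteattr(ipa_by_word[word])}>'
            f"{escape(word)}</phoneme>"
        )
        last = match.end()
    pieces.append(escape(text[last:]))
    return "".join(pieces)


print(apply_pronunciations("I like pecan pie.", {"pecan": "pɪˈkɑːn"}))
```

Matching is case-sensitive and works on whole words only, so "Pecan" at the start of a sentence would need its own entry.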

Our eLearning participants really hate listening to American or British narrators, and they hate the fake, robotic-sounding Australian Amazon Polly voice that's built into Storyline. Some have said they would rather mute the entire video and read the captions instead. Us non-American eLearning developers really need an integrated, high-quality neural voice within Storyline. But until Articulate make this a priority, the Microsoft Azure voices are a great alternative.

Ginger Gregory

04/25/23 at 10:06 pm (UTC)

This post was removed by the author

04/25/23 at 11:21 pm (UTC)

Hi Thor. How much extra time do you think it takes you to do this, compared with using the built-in TTS?

04/25/23 at 11:52 pm (UTC)

Hi Ginger -

My workflow is a bit different now since I last posted in this thread because I created another HERO app to streamline the process even more. This workflow is based on having scripts in the Notes section of a Storyline course, and I can say confidently that not only is it faster than Storyline's built-in TTS, it also gets better results:

1) First, I use HEROPREVAIL to extract the scripts into one folder. Each script is automatically labeled with the slide number and title of the slide. I don't have to go slide-by-slide to extract the scripts as it's done automatically. A significant timesaver and stress reducer, too!

2) Next, I use HEROVOICE TTS. I select the provider I want to use (Amazon, Google, or Microsoft - you do have to set up accounts with the providers you want to use), I then select the quality of voice (Neural is the best and not currently offered by Storyline), the language, and then the voice.

3) I select the files I want to encode and let HEROVOICE TTS work in the background. It's very fast as it processes through the list without any intervention on my part.

4) I then manually insert the files back into Storyline. Easy to do since they're all labeled but probably the slowest step in the entire process. No surprise there since the file has to be imported and placed on the timeline. It's about the same time it takes for Storyline to add the Text to Speech file it gets back from Amazon.
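
As a small aside on why the labeling in step 1 makes step 4 easy, here is a Python sketch of one possible naming scheme. The exact format is my own hypothetical example, not necessarily what HEROPREVAIL produces: a zero-padded slide number keeps files sorted in slide order, and the title is reduced to filesystem-safe characters.

```python
import re


def script_filename(slide_number, title, ext=".txt"):
    """Build a sortable, filesystem-safe name such as '003_Course_Overview.txt'."""
    # Replace every run of non-alphanumeric characters with one underscore.
    safe_title = re.sub(r"[^A-Za-z0-9]+", "_", title).strip("_")
    return f"{slide_number:03d}_{safe_title}{ext}"


print(script_filename(3, "Course Overview"))
```

Using the same stem for the script and its audio file (for example `.txt` and `.mp3`) makes it obvious which clip belongs on which slide.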

Step 3 in this workflow offers flexibility, as you don't have to batch encode. You could do it one file at a time so you could listen and then make adjustments. HEROVOICE TTS supports SSML (Speech Synthesis Markup Language) and provides tag support to make it faster - you don't have to learn another programming language per se, as it helps form the tags correctly for you. Very helpful for pauses, slowing down the speed, and correcting those pronunciation problems you've encountered or read about here in the Storyline forums.

FYI - I typically like to batch encode because most of the time it's good enough. If I find a mistake, no worries, it's easy to correct.

I hope this helps and if you have more questions, please feel free to send me a private message or reach out to me on LinkedIn. There are more HERO apps and I'm working on a new one that's going to need testers to make sure it's as good as the ones mentioned here. 🙂