Word Emphasis in Text to Speech

May 21, 2019

Is there a way to emphasize a particular word in the Text to Speech function?

For example, "This may not seem like a problem..."

In that example, is there a way to emphasize the word, "seem" in Text to Speech?

20 Replies
Leslie McKerchie

Hi Christopher,

Thanks for reaching out and letting us know what you would like to use in the text to speech tool. This is not available at this time, but we are tracking a feature request for some additional functionality, such as speed control and emphasis.

I've added this conversation to the report as we track user impact and so that we can share any updates with you here.

I wanted to share some information about how we manage these feature requests as that may be helpful.

Thor Melicher

I recently created an application that might help with adjusting the speed of Text to Speech in Storyline and with adding emphasis to words by using SSML tags. It’s a bit of a workaround, though, as you’ll have to go to the source that Storyline uses: Amazon Polly voices.

Here’s what you do:

  1. Get an Amazon Polly account (yes, there is some cost involved, but it doesn’t seem prohibitive) (https://aws.amazon.com/polly/)
  2. Save your scripts as separate files (MS Word or text)
  3. Download HeroVoice TTS from the Microsoft Windows Store (fully functioning 15-day free trial)
  4. Encode your files with HeroVoice TTS – apply a global setting for speed and even comma duration so your files are consistent.
  5. Select the voice you want – these are the same as you’ll find in Storyline today including Neural voices (which aren’t currently available in Storyline)
  6. Load each audio file into Storyline 

Amazon Polly's list of supported SSML tags is a good resource, as it tells you which tags are supported for each voice type (Standard or Neural). The Neural voices are higher quality, but they *don't* support the SSML emphasis tag.
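For anyone curious what the SSML behind this looks like, here is a minimal Python sketch that wraps one word of the original example sentence in an emphasis tag and sets a global speaking rate. The function names `emphasize` and `synthesize_with_polly` are made up for illustration and are not part of HeroVoice TTS or Storyline. The Polly call assumes the `boto3` package and configured AWS credentials, and it is defined but not run here. The voice `Joanna` and the Standard engine are assumptions, chosen because the emphasis tag is not supported on Neural voices.

```python
from xml.sax.saxutils import escape

EMPHASIS_LEVELS = {"strong", "moderate", "reduced"}


def emphasize(sentence: str, word: str, level: str = "strong", rate: str = "medium") -> str:
    """Return SSML with the first occurrence of `word` wrapped in <emphasis>."""
    if level not in EMPHASIS_LEVELS:
        raise ValueError(f"unsupported emphasis level: {level}")
    before, found, after = sentence.partition(word)
    if not found:
        raise ValueError(f"{word!r} does not occur in the sentence")
    # Escape the plain text so characters like & and < cannot break the markup.
    body = (
        f"{escape(before)}"
        f'<emphasis level="{level}">{escape(word)}</emphasis>'
        f"{escape(after)}"
    )
    # <prosody rate> applies one speaking rate to the whole sentence.
    return f'<speak><prosody rate="{rate}">{body}</prosody></speak>'


def synthesize_with_polly(ssml: str, voice_id: str = "Joanna") -> bytes:
    """Send SSML to Amazon Polly and return MP3 bytes (assumes boto3 + credentials)."""
    import boto3  # imported lazily so the rest of the sketch runs without AWS

    polly = boto3.client("polly")
    response = polly.synthesize_speech(
        Text=ssml,
        TextType="ssml",
        VoiceId=voice_id,      # assumption: any Standard voice would do
        Engine="standard",     # emphasis is not supported on Neural voices
        OutputFormat="mp3",
    )
    return response["AudioStream"].read()


ssml = emphasize("This may not seem like a problem...", "seem", rate="slow")
print(ssml)
```

The resulting MP3 bytes could be written to a file and imported into Storyline like any other audio clip.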

Mike Adamski

Co-signing this request, as it would be very nice to add emphasis by placing * * around a word or something like that. I imagine emphasis could just be a slight tweak to the pitch with a short delay.

I've used Adobe Premiere Pro for most of my videos, but create the audio from text within Storyline and then export it. It'd be nice to adjust speed and pitch and also do both independently from one another.
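The asterisk idea can be approximated outside Storyline with a small preprocessing step. The sketch below is only an illustration of that convention, not an existing Storyline feature: the function name `asterisks_to_ssml` and the default `moderate` level are assumptions. It converts `*word*` markers in a script into SSML emphasis tags.

```python
import re
from xml.sax.saxutils import escape

# Matches *text* where the text neither starts nor ends with whitespace,
# so a lone asterisk in "5 * 3" is left untouched.
_STARRED = re.compile(r"\*([^*\s](?:[^*]*[^*\s])?)\*")


def asterisks_to_ssml(script: str, level: str = "moderate") -> str:
    """Convert *word* markers in a plain-text script to SSML <emphasis> tags."""
    escaped = escape(script)  # protect &, < and > before adding markup
    marked = _STARRED.sub(rf'<emphasis level="{level}">\1</emphasis>', escaped)
    return f"<speak>{marked}</speak>"


print(asterisks_to_ssml("This may not *seem* like a problem..."))
```

The output is ordinary SSML, so it could be fed to any engine and voice that supports the emphasis tag.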

Brooke Ottley

+1 for this feature request. The ability to change the emphasis would be especially useful for words where the syllable stress/emphasis can actually alter the meaning of the word. E.g. "record" (noun) vs "record" (verb).

Also, as an Australian user, the American voices are unfortunately much more natural sounding than the Australian voices currently available in Storyline via Amazon Polly. Please incorporate the latest Neural Polly voices into Storyline for us non-American English-speaking users - and for our clientele, who find the American and UK voices annoying and sometimes difficult to understand.

Brooke Ottley

I've recently secured a subscription to Microsoft Azure's text to speech tool. After testing a variety of text-to-speech tools with Australian accents, this is the one we settled on. It has a huge variety of languages and accents, in masculine and feminine voices, and the pronunciation is impeccable. Even better than Google and Amazon's neural voices. I tested its pronunciation of some lengthy medications (currently working on some health-related eLearning videos) and a pharmacist on our team confirmed the TTS got it right, the first time. And the pricing... well let's just say I've been using it pretty heavily for the last month, and it's only costing us $1 so far!

You can use IPA and SSML to correct pronunciation, and can even upload a custom lexicon if there are particular words used throughout your transcripts that require correction, e.g. names of local towns and, in our case, Aboriginal nations. Emphasis can kind of be created by using the web-based TTS tool to increase the volume and the speed/rate at which particular words are spoken. However, for some reason volume changes can only be applied to entire sentences. Here is a screenshot of how I've used the TTS tool to create emphasis on the words "will not". I have then done some basic audio trimming and stitched the sentence together in Storyline.
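For readers who prefer to write the markup by hand rather than use the web tool, a rough Python sketch of this kind of SSML follows. It is an assumption-laden illustration, not the exact markup from the screenshot: the helper names are invented, the voice `en-AU-NatashaNeural` is just one Australian voice picked as an example, the rate value is arbitrary, and the IPA string is a common transcription of "tomato" standing in for a word that needs correcting.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def phoneme(word: str, ipa: str) -> str:
    """Spell out the pronunciation of `word` using an IPA transcription."""
    return f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(word)}</phoneme>'


def prosody(text: str, rate: str = "medium") -> str:
    """Change the speaking rate of a few words to make them stand out."""
    return f"<prosody rate={quoteattr(rate)}>{escape(text)}</prosody>"


def azure_document(body: str, voice: str = "en-AU-NatashaNeural", lang: str = "en-AU") -> str:
    """Wrap an SSML body in the <speak>/<voice> envelope used by Azure TTS."""
    return (
        f'<speak version="1.0" xmlns="{SSML_NS}" xml:lang={quoteattr(lang)}>'
        f"<voice name={quoteattr(voice)}>{body}</voice></speak>"
    )


doc = azure_document(
    "We " + prosody("will not", rate="slow") + " serve "
    + phoneme("tomato", "təˈmɑːtəʊ") + " soup."
)
ET.fromstring(doc)  # raises ParseError if the markup is not well-formed
print(doc)
```

Parsing the result before submitting it is a cheap way to catch unclosed tags in long transcripts.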

Our eLearning participants really hate listening to American or British narrators, and they hate the fake, robotic sounding Australian Amazon Polly voice that's built into Storyline. Some have said they would rather mute the entire video and read the captions instead. Us non-American eLearning developers really need an integrated, high quality neural voice within Storyline. But until Articulate make this a priority, the Microsoft Azure voices are a great alternative.

Michael Marcos

Hi Brooke, Sheri and Tina, 

I just wanted to share some news about Storyline 360 Update 80. This update might be interesting for you since you’ve explored other options to improve the quality of TTS. In Update 80, we have taken advantage of Amazon Polly’s neural text-to-speech feature. You will see better versions of most of the voices when inserting TTS audio. These show up in the same place as the standard voices, under “Neural Voices”. They sound more natural and human-like, and are considerably higher in quality than the older standard TTS voices.

A list of these voices can be found here. Updating Storyline 360 to the latest version is super easy; here is the guide in case anyone needs it.

We will continue to keep tabs on requests to support ways to control emphasis, speaking rate, inserting silence, pronunciation, and SSML support in general. Please let me know of any feedback (good or bad) about this enhancement, and we’ll be happy to pass it along to our dedicated team of engineers.

All the best,
Michael Marcos
Customer Support Product Liaison

Brooke Ottley

Thank you Michael. Appreciate your efforts to integrate at least one neural TTS product in Storyline. I have tested the one Australian voice now available in Storyline, and while it is significantly better than the robotic sounding standard voices, it doesn't quite meet our business needs. Azure TTS voices are much more natural and allow pronunciation to be corrected using the IPA, not just SSML. Here is a demo video I created, showing the use case for our particular business needs. I will of course recommend the built-in neural voice for users within our business that have very short production timelines, but for everyone else, I expect they will prefer the Azure voices.

Michael Marcos

Thank you Brooke for the video, something to think about for sure! 

And hello everyone, just wanted to share with y'all that Update 83 now allows for SSML support in TTS. 

Unlock new possibilities for text-to-speech audio. Use speech synthesis markup language (SSML) to adjust the speaking rate, modify pronunciation, emphasize words, add pauses, and more.
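As a rough illustration of the kind of markup this enables, the Python sketch below builds SSML with a pause and with part-of-speech hints for the "record" (verb) vs "record" (noun) case raised earlier in the thread. The helper names are hypothetical. The `<break>` tag and the `<w role="amazon:VB">` / `<w role="amazon:NN">` values come from Amazon Polly's SSML tag set, and whether a given Storyline voice honors each tag is an assumption to verify against the supported-tags list.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def pause(milliseconds: int) -> str:
    """Insert a silence of the given length."""
    return f'<break time="{int(milliseconds)}ms"/>'


def part_of_speech(word: str, role: str) -> str:
    """Hint how a word is used, e.g. "amazon:VB" (verb) or "amazon:NN" (noun)."""
    return f'<w role="{role}">{escape(word)}</w>'


def tags_used(ssml: str) -> list[str]:
    """Parse the SSML (raising on malformed markup) and list its distinct tags."""
    root = ET.fromstring(ssml)
    return sorted({element.tag for element in root.iter()})


ssml = (
    "<speak>Please " + part_of_speech("record", "amazon:VB") + " your answer."
    + pause(500)
    + "Then check the " + part_of_speech("record", "amazon:NN") + ".</speak>"
)
print(ssml)
print(tags_used(ssml))
```

Listing the tags in a script makes it easier to compare them against what the chosen voice type supports before pasting the text into the TTS dialog.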

To take advantage of this update, launch the Articulate 360 desktop app on your computer, and click the Update button next to Storyline 360. You'll find our step-by-step instructions here!

Best,
Mike