Storyline 360: Enrich Audio Narrations with Classic or AI-Generated Text-to-Speech

Course authors have long relied on the classic text-to-speech feature in Storyline 360 to create quick audio narration for their e-learning content and speed up course development. However, even with neural voices now offered alongside the standard ones, classic text-to-speech voices can sound robotic, making for a less natural and engaging learner experience. Now, AI-generated text-to-speech is changing the game. The newest addition to your authoring toolkit, AI Assistant's text-to-speech gives you access to incredibly lifelike, AI-generated voices that are hard to distinguish from a real human voice.

So will you keep using the classic version or embrace the brave new world of AI text-to-speech technology? Let’s take a look at the pros and cons of each to help you decide which option to choose for your next project.

Classic Text-to-Speech

Storyline’s classic text-to-speech has evolved significantly over the years. In particular, the introduction of neural voices empowered authors to create more realistic and natural-sounding narrations. Here’s an overview of when classic text-to-speech may be the best choice to elevate your audio content.

You need certain languages. Classic text-to-speech lets you create narrations for diverse audiences, with support for multiple languages in both standard and neural voices. Some of these languages, including Icelandic, Welsh, Catalan, and Irish, are currently available only in classic text-to-speech.
You need full SSML support. Unlike AI-generated text-to-speech, classic text-to-speech offers full speech synthesis markup language (SSML) support. This allows you to fine-tune narrations by adjusting the speaking rate, modifying pronunciation, adding pauses, and more to boost clarity and interest.
You have reservations about using AI. Not everyone is ready to embrace new technologies like generative AI, and some organizations even restrict the use of AI-powered tools.
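To make the SSML point more concrete, here is a minimal sketch of the kind of markup it refers to, written against the W3C SSML specification. The sentences and the IPA transcription are hypothetical examples, and which tags a given classic voice honors can vary by voice, so treat this as an illustration and check the supported-tag list for your chosen voice. Whether you need to type the `<speak>` wrapper yourself in Storyline's text-to-speech window is also an assumption to verify.

```xml
<speak>
  Welcome to the module.
  <break time="500ms"/>
  <prosody rate="slow">Read each step carefully before you continue.</prosody>
  Then open the <phoneme alphabet="ipa" ph="ˈkæʃ">cache</phoneme> settings.
</speak>
```

Each tag maps to one of the adjustments listed above: `<break>` inserts a half-second pause, `<prosody rate="slow">` slows the speaking rate for a single sentence, and `<phoneme>` spells out how a word should be pronounced using the International Phonetic Alphabet.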

Outside of these specific circumstances, however, the overall quality of classic text-to-speech voices may not be sufficiently natural, especially for more complex or nuanced content. Want to judge for yourself? You can hear classic text-to-speech in action by playing the narrations below.

Standard Voice

Neural Voice

AI-Generated Text-to-Speech

AI Assistant’s text-to-speech feature takes voice narration to the next level, using generative AI technology to create highly realistic voices. You can customize the voices to fit your content needs, making the experience feel more personal and engaging to your learners. Here’s an overview of when AI text-to-speech may be the best option for bringing your narrations to life.

You need certain (other) languages. AI Assistant lets you broaden your reach with support for up to 32 languages, some with multiple accents and dialects, depending on the model used to generate the narration. The table below lists 11 languages available only in AI text-to-speech.

Bulgarian
Croatian
Filipino
Greek

Hindi
Hungarian
Indonesian
Malay

Slovak
Ukrainian
Vietnamese

You want an easier process. AI Assistant’s intuitive interface helps you quickly generate narration in any supported language. Simply select a voice and enter a script—AI Assistant handles the rest. Even though the voice description may note a specific accent or language, AI Assistant still generates narration in the language used in your script.
You need highly customizable voices to create a personalized audio experience. AI Assistant gives you fine-grained control over each voice, from adjusting the balance between steadiness and randomness to setting how closely the AI should adhere to the original voice when replicating it.
You want to impress your learners with lifelike, context-aware voices. AI Assistant’s text-to-speech adapts to the tone, emotion, and nuances of your content or script. Here’s an example of a text-to-speech narration created using an AI-generated voice.

AI-Generated Voice

You need voices tailored to specific training needs. AI Assistant comes with a voice library that offers thousands of ultrarealistic, AI-generated voices that can be filtered by age, gender, and use case.

That said, AI text-to-speech has its own drawbacks. For one, it offers only limited SSML support, as mentioned above, because the underlying models don't support SSML phoneme tags. If your script includes specialized terminology or unusual pronunciations, getting them right can be harder without full SSML support. And while AI text-to-speech does support the break tag <break time="1.5s" /> for manually controlling pacing, using too many break tags can cause instability.

In addition, AI text-to-speech offers a huge variety of voices but no specific guidance on which ones work best for a given language. Finding just the right voice can take a lot of experimentation, and that's time you may not have.

Pro tip: Keep a reference list of the voices that work well for each language in your courses. It can save you time on your next project.

Check out these user guides for step-by-step instructions on creating AI-generated text-to-speech audio, including in Rise 360.

Choose What Works for You

AI-generated voices clearly have the edge over classic text-to-speech options on voice quality. However, if you (or your organization) are still on the fence about adopting generative AI in your content creation process or have specialized needs, classic text-to-speech is still there to help you create engaging audio interactions. You get to decide what sounds right for your learners—and for your own content and workflow.
