Forum Discussion

AnnaBertoncini
Community Member
1 year ago

AI TTS and SSML functionality

Hello everyone,

I would like to bring the community's attention to AI TTS and the limited support for SSML with it.

I know that it is not supported because you built AI voices "to understand the relationship between words and adjust delivery accordingly". 

However, AI voices mispronounce acronyms and other words like company names and such. 

Because of that, I am forced to use the old TTS voices, which is a bit upsetting: the AI voices do sound more natural and "human", a big benefit in e-learning for ensuring a more pleasant learning experience for our users.

This is a request to work on SSML for AI voices because I strongly believe it is needed. 

Anna

17 Replies

  • Hello AnnaBertoncini,

    Thanks for reaching out and sharing your thoughts on AI TTS and SSML functionality.

    You are correct that AI Assistant has limited support for speech synthesis markup language (SSML) because AI-generated voices are designed to understand the relationship between words and adjust delivery accordingly.

    I believe this is a good request, so I shared your feedback with our product team. We'll let you know if there are any changes in the future regarding this area in Articulate AI.

    Enjoy the rest of your day!

    • suzarina
      Community Member

      Any update on this? I'm also having issues trying to get the AI voices to pronounce acronyms correctly, or interject pauses so that the speech sounds more natural. 
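      For reference, the pauses I mean are what SSML provides through its break element in the classic voices (a sketch; the timing value is just illustrative):

      ```xml
      <speak>
        Welcome to the course. <break time="600ms"/> Let's begin with module one.
      </speak>
      ```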

      • EricSantos
        Staff

        Hi suzarina,

        Thanks for following up on this request.

        There are no updates at the moment, but we'll make sure to post in this thread if there are any developments regarding expanded SSML support in Articulate AI text-to-speech.

  • arabellas
    Community Member

    I'd like to piggyback on this request. The AI voices overall sound more natural - but it's more challenging to adjust pronunciation. It would also be really nice to be able to adjust the inflection/emphasis and expression. 
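    For anyone unfamiliar, the kind of emphasis and inflection control I mean already exists in standard SSML as the emphasis and prosody elements (a sketch; the attribute values are illustrative):

    ```xml
    <speak>
      <emphasis level="strong">Always</emphasis> save your work before exiting.
      <prosody rate="slow" pitch="+2st">Please read the next screen carefully.</prosody>
    </speak>
    ```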

    • LaurenDuvall
      Staff

      Hi BrendtWaters! Thanks for letting us know that this feature would be beneficial for you, too. I know others have shared interest in more control when it comes to pronunciation with AI voices. To ensure I'm sharing the correct information with our Product team, would you mind sharing where you've been stuck with the lack of control with AI voices? Is it certain words, languages, or something else?

      • BrendtWaters
        Community Member

        By far, the biggest loss is the phoneme tag. Even the smartest of AI voices stumbles on pronunciations. While sometimes, you can "fake" it out to say what you want (e.g., with creative spelling), when emphasis is put on the wrong syllable, there's no way to fix that other than the phoneme tag.

        Case in point: The word "conduct" can be a verb ("Bob will conduct the orchestra") or a noun ("Mary has good conduct"). Same word, two different pronunciations.

        While we were overhauling all our courses (before moving to on-board AI TTS), we were using an external tool (creating mp3s) that allowed the phoneme tag. So that we didn't have to keep figuring out the same IPA over and over, we made a library of phoneme tags. It exceeded 50 tags. So this is no minor loss.

        I'm truly confused why, what is supposed to be an upgrade, *loses* an ability that on-board (regular) TTS had.
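        For the "conduct" example, the phoneme tag looks like this in standard SSML (the IPA transcriptions shown are illustrative, and engine support for the tag varies):

        ```xml
        <speak>
          <!-- Verb: stress on the second syllable -->
          Bob will <phoneme alphabet="ipa" ph="kənˈdʌkt">conduct</phoneme> the orchestra.
          <!-- Noun: stress on the first syllable -->
          Mary has good <phoneme alphabet="ipa" ph="ˈkɒndʌkt">conduct</phoneme>.
        </speak>
        ```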

  • arabellas
    Community Member

    Seconding Brendt's reply. The phoneme tag is a huge help. It's not about just certain words - my organization uses lots of acronyms that are pronounced as words, but not always in a way that's easy to "fake" with spelling. It's especially frustrating when it's part of a bigger phrase and everything else about the phrase is perfect, but just that one word is totally wrong and you have to regenerate the whole thing. And if you're using two words with unusual pronunciations in the same script, that can be extra frustrating.

    • LucianaPiazza
      Staff

      Hi arabellas,

      We appreciate you sharing your insight as well! We understand that you'd like more precise control over how specific words are spoken so you can get accurate results without rework. Totally makes sense! We’ve shared your feedback with our product team so they understand your experience. We'll share any future updates in this thread so everyone is aware! 

  • BrendtWaters
    Community Member

    Another example: "lead" (as in the element, Pb). Half the time, the TTS says it with a long "e" (as in, "He will lead the parade"). In the regular (non-AI) TTS, spelling it as "led" fixes the issue. But the AI TTS is too smart for its own good, recognizes that "led" doesn't fit the context, and assumes "this must be an acronym (despite the lower case letters)" and reads it as individual letters. A phoneme tag would get around this.
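    In an SSML-capable engine, either the phoneme tag or a sub alias pins this down (a sketch; the IPA here assumes the metal, and the sub element is the standard SSML way to substitute the spoken form while keeping the displayed text):

    ```xml
    <speak>
      <!-- Explicit pronunciation via IPA -->
      Exposure to <phoneme alphabet="ipa" ph="lɛd">lead</phoneme> is a hazard.
      <!-- Or: speak "led" while displaying "lead" -->
      Exposure to <sub alias="led">lead</sub> is a hazard.
    </speak>
    ```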

    Fortunately, I found a different workaround. This time.

    • LucianaPiazza
      Staff

      Appreciate you sharing another example with us, BrendtWaters. I've passed this along to our Product Team for awareness. We'll be sure to share any updates in this thread. 

  • PeterGrennan
    Community Member

    Adding another vote for this. I have to spend silly amounts of time regenerating AI voices to counter the way they try to pronounce certain words, mainly ones that exist as both a noun and a verb with different pronunciations (e.g. "record"). We also work in healthcare, so a number of words and abbreviations need specific pronunciations. Also, some words are pronounced the American way even with British voices. We have figured many of them out by writing them strangely, but even then, longer text often needs multiple attempts to get right. I've only discovered SSML today but can already tell it would be a godsend. It's hard for AI to work out the context of words when, for example, we produce software simulations, so verbs can appear in screen headings (e.g. the Record Referral screen). AI isn't very good at understanding that "Record" there is a verb. We need more control.

    • BrendtWaters
      Community Member

      Yeah, I've run into a similar thing as Peter with titles/headings. They're usually just a phrase, so the AI just doesn't have enough context info.

  • Hi PeterGrennan, BrendtWaters, and arabellas,

    You’re right to call this out, especially with the examples you’ve all shared around pronunciation and context.

    I appreciate you checking in on this as well. While I don’t have a specific update to share right now, this is still something we’re actively tracking.

    At the moment, AI text-to-speech is designed to infer pronunciation based on context, but it doesn’t offer the same level of control as traditional TTS when it comes to fine-tuning output. That’s where cases like acronyms, industry-specific terms, or words with multiple pronunciations can become challenging, especially when context is limited, like in headings or short phrases.

    The examples you’ve all provided, from healthcare terminology to words like “record” and “lead,” are really helpful in highlighting where more control is needed. I can also see how features like phoneme support or a pronunciation library would make a big difference in reducing rework and improving consistency.

    I’ve added your feedback to the existing request, including these newer use cases. We’re continuing to gather input as the team explores ways to improve pronunciation control in AI voices.