Forum Discussion

AnnaBertoncini
Community Member
1 year ago

AI TTS and SSML functionality

Hello everyone,

I would like to draw the community's attention to AI TTS and its limited support for SSML.

I know that it is not supported because you built AI voices "to understand the relationship between words and adjust delivery accordingly". 

However, AI voices mispronounce acronyms and other words, such as company names.

As a result, I am forced to use the old TTS voices. That is a bit upsetting, because the AI voices do sound more natural and "human", which is a big benefit in e-learning: it gives our users a more pleasant learning experience.

This is a request to work on SSML for AI voices because I strongly believe it is needed. 

Anna

12 Replies

  • BrendtWaters
    Community Member

    Another example: "lead" (as in the element, Pb). Half the time, the TTS says it with a long "e" (as in, "He will lead the parade"). In the regular (non-AI) TTS, spelling it as "led" fixes the issue. But the AI TTS is too smart for its own good, recognizes that "led" doesn't fit the context, and assumes "this must be an acronym (despite the lower case letters)" and reads it as individual letters. A phoneme tag would get around this.

    Fortunately, I found a different workaround. This time.

  • arabellas
    Community Member

    Seconding Brendt's reply. The phoneme tag is a huge help. It's not about just certain words - my organization uses lots of acronyms that are pronounced as words, but not always in a way that's easy to "fake" with spelling. It's especially frustrating when it's part of a bigger phrase and everything else about the phrase is perfect, but just that one word is totally wrong and you have to regenerate the whole thing. And if you're using two words with unusual pronunciations in the same script, that can be extra frustrating.

    • LucianaPiazza
      Staff

      Hi arabellas,

      We appreciate you sharing your insight as well! We understand that you'd like more precise control over how specific words are spoken so you can get accurate results without rework. Totally makes sense! We’ve shared your feedback with our product team so they understand your experience. We'll share any future updates in this thread so everyone is aware! 

    • LaurenDuvall
      Staff

      Hi BrendtWaters! Thanks for letting us know that this feature would be beneficial for you, too. I know others have shared interest in more control when it comes to pronunciation with AI voices. To ensure I'm sharing the correct information with our Product team, would you mind sharing where you've been stuck with the lack of control with AI voices? Is it certain words, languages, or something else?

      • BrendtWaters
        Community Member

        By far, the biggest loss is the phoneme tag. Even the smartest of AI voices stumbles on pronunciations. You can sometimes "fake" it out to say what you want (e.g., with creative spelling), but when emphasis is put on the wrong syllable, there's no way to fix that other than the phoneme tag.

        Case in point: The word "conduct" can be a verb ("Bob will conduct the orchestra") or a noun ("Mary has good conduct"). Same word, two different pronunciations.

        While we were overhauling all our courses (before moving to on-board AI TTS), we were using an external tool (creating mp3s) that allowed the phoneme tag. So that we didn't have to keep figuring out the same IPA over and over, we made a library of phoneme tags. It exceeded 50 tags. So this is no minor loss.

        I'm truly confused why what is supposed to be an upgrade *loses* an ability that the on-board (regular) TTS had.

  • arabellas
    Community Member

    I'd like to piggyback on this request. The AI voices overall sound more natural - but it's more challenging to adjust pronunciation. It would also be really nice to be able to adjust the inflection/emphasis and expression. 

  • Hello AnnaBertoncini,

    Thanks for reaching out and sharing your thoughts on AI TTS and SSML functionality.

    You are correct that AI Assistant has limited support for Speech Synthesis Markup Language (SSML) because AI-generated voices are designed to understand the relationship between words and adjust delivery accordingly.

    I believe this is a good request, so I shared your feedback with our product team. We'll let you know if there are any changes in the future regarding this area in Articulate AI.

    Enjoy the rest of your day!

    • suzarina
      Community Member

      Any update on this? I'm also having issues trying to get the AI voices to pronounce acronyms correctly, or to insert pauses so that the speech sounds more natural.

      • EricSantos
        Staff

        Hi suzarina,

        Thanks for following up on this request.

        There are no updates at the moment, but we'll make sure to post in this thread if there are any developments regarding expanded SSML support in Articulate AI text-to-speech.
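
For readers unfamiliar with the phoneme tag and the reusable "library of phoneme tags" mentioned earlier in the thread, here is a minimal sketch of what such a library might look like. It only applies to a TTS engine that accepts standard SSML `<phoneme>` elements; as discussed above, the AI voices do not. The library keys, the `{key}` placeholder syntax, the function names, and the IPA strings are all assumptions made for illustration, not anything taken from the product or from the posters' actual library.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical pronunciation library: key -> (written word, IPA string).
# Keys and IPA values are illustrative assumptions, not from the thread.
PHONEME_LIBRARY = {
    "lead_metal": ("lead", "lɛd"),
    "conduct_verb": ("conduct", "kənˈdʌkt"),
    "conduct_noun": ("conduct", "ˈkɑndʌkt"),
}


def phoneme(key: str) -> str:
    """Return a standard SSML <phoneme> element for one library entry."""
    word, ipa = PHONEME_LIBRARY[key]
    return f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(word)}</phoneme>'


def render_ssml(script: str) -> str:
    """Escape a plain-text script and expand {key} placeholders.

    Assumes the script contains no literal curly braces other than
    placeholders that name entries in PHONEME_LIBRARY.
    """
    tags = {key: phoneme(key) for key in PHONEME_LIBRARY}
    return "<speak>" + escape(script).format(**tags) + "</speak>"


if __name__ == "__main__":
    print(render_ssml("Bob will {conduct_verb} the orchestra."))
    print(render_ssml("Mary has good {conduct_noun}."))
    print(render_ssml("The pipe contains {lead_metal}."))
```

Keeping the IPA in one place means each pronunciation is worked out once and reused across scripts, and separate keys for the verb and the noun cover the "conduct" case where spelling tricks cannot move the stress.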