Forum Discussion

Paul_Atleos
Community Member
2 months ago
Solved

AI Voice Generation emphasis in SL

Hi,

Has anybody discovered a way to reliably coax the AI voice generation engine in SL360 to add emphasis to a word or phrase? For example, in written text such as

"read the instructions before starting",

the italics and bold would strongly indicate the importance of reading before starting, and if I was creating my own voice recording I'd heavily lean into the word "before", to stress this.

I haven't yet found a way to do this with the AI VG engine, and you can't add bold or italics to the text dialog.  I've experimented with asterisks etc., but it tends to just garble the output.

I know the whole point of AI is that it is supposed to work stuff like this out for itself through context and should do this automatically, but I do think it sometimes needs some guidance.

Any ideas or tips?

Thanks

Paul

  • Sorry for the quick delay here. I took this back to the team to see if anyone had thoughts or suggestions similar to what you shared about that break time markup. The consensus seems to be that emphasis in particular is hard to achieve. When I think about it, that makes sense: emphasis isn't quite a pronunciation difference, so I can see why the speech models would have trouble with it! The feeling on the team is that some experimentation is needed to get the voice to flow correctly, and that experimenting with pronunciation can sometimes get close to what you want for emphasis.

    I think you've probably already seen this based on what you referenced, but for anyone else following this thread who may be curious, here is an article the team put together that talks about some of the limitations and options with SSML models and AI speech. 

    Curious to keep following this and see if there are any specific practices folks have landed on that worked really well to achieve emphasis. 

14 Replies

  • Hello Paul,  

    I'm having a related issue with getting the AI voice generator to properly pronounce certain proper names and acronyms.   For example, "ORPS" vs. O-R-P-S. 

    Thanks

    David

    • Paul_Atleos
      Community Member

      Hi David,

      I tend to use either phonetic spelling to make it pronounce certain words or full stops/periods to force it to spell out initialisms. Sometimes adding speech marks around the word helps as well.

      So to force it to spell out ORPS I might try "O.R.P.S".

      One that always causes issues is "read"... is it present tense or past tense? If I want it to use the present tense pronunciation I'd spell this "reed".

      In my own field, it seems to have real problems with the phrase "on-us" (meaning a banking transaction with the bank's own customer, as opposed to another bank's customer). Whether it pronounces this as "us" or "U.S" seems to be random, and is also affected by the voice you choose. I've sometimes even resorted to spelling it as "on bus" and then editing the output afterwards to cut out the "b" sound!

      Alternatively, sometimes you just have to regenerate the speech over and over until it gets it right.

      It's an exciting technology, but it's not 100% there yet and you have to put work in to tweak it to give you what you want.

      • Noele_Flowers
        Staff

        These are all super creative solutions, Paul. Thank you for sharing! I got a chuckle out of the "on bus" -> "on-us" workaround 🤣 🚌

    • Thomas_CA
      Community Member

      The most reliable method for this that I've found is using full stops between the letters.

      Failing that, spelling out the letters is an awkward but usually reliable way of sidestepping this (e.g., "oh ar pee es").
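
For anyone preparing a lot of narration text, the workarounds in this sub-thread (full stops between letters, spelling out letter names, and phonetic respelling) can be applied with a small script before pasting the text into the TTS dialog. This is only a sketch: the function names and the letter-name table are made up for illustration, and nothing here calls Storyline or any TTS engine. Results will still vary by voice, as noted above.

```python
import re

# Hypothetical letter-name spellings (an assumption, tune these for your voice).
LETTER_NAMES = {
    "A": "ay", "B": "bee", "C": "see", "D": "dee", "E": "ee", "F": "ef",
    "G": "jee", "H": "aitch", "I": "eye", "J": "jay", "K": "kay", "L": "el",
    "M": "em", "N": "en", "O": "oh", "P": "pee", "Q": "cue", "R": "ar",
    "S": "es", "T": "tee", "U": "you", "V": "vee", "W": "double-you",
    "X": "ex", "Y": "why", "Z": "zee",
}


def dotted(acronym: str) -> str:
    """Put a full stop between the letters, e.g. ORPS -> O.R.P.S"""
    return ".".join(acronym.upper())


def spelled_out(acronym: str) -> str:
    """Replace each letter with its spoken name, e.g. ORPS -> oh ar pee es"""
    return " ".join(LETTER_NAMES[ch] for ch in acronym.upper())


def respell(text: str, replacements: dict[str, str]) -> str:
    """Swap whole words for a caller-supplied respelling (case-sensitive)."""
    for word, spoken in replacements.items():
        pattern = rf"\b{re.escape(word)}\b"
        # A function replacement avoids backslash handling in the new text.
        text = re.sub(pattern, lambda _m, s=spoken: s, text)
    return text


script = "Please read the ORPS guide."
print(respell(script, {"read": "reed", "ORPS": dotted("ORPS")}))
print(spelled_out("ORPS"))
```

Keeping the respellings in one dictionary makes it easier to retry a different spelling when a particular voice gets a word wrong.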

  • I have this issue, too! Not sure how to get it to emphasize certain words. 

  • There is a markup language for giving directions to AI voices. It's called "SSML" (Speech Synthesis Markup Language). So you could try marking up your input with the appropriate markup to help the voice pronounce words correctly, add emphasis, etc.
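
For reference, this is roughly what standard SSML emphasis markup looks like, built here with a small Python helper. The `<speak>` and `<emphasis level="...">` elements come from the W3C SSML specification; the helper name `emphasize` is hypothetical, and whether Storyline's voice engine actually honours the tag is not established in this thread, so treat it as something to experiment with rather than a known fix.

```python
from xml.sax.saxutils import escape

# Emphasis levels defined by the W3C SSML specification.
SSML_LEVELS = {"strong", "moderate", "none", "reduced"}


def emphasize(sentence: str, word: str, level: str = "strong") -> str:
    """Wrap the first occurrence of `word` in an SSML <emphasis> element."""
    if level not in SSML_LEVELS:
        raise ValueError(f"unknown emphasis level: {level!r}")
    before, found, after = sentence.partition(word)
    if not found:
        raise ValueError(f"{word!r} does not occur in the sentence")
    # escape() protects any &, < or > in the narration text itself.
    return (
        f"<speak>{escape(before)}"
        f'<emphasis level="{level}">{escape(word)}</emphasis>'
        f"{escape(after)}</speak>"
    )


print(emphasize("Read the instructions before starting.", "before"))
```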

    • Paul_Atleos
      Community Member

      Thanks Ray,

      A quick play-around with this didn't yield any result, but the <break> tag I mentioned earlier definitely looks like it fits the syntax of SSML, so I'd say this warrants more investigation.  It looks like some aspects of SSML may be supported by SL's TTS, but not others.

      I'll look into this further and report back if I find something that works.

      • Paul_Atleos
        Community Member

        It seems like SSML support in the ElevenLabs software is patchy at best. I found the following article on their website:
        Prompting - ElevenLabs

        It seems as if you can use the <phoneme> tag to force pronunciation, and <break time> to insert pauses, but nothing else is mentioned.

        Thanks for the suggestion though, Ray.
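
A rough sketch of how those two tags could be generated for a script is below. The tag shapes follow the general SSML form (`<break time="..."/>` and `<phoneme alphabet="..." ph="...">`); the helper names, the choice of the `ipa` alphabet, and the example transcription are assumptions, and which voices or models accept them should be checked against the ElevenLabs article rather than taken from this snippet.

```python
from xml.sax.saxutils import escape, quoteattr


def pause(seconds: float) -> str:
    """Return an SSML break tag for a pause of the given length."""
    if seconds <= 0:
        raise ValueError("pause length must be positive")
    return f'<break time="{seconds:g}s" />'


def phoneme(word: str, ipa: str) -> str:
    """Return an SSML phoneme tag pinning `word` to an IPA pronunciation."""
    return f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(word)}</phoneme>'


# Assumed IPA for present-tense "read"; a pause placed before "before".
read_tag = phoneme("read", "riːd")
gap = pause(0.5)
print(f"Please {read_tag} the instructions {gap} before starting.")
```

Note that a pause before a word is only an approximation of emphasis, in line with the earlier comment that emphasis itself is hard to achieve.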

  • When using AI Assistant TTS, the most reliable way I've been able to prompt emphasis of a word within a sentence is to use quotation marks. (Example: Now it's "your" turn to practice what you've just learned.) For me, it works the vast majority of the time.
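
If you want to apply the quotation-mark trick across many lines of narration, a minimal sketch like the one below can do the wrapping. The function name `quote_for_emphasis` is made up for illustration; it only edits the text you then paste into the TTS dialog, and it cannot guarantee that the voice will actually stress the word.

```python
import re


def quote_for_emphasis(sentence: str, word: str) -> str:
    """Wrap the first whole-word occurrence of `word` in double quotes."""
    pattern = rf"\b{re.escape(word)}\b"
    result, count = re.subn(
        pattern, lambda m: f'"{m.group(0)}"', sentence, count=1
    )
    if count == 0:
        raise ValueError(f"{word!r} not found as a whole word")
    return result


line = "Now it's your turn to practice what you've just learned."
print(quote_for_emphasis(line, "your"))
```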