Forum Discussion
AI Voice Generation emphasis in SL
Hi,
Has anybody discovered a way to reliably coax the AI voice generation engine in SL360 to add emphasis to a word or phrase? For example in written text such as
"read the instructions before starting",
the italics and bold would strongly indicate the importance of reading before starting, and if I was creating my own voice recording I'd heavily lean into the word "before", to stress this.
I haven't yet found a way to do this with the AI VG engine, and you can't add bold or italics to the text dialog. I've experimented with asterisks etc., but it tends to just garble the output.
I know the whole point of AI is that it is supposed to work stuff like this out for itself through context and should do this automatically, but I do think it sometimes needs some guidance.
Any ideas or tips?
Thanks
Paul
Sorry for the delay here. I took this back to the team to see if anyone had thoughts or suggestions similar to what you shared about the break time markup. The consensus is that emphasis in particular is hard to achieve. That makes sense when I think about it: emphasis isn't quite a pronunciation difference, so I can see why the speech models would have trouble with it. The feeling on the team is that some experimentation is needed to get the voice to flow correctly, and that experimenting with pronunciation can sometimes get close to the emphasis you want.
I think you've probably already seen this based on what you referenced, but for anyone else following this thread who may be curious, here is an article the team put together that talks about some of the limitations and options with SSML models and AI speech.
Curious to keep following this and see if there are any specific practices folks have landed on that worked really well to achieve emphasis.
14 Replies
- DavidStringer (Community Member)
Hello Paul,
I'm having a related issue with getting the AI voice generator to properly pronounce certain proper names and acronyms. For example, "ORPS" vs. O-R-P-S.
Thanks
David
- Paul_Atleos (Community Member)
Hi David,
I tend to use either phonetic spelling to make it pronounce certain words or use full stops/periods to force it to spell out initialisations. Sometimes adding speech marks around the word helps as well.
So to force it to spell out ORPS I might try "O.R.P.S".
One that always causes issues is "read"... is it present tense or past tense? If I want it to use the present tense pronunciation I'd spell this "reed".
In my own field, it seems to have real problems with the phrase "on-us" (meaning a banking transaction with the bank's own customer, as opposed to another bank's customer). Whether or not it pronounces this as "us" or "U.S" seems to be random, and also affected by the voice you choose. I've sometimes even resorted to spelling it as "on bus" and then editing the output afterwards to cut out the "b" sound!
Alternatively, sometimes you just have to regenerate the speech over and over until it gets it right. It's an exciting technology, but it's not 100% there yet and you have to put work in to tweak it to give you what you want.
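For anyone who wants to apply these respellings consistently across a long script, here is a minimal Python sketch of a pre-processing step you could run on the narration text before pasting it into the text-to-speech dialog. The function names and the respelling table are hypothetical and made up for illustration; nothing here calls Storyline or any speech engine, it only rewrites text. Context-dependent words like "read" are left out of the table on purpose, since they need a human decision each time.

```python
import re


def spell_out(acronym: str) -> str:
    """Join the letters with full stops, e.g. ORPS -> O.R.P.S"""
    return ".".join(acronym)


def apply_respellings(script: str, respellings: dict) -> str:
    """Replace whole-word matches only, so ORPS inside CORPS is left alone."""
    # Longest keys first, so a longer phrase wins over a shorter word inside it.
    for word in sorted(respellings, key=len, reverse=True):
        pattern = r"(?<!\w)" + re.escape(word) + r"(?!\w)"
        replacement = respellings[word]
        # A function replacement avoids any backslash handling in the new text.
        script = re.sub(pattern, lambda m, r=replacement: r, script)
    return script


# Hypothetical table: the entries are whatever works for your chosen voice.
respellings = {"ORPS": spell_out("ORPS")}
print(apply_respellings("Submit the ORPS report.", respellings))
```

Keeping the table in one place means that if a different voice needs a different respelling, only the table changes and the original script stays readable.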
These are all super creative solutions, Paul. Thank you for sharing! I got a chuckle out of the "on bus" -> "on-us" workaround 🤣 🚌
- Thomas_CA (Community Member)
The most reliable method for this that I've found is using full stops between the letters.
Failing that, spelling out the letters is an awkward but usually reliable way of sidestepping this (e.g., "oh or pee es").
- JulieBomberry (Community Member)
I have this issue, too! Not sure how to get it to emphasize certain words.
- RayCole-2d64185 (Community Member)
There is a markup language for giving directions to AI voices. It's called "SSML" (Speech Synthesis Markup Language). So you could try marking up your input with the appropriate markup to help the voice pronounce words correctly, add emphasis, etc.
- Paul_Atleos (Community Member)
Thanks Ray,
A quick play-around with this didn't yield any result, but the <break> tag I mentioned earlier definitely looks like it fits the syntax of SSML, so I'd say this warrants more investigation. It looks like some aspects of SSML may be supported by SL's TTS, but not others.
I'll look into this further and report back if I find something that works.
- Paul_Atleos (Community Member)
It seems like SSML support in the ElevenLabs software is patchy at best. I found the following article on their website:
Prompting - ElevenLabs
It seems as if you can use the <phoneme> tag to force pronunciation, and <break time> to insert pauses, but nothing else is mentioned.
Thanks for the suggestion though, Ray.
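To make the two tags concrete, here is a small Python sketch that builds the markup as plain strings. The tag shapes follow the SSML standard (`<break time="…"/>` and `<phoneme alphabet="…" ph="…">`); whether SL360's text dialog passes them through, and which voices honour them, is an assumption you would need to test. The helper names and the IPA string for "read" are illustrative, and the pause placed before "before" is only something to experiment with, not a confirmed way to get emphasis.

```python
from xml.sax.saxutils import escape, quoteattr


def break_tag(seconds: float) -> str:
    """Build an SSML pause, e.g. <break time="0.5s" />"""
    return f'<break time="{seconds:g}s" />'


def phoneme_tag(word: str, ipa: str) -> str:
    """Wrap a word in an SSML phoneme tag using an IPA pronunciation."""
    # quoteattr adds the surrounding quotes and escapes the attribute value.
    return f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(word)}</phoneme>'


# Illustrative only: present-tense "read", then a short pause before "before".
line = (
    f"Please {phoneme_tag('read', 'riːd')} the instructions "
    f"{break_tag(0.5)} before starting."
)
print(line)
```

If the engine reads the tags aloud or garbles the output, that is a sign the tag is not supported for that voice, and the respelling approach earlier in the thread is the fallback.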
- SueAntonissen-2 (Community Member)
When using AI Assistant TTS, the most reliable way I've been able to prompt emphasis of a word within a sentence is to use quotation marks. (Example: Now it's "your" turn to practice what you've just learned.) For me, it works the vast majority of the time.
- Paul_Atleos (Community Member)
Thanks Sue, I'll try that.
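If you want to try the quotation-mark approach on a batch of sentences, a tiny Python sketch like the one below can add the quotes for you. The function name is hypothetical, and it only edits text; how a given voice reacts to the quotes still has to be checked by ear.

```python
import re


def quote_for_emphasis(sentence: str, word: str) -> str:
    """Wrap the first whole-word occurrence of `word` in double quotes."""
    pattern = r"(?<!\w)" + re.escape(word) + r"(?!\w)"
    return re.sub(pattern, lambda m: f'"{m.group(0)}"', sentence, count=1)


print(quote_for_emphasis(
    "Now it's your turn to practice what you've just learned.", "your"
))
```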