Forum Discussion
Using TTS to pronounce English words while speaking another language.
One challenge we are having is that when generating audio tracks based on the localized transcripts (using TTS), the narrator voices are trying to pronounce English words as if they were in the local language. So for example, even though the script says “Measurement Solutions”, the Danish voice is pronouncing it like “More sour mind pig lution” (context, below).
I know that I could try inserting some silence in the Danish audio and adding a second audio track that uses an English TTS voice for those 1-3 words, but then we would be using two different TTS voices in one sentence.
Any other hacks/ideas/tricks?
For context, here's an example:
Danish script:
Välkommen till Measurement Solutions: Vår kvalitetskultur. Syftet med denna utbildning är att ge en förståelse för varför kvalitet är så viktigt för företaget och varför det är viktigt för alla anställda. Vi kommer visa hur du kan bidra till att vi utvecklar och upprätthåller en stark kvalitetskultur på alla Measurement Solutions anläggningar.
1 Reply
Hi ChrisPetrizz906!
Happy to share some insight on this!
When Articulate Localization is used to translate a Storyline course, with Text-to-Speech applied afterward, Storyline automatically uses a localized TTS engine (e.g. Danish). The voice synthesizer assumes all text in the narration field is in that target language. So when it encounters English words like "Measurement Solutions", it tries to pronounce them using Danish phonetics.
This is actually a limitation of the TTS provider itself, rather than Storyline: the engine cannot detect the language of individual words. As a workaround, you may be able to use phonetic spelling to "trick" the Danish voice by respelling English words so that Danish pronunciation rules produce something close to the English sound. For example:
"Mesjermænt Soluushns"
Another option is to generate recurring English terms separately, then export and reuse them throughout the project. For terms like “Measurement Solutions,” create a single high-quality English TTS clip once and reuse it across slides/scenes when necessary. This can help ensure consistency and save some time.
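If you have a lot of slides, both workarounds can be prepared on the transcript text before it is pasted into the narration field. Here is a minimal sketch in plain Python. It is not a Storyline or Articulate Localization feature, and the function names and the `RESPELLINGS` table are hypothetical, made up for illustration. The first function swaps each listed term for a respelling you have tuned by ear; the second splits a script into narration segments and term segments, so you can see where a reusable English clip would go.

```python
import re

# Hypothetical table: English term -> respelling tuned by ear for the target voice.
RESPELLINGS = {"Measurement Solutions": "Mesjermænt Soluushns"}


def _term_pattern(terms):
    # Longest terms first so a short term never pre-empts a longer one.
    ordered = sorted(terms, key=len, reverse=True)
    body = "|".join(re.escape(t) for t in ordered)
    # Lookarounds keep a term from matching inside a longer word.
    return re.compile(r"(?<!\w)(" + body + r")(?!\w)")


def apply_respellings(script, respellings):
    """Phonetic-spelling workaround: swap each listed term for its respelling."""
    if not respellings:
        return script
    pattern = _term_pattern(respellings)
    return pattern.sub(lambda m: respellings[m.group(1)], script)


def split_on_terms(script, terms):
    """Reusable-clip workaround: tag segments as "narration" or "clip"."""
    if not terms:
        return [("narration", script)]
    parts = _term_pattern(terms).split(script)
    # With one capture group, re.split puts the matched terms at odd indexes.
    return [("clip" if i % 2 else "narration", p) for i, p in enumerate(parts) if p]


line = "Välkommen till Measurement Solutions: Vår kvalitetskultur."
respelled = apply_respellings(line, RESPELLINGS)
segments = split_on_terms(line, list(RESPELLINGS))
```

Matching is case-sensitive and exact, so a term spelled differently in the transcript will be left alone. The respellings themselves still have to be checked by listening, since no script can tell you how a given voice will read them.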
I'll also open the floor to our community to share alternative design suggestions with you!