Forum Discussion
Week 1 Discussion
1. What surprised you about how AI handled text or media generation?
I was quite pleased with the AI image generation for the character "Saanvi": the AI feature instinctively created an East Indian-looking female, based on the name alone. I thought I would have to redo the prompt with specific instructions on the character's ethnicity once the images were generated. Turns out, I didn't need to refine the prompt.
2. What challenges did you face when trying to get AI to produce the results you wanted?
I've run into this several times when using the AI TTS feature. Some iterations have the "voice" pronounce words correctly, but then, if I need to change the phrasing or add more context, the generated voice will mispronounce a word that it previously pronounced correctly! Also, when using acronyms, in some instances the voice will pronounce the acronym as a word (which is what I want), and other times it will read out each individual letter. I came across one instance (in my own project) where I needed the AI-generated voice to read "MIPS" as a word. I tried several different methods to have it pronounced as "mips", but it just would not comply! Are there any hints or tips for fine-tuning pronunciations? Is there a way to copy/paste phonetic spellings into the text box to guide the pronunciation?
However, the CC generation capabilities are fantastic! So much time is saved by not having to import a separate .srt, .vtt etc. file to generate CC!
I've experienced the same frustration with AI TTS pronunciation. The same AI voice will pronounce a word perfectly in one instance then mispronounce the same word in another instance by putting the emphasis on the wrong syllable or using an incorrect vowel sound. I did experiment with using phonetic spelling or replacing acronyms with words (e.g., "SQL" as "sequel"), but the results aren't consistent.
I'm also interested in hearing what tips folks may have.
- AlyssaGomez (Staff), 3 days ago
Hey JessicaLough687 and KairasMistry-9d, this is a great question about AI text-to-speech pronunciation. I recommend starting by spelling the word phonetically, as you mentioned.
You can also try the V3 voice model, which is our latest model and offers the most accurate pronunciation. We made V3 the default today, so if you tested text to speech before that change, you were likely using the previous model.
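If you settle on respellings that work, one way to keep them consistent across script edits is to apply them with a small script before pasting the text into the text-to-speech box. The following is only a minimal sketch in Python: the `RESPELLINGS` table and the `apply_respellings` helper are hypothetical names made up for illustration, not part of any TTS product, and the respellings themselves are placeholders that would need to be checked by ear against the voice you are using.

```python
import re

# Hypothetical respellings (assumptions for illustration only); replace them
# with whatever spellings the voice actually pronounces the way you want.
RESPELLINGS = {
    "MIPS": "mips",
    "SQL": "sequel",
}


def apply_respellings(text: str, respellings: dict[str, str] = RESPELLINGS) -> str:
    """Replace whole-word, case-sensitive matches of each key with its respelling."""
    if not respellings:
        return text
    # Try longer keys first so a longer term is preferred over a shorter one.
    keys = sorted(respellings, key=len, reverse=True)
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(k) for k in keys) + r")\b")
    return pattern.sub(lambda match: respellings[match.group(0)], text)


if __name__ == "__main__":
    script = "The MIPS score is stored in a SQL table."
    print(apply_respellings(script))
    # prints: The mips score is stored in a sequel table.
```

Keeping the substitutions in one table means the original script stays readable, and the same respellings are reapplied every time the wording around them changes.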