Troy Broas

Hi Bryan,

I agree with Michael that most TTS software doesn't sound very natural. However, I do need to use it on occasion (mostly for quick demos), and I have found that IVONA TTS sounds a little more natural than most (particularly the Salli female voice, but they also have voices in other nationalities if you need them). IVONA was actually purchased by Amazon a few years ago.

Good Luck!

- Troy

Steve Flowers

I use the Mac terminal's say command. Ivona sounds pretty good too. In most cases I only use it for scratch audio. As a learner, I find a robot voice a little lazy and impersonal. It can work, but if I had another choice, I'd choose no audio or a real narrator.

For each narration chunk, I output a .txt file from an Excel spreadsheet. The spreadsheet also builds a terminal script with one line per file in this format:

say -v lee -f /Users/user/folder/Dropbox/ubs/tfi/production/scratch_audio/s2_c3.txt -o /Users/user/folder/Dropbox/ubs/tfi/production/scratch_audio/s2_c3.aiff
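Here is a minimal Python sketch of the spreadsheet-to-script step described above. It assumes the spreadsheet is saved as a CSV with a header row and two columns (chunk ID, narration text). The function names read_chunks, build_say_script, and write_script are hypothetical. The only real tool involved is macOS say, using the same -v, -f and -o flags as the command above.

```python
import csv
import shlex
from pathlib import Path


def read_chunks(csv_path):
    """Read (chunk_id, text) pairs from a CSV exported from Excel.

    Assumption: a header row, then rows of chunk ID, narration text.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row
        return [(row[0], row[1]) for row in reader if len(row) >= 2]


def build_say_script(chunks, out_dir, voice="lee"):
    """Write one .txt file per chunk and return one `say` command per file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    commands = []
    for chunk_id, text in chunks:
        txt_path = out / f"{chunk_id}.txt"
        aiff_path = out / f"{chunk_id}.aiff"
        txt_path.write_text(text, encoding="utf-8")
        # shlex.quote protects paths that contain spaces or shell metacharacters
        commands.append(
            f"say -v {shlex.quote(voice)} "
            f"-f {shlex.quote(str(txt_path))} "
            f"-o {shlex.quote(str(aiff_path))}"
        )
    return commands


def write_script(commands, script_path):
    """Save the commands as a shell script to run later in Terminal on a Mac."""
    body = "#!/bin/sh\n" + "\n".join(commands) + "\n"
    Path(script_path).write_text(body, encoding="utf-8")
```

For example, write_script(build_say_script(read_chunks("narration.csv"), "scratch_audio"), "make_scratch_audio.sh") writes the text files and the script. Running sh make_scratch_audio.sh on the Mac then renders one .aiff per chunk. Passing an absolute out_dir produces absolute paths like the ones in the example command. The Lee voice must already be installed.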

Ashley Chiasson

Steve - I totally agree regarding the lazy/impersonal statement; text to speech is more of a tide-you-over kind of option, or one for those who aren't willing to invest in voice talent. I actually just went through a bunch of the Mac terminal voices, and they're all SO awful! Alex is tolerable, but oh my goodness. I usually stick to Microsoft Kate because she doesn't brutalize as much technical jargon as the others, but it's all painful to listen to as a learner (in my honest opinion).

Steve Flowers

A lot of the default installed voices are pretty bad. Alex is awful. Many of the better voices are a separate download; they aren't shipped with the base install because the files are quite large. Lee and Samantha aren't terrible, and Samantha is similar to Siri. For my work, they're only placeholders.

T. Travis

Okay... First, the disclaimer: I'm a professional voice talent. (You can probably guess my attitude towards text-to-speech.)

However, I'm happy to admit (well, not HAPPY, actually, but "willing") that there are some very good text-to-speech applications out there. In fact, there are many applications where text-to-speech is the best way to go. But as I'm sure you've observed, even though good text-to-speech is at first almost indistinguishable from a "real" human being, something really important is always missing when it is used for communication. Actually, a couple of things are missing: understanding and emotion.

Think about it.... When you listen to a person talking, you're not just hearing words.  All of us are programmed to search for additional clues from the person speaking - Is this person being honest?  Do they truly understand what they are talking about?  What is their opinion of us, the listeners? What is their current emotional state?  The list goes on and on.  

I use the example of coming in late for work. You dash in, and your boss says "Good morning." With those two words, you know exactly where you stand. There can be pages and pages of "information" there, along with some clues as to how you should respond. And if your boss is "hiding" his or her meaning in the statement, you know you have an even bigger problem.

That's why people report being exhausted after listening to more than a few minutes of text-to-speech.  You take in the sounds, then re-interpret them as something similar to written text, then add appropriate "missing" content.  Or if the text-to-speech is really good, you're spending more brain energy wondering what's wrong with the communication.

For those of us who make our living as narrators, the biggest challenge is "meaning what we say". That's why we need to take time to understand the material: the emotional as well as the logical content of each paragraph, each sentence and each word.

-Travis

T. Travis

@Jackie, if I needed a text-to-speech solution, instead of looking for something that attempts to "trick" the listener into thinking they are hearing a "real" human being, I would base my selection on simple clarity: can the listener understand what is being said? That eliminates the problem of the listener trying to figure out whether the speaker is "human". From that position, most of the applications mentioned above would work quite well.

When I do a narration, I try to define who I am to the audience in the first paragraph, to make it clear that I'm a professional narrator. I'm not pretending to be the listener's best friend, teacher, employer or a Subject Matter Expert; I'm a "spokesperson". I try to convey a sense of confidence, not just on my part but on behalf of the people paying me to deliver the message: that they respect the listener enough to hire a professional to deliver the audio portion of the message, and that the listener is sophisticated enough to understand that. That part of the message is simple and sincere. We're not trying to "fool" anybody.

On the other hand, by bringing in a machine to do something as personal as verbal communication, and compounding the insult by trying to make the listener think it's a real person, we are showing considerable disrespect. In Apple's commercials, we find Siri quite funny. Why? Because we know that she's not human. In that case, there's some real honesty there.

Steve Flowers

You're right, T. I think that's the crux of it. Authenticity.

I wouldn't feel right exposing someone to long-form narration, slide after slide, with TTS audio. But I have considered using a dynamic API to generate custom feedback cues based on the choices made, who someone is, where they are, etc. That uses TTS for something a pre-recorded narrator simply cannot do without generating an infinite number of files.

This one is pretty keen. I've gotten a couple of experiments working in Storyline, in which the user answers a series of questions and the API generates custom audio based on their responses. Clever, but I'm not sure how well it would play with most audiences.

https://www.vocalware.com/