With the advancement in technology... what is your go-to text-to-speech software package for self-directed presentations? Unfortunately, professional studio audio reads are out of the question as far as budget goes. Thanks for the help and the read.
14 Replies
I'm not particularly impressed by any TTS software I have ever tried/heard. I did work with Neospeech a few times and that seemed to be the least 'robotic-sounding' software I have tried.
Hi Bryan,
I agree with Michael about most TTS software not sounding very natural. However, I do need to use it on occasion (mostly if I want to do quick demos) and I have found that IVONA TTS sounds a little more natural than most (particularly the Salli female voice, but they also have voices in other nationalities if you need them). IVONA was actually purchased by Amazon a few years ago.
Good Luck!
- Troy
I usually use Adobe Captivate's TTS functions.
I use the Mac terminal
Ivona sounds pretty good too. Only use it for scratch audio in most cases. As a learner, robot voice seems a little lazy and impersonal. It can work but if I had another choice, I'd choose no audio or a real narrator.
For each narration chunk, I output a .txt file from an Excel spreadsheet. This also builds a terminal script with lines for each file in this format:
say -v lee -f /Users/user/folder/Dropbox/ubs/tfi/production/scratch_audio/s2_c3.txt -o /Users/user/folder/Dropbox/ubs/tfi/production/scratch_audio/s2_c3.aiff
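The script-building step above could be sketched in shell like this. It is a stand-in, not the Excel approach described: it assumes the exported .txt chunks already sit in one folder, and the function name build_say_script is hypothetical. The say flags (-v, -f, -o) are the ones used in the line above. It only prints the commands, so nothing is spoken until the output is run on a Mac.

```shell
#!/bin/sh
# Hypothetical helper: print one `say` command per .txt chunk in a folder.
# Nothing is executed here; redirect the output to a file to get the script.
build_say_script() {
  voice="$1"   # voice name, e.g. lee
  dir="$2"     # folder holding the exported .txt chunks
  for txt in "$dir"/*.txt; do
    [ -e "$txt" ] || continue   # skip the unexpanded pattern when no .txt files exist
    # ${txt%.txt} strips the extension so the .aiff lands next to its .txt
    printf 'say -v %s -f "%s" -o "%s.aiff"\n' "$voice" "$txt" "${txt%.txt}"
  done
}
```

Usage might look like: build_say_script lee /path/to/scratch_audio > make_audio.sh, then sh make_audio.sh. Paths that contain double quotes would need extra escaping.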
Steve - I totally agree regarding the lazy/impersonal statement; text-to-speech is more of a tide-you-over kind of option, or one for those who aren't willing to invest in voice talent. I actually just went through a bunch of the Mac terminal voices, and they're all SO awful! Alex is tolerable, but oh my goodness. I usually stick to Microsoft Kate because she doesn't brutalize as much technical jargon as the others, but it's all painful to listen to as a learner (in my honest opinion).
A lot of the default installed voices are pretty bad. Alex is awful. Many of the better voices are an extra download; they aren't shipped with the base install since the files are pretty large. Lee and Samantha aren't terrible. Samantha is similar to Siri. They are temporary for my stuff.
And Lee's voice... Some of the custom voices can work for scratch audio.
Okay... First the disclaimer - I'm a professional voice talent. (You can probably guess my attitude towards text-to-speech.)
However, I'm happy to admit (well, not HAPPY, actually, but "willing") that there are some very good text-to-speech applications out there. In fact there are many applications where text-to-speech is the best way to go. Though, as I'm sure you've observed, even though good text-to-speech is, at first, almost indistinguishable from a "real" human being, there's always something missing, something really important - actually a couple of things - when text-to-speech is used for communication. Those missing things are understanding and emotion.
Think about it.... When you listen to a person talking, you're not just hearing words. All of us are programmed to search for additional clues from the person speaking - Is this person being honest? Do they truly understand what they are talking about? What is their opinion of us, the listeners? What is their current emotional state? The list goes on and on.
I use the example of coming in late for work. You dash in, and your boss says "Good morning." With those two words, you know exactly where you stand - there can be pages and pages of "information" there, along with some clues as to how you should respond. Or if your boss is "hiding" his or her meaning in the statement, you know you have an even bigger problem.
That's why people report being exhausted after listening to more than a few minutes of text-to-speech. You take in the sounds, then re-interpret them as something similar to written text, then add appropriate "missing" content. Or if the text-to-speech is really good, you're spending more brain energy wondering what's wrong with the communication.
For those of us who make our living as narrators, the biggest challenge is "meaning what we say". That's why we need to take time to understand the material - to understand the emotional as well as logical content of each paragraph, each sentence and each word.
-Travis
@Troy,
I checked out IVONA and some of the voices are fairly decent. What plan do you have? I tried looking at some of the product options and it was a little confusing. It also seemed weird that you purchase the license only for one year. Did I misread that?
Jerson
@Travis: I agree with you about TTS - but I have to ask, since you mentioned there are some very good TTS applications out there, which ones do you think are relatively decent?
ReadSpeaker is capable of giving pretty good results. Have only used it for British voices so far, but the feedback has been good.
PS I'm generally not a fan of synthetic voices but if the client demands it (which has been known) then RS is one of the better ones to use IMO.
@Jackie, if I were in need of a text-to-speech solution, instead of looking for something that will attempt to "trick" the listener into thinking they are listening to a "real" human being, I would instead base my selection on simple clarity: can the listener understand what is being said? That eliminates the problem of the listener trying to figure out if the speaker is "human". Taking that position, most of the applications mentioned above would work quite well.
When I do a narration, I attempt to define just who I am to the audience/listener with the first paragraph - to make it clear that I'm a professional narrator. I'm not trying to pretend that I'm the listener's best friend, teacher, employer or a Subject Matter Expert - I'm a "spokesperson". I try to convey to the audience a sense of confidence, not just on my part, but on behalf of the people who are paying me to deliver the message: that they respect the listener enough to hire a professional to deliver the audio portion of the message - and that the listener is sophisticated enough to understand that. That part of the message is simple and sincere - we're not trying to "fool" anybody.
On the other hand, by bringing in a machine to do something as personal as verbal communication, and adding to the insult by attempting to make the listener think it's a real person, we are showing considerable disrespect. In Apple's commercials, we find Siri to be quite funny - why? Because we know that she's not human. In that case, there's some real honesty there.
You're right, T. I think that's the crux of it. Authenticity.
I wouldn't feel right exposing someone to long-form narration, slide after slide, with TTS audio. But I have considered using a dynamic API to generate custom feedback cues based on the choices made, who someone is, where they are, etc. Use it to do something that a pre-recorded narrator simply cannot do without generating an infinite number of files.
This one is pretty keen. I've gotten a couple of experiments to work in Storyline, where the user answers a series of questions and the API generates custom audio based on their responses. Clever, but I'm not sure how well it would play with most audiences.
https://www.vocalware.com/
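The custom-feedback idea can be roughed out without any particular API. The shell sketch below only shows the text-assembly half: it builds a sentence from a learner's answers, which could then be handed to whatever TTS service is in use. The function name feedback_text, its arguments, and the wording are all hypothetical, and none of this is the Storyline or Vocalware setup from the experiments mentioned above.

```shell
#!/bin/sh
# Hypothetical helper: assemble a feedback sentence from a learner's results.
# The name, counts, and wording are illustrative, not taken from the thread.
feedback_text() {
  name="$1"      # learner's name
  correct="$2"   # number of questions answered correctly
  total="$3"     # number of questions asked
  if [ "$correct" -eq "$total" ]; then
    printf '%s, you answered all %s questions correctly.\n' "$name" "$total"
  else
    printf '%s, you got %s of %s. Review the items you missed.\n' "$name" "$correct" "$total"
  fi
}
```

On a Mac, the result could be piped straight to the terminal voice for a quick test, for example: feedback_text Sam 2 3 | say -v samantha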
http://edition.cnn.com/2013/10/04/tech/mobile/bennett-siri-iphone-voice/