Text to Speech software
I am not opposed to the use of text-to-speech technology because I'm a voice-over/narrator. My concern is solely about results. The goal of instruction of any kind is either to share information or to change behavior/performance. And, just like TV and radio commercials, success hinges squarely on whether the message can not only grab but hold the attention of the listener/viewer/learner, so that they will ABSORB what they saw and/or heard and RETAIN that information so they can later APPLY it. In the case of commercials, advertisers hope to motivate people to buy their product or service. In instruction, the hope is that learners will use what they learned to improve their performance.
Thus, eLearning success cannot be measured by the quantity of material produced in X number of hours or by the X number of dollars saved. Success is measured in what people remember and are able to apply.
I wonder whether any company that uses eLearning has done an analysis comparing the money spent on producing learning content against whether there was a marked improvement in employee performance. It would seem to me that if a company's goal was to maximize employee performance, the LAST place it would consider cutting costs would be the tools and methods used to achieve that goal. When the goal is to get to the finish line faster, we don't use cheaper fuel.
It seems that proponents of text-to-speech don't understand that inflection, NATURAL inflection, is the key, and that synthesized speech, no matter how human-like the "voice" may sound, will never be able to place inflection where it is needed and withhold it where it is not.
Here's a practice exercise for everyone: spend some time listening to people engaged in conversation, people you know and even people you don't. When we speak among ourselves, we add inflection without thinking about it: placing importance on some words and less on others, a smile here, some compassion there, a sudden burst of excited whisper, the occasional dramatic pause to build anticipation or let a point sink in before moving on. We do these things automatically, and they turn our speech into music. They also make those listening more inclined to keep listening. Until the algorithms behind synthesized speech are able to "understand" and contextualize the words (which, to a computer, are just more ones and zeroes), synthesized speech will never be able to add the NATURAL engagement factor called inflection where it belongs and, thus, will never be as effective as human speech.
If there is no connection, no grasp of the material that allows the proper inflection to be placed where it belongs, what is spoken is cold and completely unengaging, just like some of the boring teachers we all had in school. If there is no natural engagement, there is no hope of holding the attention of a learner. And without their focused attention, the effort and money spent on everything that went into the eLearning are wasted. I'll say it again:
eLearning success cannot be measured by the quantity of material produced in X number of hours or by the X number of dollars saved. eLearning success is measured ONLY in what people remember and are able to apply in order to make a difference.