30 Replies
Joel Harband


I thought it would be helpful to list, as a checklist, the various TTS issues that course developers have raised in this thread so far.

- TTS should be integrated into the course development tool. This eliminates the need to export the texts, use an external TTS generator to produce an MP3 file of each text, and import the audio files into Storyline, which is a hassle, especially when you change and update the narration often. When TTS is integrated, this happens automatically and transparently, saving a lot of time.

- keep the nuances and word stresses in TTS that you get with a real voice

- use of speech tags in the text to modulate the TTS voice, when necessary, should be easy and quick to learn

- voices should sound natural, clear and consistent. TTS voices from vendors like Acapela have improved a lot since this post three years ago and are effective in e-learning.

- TTS voice license provided for commercial applications. Note: A price of less than $50 for TTS indicates a personal voice license that cannot legally be used for commercial applications, such as producing e-learning courses in a company or institution (the exception is the TTS voices that come with the OS). A price of a few $100s for commercial TTS, which saves a company $1000s in costs and time, is quite reasonable.

As a developer of Speech-Over, narration software for PowerPoint that interfaces with e-learning products including Storyline, I encourage you to check out Speech-Over against this list.

Joel Harband
Tuval Software Industries
joel [at] speechover [dot] com

Roger Mepham

I was playing with the Captivate Neo voices the other day, and one trick I noticed was that if you alternate voices, say, as a conversation between avatars, the robotic inflection sounds less repetitive.

Sadly, they are still not good enough for high-quality content, although they are useful for demos prior to re-recording with a voice artist.


Henrik Clausen

Roger, that's a good point! I was doing that for dialogues in my current project, but had not realized that it also mitigates the monotone nature of artificial speech.

Next, I wonder if italics, bold or underline could be used by the TTS synthesizer for the same purpose? It would make sense from a logical point of view, but I have little idea about the technical aspects of this.
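
One possible way to approach this, offered only as a sketch: many TTS engines accept SSML, the W3C speech markup standard, which has an `<emphasis>` element. The snippet below maps text formatting onto that element. The use of `**bold**` and `*italic*` asterisk markers as the input format and the function name `markup_to_ssml` are assumptions made for illustration, not anything Storyline, Captivate or Speech-Over is known to do, and whether a particular voice audibly honors `<emphasis>` depends on the engine.

```python
import re
from xml.sax.saxutils import escape


def markup_to_ssml(text: str) -> str:
    """Convert **bold** and *italic* markers to SSML <emphasis> elements.

    Hypothetical helper: the asterisk markers are an assumed input format.
    """
    # Escape &, < and > first so the narration text stays valid XML.
    ssml = escape(text)
    # Handle bold before italic so "**" is not consumed by the single-"*" rule.
    ssml = re.sub(r"\*\*(.+?)\*\*", r'<emphasis level="strong">\1</emphasis>', ssml)
    ssml = re.sub(r"\*(.+?)\*", r'<emphasis level="moderate">\1</emphasis>', ssml)
    # A complete SSML document normally also declares a version and language
    # on <speak>; those attributes are omitted here for brevity.
    return f"<speak>{ssml}</speak>"


if __name__ == "__main__":
    print(markup_to_ssml("Press the **red** button, *not* the green one."))
```

The resulting string would then be passed to whatever TTS engine is in use, in place of the plain narration text.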

Henrik Clausen

Nice summary, Joel!

I do disagree with one point, namely the price point you expect for commercial use. I'm working for a major dairy company doing internal training in Storyline, and while US banks or suppliers to the army may be awash in cash, we're certainly not, and would balk at "a few $100s" for one voice.

Joel Harband

Thanks, Henrik.

Concerning the pricing of commercial TTS voices, it is a matter of economics: commercial TTS voices compete with real voice talent. A very reasonably priced voice talent service (narratorfiles) charges $20 a page, which works out to about $500 per hour of recorded audio (a one-hour course). One hour of recorded audio is about a month's work, so using voice talent costs about $500 a month per developer. To compete, commercial TTS voices are priced below that, sometimes by 50%, so that they remain worthwhile even after allowing for any text editing required to modulate the voice. In addition, the price usually includes access to more than one voice.
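
The arithmetic in this post can be laid out as a short calculation. This is only a sketch of one reading of the figures: the function names are made up for illustration, and treating "sometimes 50%" as a 50% discount against one month of voice talent cost is an assumption, since the post does not say what period a TTS license price is compared against.

```python
def pages_per_audio_hour(cost_per_audio_hour: float, price_per_page: float) -> float:
    """Number of script pages implied by the per-page and per-hour prices."""
    return cost_per_audio_hour / price_per_page


def tts_price_ceiling(voice_talent_cost: float, discount: float) -> float:
    """Price a TTS voice would need to stay under, given a discount (0.0 to 1.0)."""
    return voice_talent_cost * (1.0 - discount)


# Figures quoted in the post.
PRICE_PER_PAGE = 20.0         # USD per page of narration
COST_PER_AUDIO_HOUR = 500.0   # USD per hour of recorded audio
AUDIO_HOURS_PER_MONTH = 1.0   # one developer produces about one hour per month

# Assumed interpretation of "sometimes 50%".
TTS_DISCOUNT = 0.50

if __name__ == "__main__":
    pages = pages_per_audio_hour(COST_PER_AUDIO_HOUR, PRICE_PER_PAGE)
    monthly_cost = COST_PER_AUDIO_HOUR * AUDIO_HOURS_PER_MONTH
    ceiling = tts_price_ceiling(monthly_cost, TTS_DISCOUNT)
    print(f"Pages per hour of audio: {pages:.0f}")
    print(f"Voice talent cost per developer-month: ${monthly_cost:.0f}")
    print(f"TTS price ceiling at a {TTS_DISCOUNT:.0%} discount: ${ceiling:.0f}")
```

Changing `AUDIO_HOURS_PER_MONTH` or `TTS_DISCOUNT` shows how sensitive the comparison is to a team's actual narration volume.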

For more details, have a look at the article I wrote with Tony Karrer:

Text-to-Speech Costs – Licensing and Pricing