I was having a discussion with a few different instructional designers and developers about modules we had designed. While reviewing one, the topic came up that the on-screen text did not match what the audio narration was saying. For example, the user hears, "...the COURSES link in the navigation menu displays a list of courses that you are teaching." The on-screen text says, "Mouse over the COURSES link for your list of courses." I think the perceived disconnect was with the instruction "Mouse over". So it made me wonder if there was a visual design 'best practice' for matching audio to text on screen.

In my years of TV editing and post production, the unwritten rule was that the visuals were to abbreviate the concept or meaning of what was spoken or heard, usually only pulling out the main point.

Any thoughts appreciated.

8 Replies
Melita Farley

Hi Michael

I completely agree with you!  TV and film provide great examples of how audio and visuals work together.

The only time I ever have audio matching the text is when I'm developing material for learners for whom visuals might not work (visual impairments, reading-related disabilities, etc.). Even then, I make a 'read' of the screen optional so that other learners do not have to listen to the screen being read to them!


Steve Flowers

Most research indicates that making text on-screen match audio exactly can hurt learning. It's not so important in the situation you describe above. Even in that case, the communication channel redundancy can cause a distraction ("Why did the course tell me that twice? Was it really that important?")

This is from Richard Mayer. He offers two different rationales for offering text / audio. The first supports offering two modes as a "bonus choice" for different types of consumption preferences. The second is concerned with the cognitive loading effects of competing channels. In a program focused on leveraging multimedia, I believe you should be more concerned with the second:

"the cognitive theory of multimedia learning suggests that the added on-screen text will compete with the animation for cognitive resources in the visual-pictorial channel, creating what Sweller (1999) calls a split-attention effect."

In short, I avoid placing text on-screen that matches audio for a few reasons:

  • Channel overload can actually hurt learning (many academics refer to text and audio as a verbal channel; double-running a verbal channel is mind-numbing and can be confusing)
  • Most of my learners are intelligent adults; putting text on the screen and then reading that text to the learner tends not to treat them as such

I do like providing choice, however. In these cases I'll abstract the content into a print-based article. People learn in different environments and with different habits. If they need a multimedia explanation, they have access to it. If they prefer to read the information, I like to give them that option. But I never read word for word what's on the screen. It's bad form.

Daniel Brigham

Hi, Michael:

Steve is spot on as usual. When I have voiceover (which I often do), I'll use text as a basic summary of the points made. Just the core idea and rarely a sentence. One of the benefits of text is that it gets the ideas across very quickly.

And sometimes, even in a slide that uses voiceover, I'll have the text on-screen "say" the main point. For example, voiceover says, "And here on-screen you see a few reasons why X is a good practice to follow." I try to mix it up.

Tim Slade

Hi Michael,

I’ve had this discussion at my office as well…and I totally agree with the group! I always abbreviate my on-screen text relative to the audio narration. Typically, the audio narration is full of additional filler words that wouldn’t normally be included in a bullet point.

Also, for whatever reason, I feel that when the on-screen text and audio narration are the same, it causes you to zone out and not actively listen.

Sheila Bulthuis

Totally agree with all the above.  I would add that if there's a lot of text on the screen and there's narration - even if they're not an exact match - you're going to have the cognitive load/split attention problem Steve talks about.  I like Daniel's solution, which I think works really well - if you need a lot of text, just ask them to read it, then be quiet. 

I really liked Steve's point about providing a choice; I think the Articulate Tutorials are a great example of that - they usually have a screenr and brief written instructions.