Two thoughts: research shows that receiving words through the audio and visual channels simultaneously creates more interference than reinforcement. I know it seems counter-intuitive. The strongest reinforcement actually comes from seeing a picture while hearing a description of it. Kinetic text also seems to get good results, but it is very time-consuming to produce.

Still, if you have to do it, I recommend a small text box with a variable in it. Assign one or two lines of text to the variable, and change it at the appropriate time to pace the narration.
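
To make the pacing idea concrete, here is a rough sketch of the logic in TypeScript. It is not tied to any authoring tool, where you would normally do this with triggers on the timeline rather than code. The names `CaptionCue`, `captionAt`, and the sample cue text are all made up for illustration. The idea is just that one variable holds the current line or two of text, and it changes whenever the narration passes the next cue time.

```typescript
// Hypothetical cue: the narration time (in seconds) at which a caption should appear.
interface CaptionCue {
  startSeconds: number;
  text: string;
}

// Returns the text the variable should hold at a given point in the narration:
// the latest cue whose start time has already been reached.
// Returns an empty string before the first cue.
function captionAt(cues: CaptionCue[], elapsedSeconds: number): string {
  // Sort a copy so the caller's list is not modified.
  const ordered = [...cues].sort((a, b) => a.startSeconds - b.startSeconds);
  let current = "";
  for (const cue of ordered) {
    if (cue.startSeconds <= elapsedSeconds) {
      current = cue.text;
    } else {
      break;
    }
  }
  return current;
}

// Made-up sample cues, one short line each.
const cues: CaptionCue[] = [
  { startSeconds: 0, text: "Step 1: Open the intake form." },
  { startSeconds: 4.5, text: "Step 2: Check the customer record." },
  { startSeconds: 9, text: "Step 3: Route the request." },
];
```

Calling `captionAt(cues, 5)` gives the second line, since the narration has passed 4.5 seconds but not yet reached 9. Keeping each entry to one or two lines is what keeps the text box small.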

Walt, I understand what you're saying, but I don't think what I have in mind would be too difficult for learners to follow. Basically, I have a scrolling panel of images laid out like a flow chart. As learners scroll through the images, I want corresponding audio narration to play.

I might actually end up doing something like the Manhattan map example shown here: https://community.articulate.com/articles/are-scrolling-panels-in-your-e-learning-bag-of-tricks-they-should-be