The best solution I can suggest is that you open a support case with the Articulate staff. You can do that here: https://access.articulate.com/support/contact.
The audio misplays sound more like a bug than anything specific to your project.
As for the memory question: I think you will end up with comparably sized files regardless of which audio option you use (the TTS or the *.mp3). One advantage of using separate *.mp3 files is that it may be easier to get the narration quality you want to keep 10k users engaged. One advantage of using the TTS option is that you can quickly update or revise the narration if needed.
Given the additional constraint of "not up-to-date computers" (as you note), I would probably try to streamline this content as much as you can. Can you break it into separate modules? Can you create some type of pre-test to allow users to test out? Etc.
Just a few thoughts.