There seem to be a lot of conversations going on around captions right now.
Line breaks are critical for compliance in on-screen captions, whereas I don't believe they are necessary in transcripts. A system that can run both off the same source, however, is not something I've seen done well anywhere yet. For example, Oyez.org has great transcripts for its oral arguments, but it doesn't do captions. Netflix has a very user-friendly write-up on how to break lines in captions according to WCAG rules. YouTube hosts both captions and transcripts, but even it can't distinguish between a paragraph break and a caption break.
My team's solution is to run our captions through sonix.ai. We can export an SRT file for captions, but then we manually edit the transcript to add paragraph breaks for the slide notes.
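For what it's worth, part of that manual step could probably be roughed out in a script before a human cleans it up. Here is a minimal Python sketch, assuming a standard SRT export and a made-up rule that a pause longer than two seconds starts a new paragraph. The function names and the `pause_seconds` threshold are my own placeholders, not anything Sonix or Storyline provides.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(
    r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
)


def to_seconds(h, m, s, ms):
    """Convert the four timestamp fields to seconds."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_srt(srt_text):
    """Return a list of (start, end, text) tuples from SRT text."""
    cues = []
    # Cue blocks are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt_text.strip()):
        lines = block.strip().splitlines()
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                start, end = to_seconds(*g[:4]), to_seconds(*g[4:])
                # Caption line breaks are dropped here: they matter on
                # screen, not in a transcript.
                text = " ".join(l.strip() for l in lines[i + 1:] if l.strip())
                cues.append((start, end, text))
                break
    return cues


def srt_to_transcript(srt_text, pause_seconds=2.0):
    """Join cues into paragraphs; a long pause starts a new paragraph.

    The pause rule is an assumed heuristic, not a standard.
    """
    paragraphs, current, previous_end = [], [], None
    for start, end, text in parse_srt(srt_text):
        long_pause = previous_end is not None and start - previous_end > pause_seconds
        if long_pause and current:
            paragraphs.append(" ".join(current))
            current = []
        current.append(text)
        previous_end = end
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)
```

A pause is only a crude stand-in for a real paragraph break, so I'd still expect someone to review the result by hand.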
Articulate recently ran an accessibility survey, and I raised there the same concern you voiced here. In particular, I think one of the best options would be to merge the transcript, caption, and notes functions into one super-function:
- I would like to tap a line in the notes and be brought to that cue point on the timeline.
- The captions should drive the notes/transcript, but the notes view has its own formatting tools.
- Captions and notes are edited in a time-based editor.
- Captions play according to caption timing and have styles applied via the caption formatting tool (currently in the Player).
- Notes can be selected, styled, and given paragraph breaks in a notes editor. One of those styles could be "non-caption", which removes that text from the caption output.
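To make the wish list a little more concrete, here is a rough Python sketch of what a single shared source might look like. Everything in it is hypothetical: the `Cue` fields and the function names are mine, not part of Storyline or any existing tool. The idea is just that one list of timed cues feeds both outputs, and the notes-only formatting lives on the cue as flags.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float                  # seconds on the timeline
    end: float
    text: str
    new_paragraph: bool = False   # notes-only formatting: start a paragraph here
    non_caption: bool = False     # the "non-caption" style: notes only, never on screen


def caption_track(cues):
    """Cues that play as on-screen captions, in timeline order."""
    return [c for c in cues if not c.non_caption]


def notes_text(cues):
    """Notes/transcript built from the same cues, with paragraph breaks applied."""
    paragraphs = []
    for cue in cues:
        if cue.new_paragraph or not paragraphs:
            paragraphs.append([])
        paragraphs[-1].append(cue.text)
    return "\n\n".join(" ".join(p) for p in paragraphs)


def seek_time(cues, index):
    """Timeline position to jump to when the note line at `index` is tapped."""
    return cues[index].start
```

Because the paragraph break and the "non-caption" style are stored on the cue rather than in a separate transcript, editing the text in one place would update both the captions and the notes.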
At present, I don't know of any technology that meets both the WCAG requirements and your needs while running off the same caption data. I'd recommend treating this as a limitation of the tools currently available.