I think of these a little bit differently. There definitely seem to be some overlaps and, as always, context matters. It depends.
I'll start with engagement. To me, engagement is an emotional thing. Do I care enough about this to give it my attention? Am I connected with it enough to return? In this definition, I would measure engagement by the intensity of a participant's attention as a function of how much they care about the experience or a component of the experience. Many things could drive this level of intensity. A well-crafted moment, a challenge, relevance to a real-life encounter, reciprocal rewards, and social motivation are among the factors that I think drive engagement. Whether the trigger is sensory, cognitive, or interpersonal, I think engagement is 100% emotional. I also believe you can define engagement as active or passive (latent). For example, if I return to the same store because I love the service, that's engagement. But I don't think about it until I need something. The emotional connection I have with that service is latent and becomes active only when I need to make certain choices.
Interaction plays a part in engagement as well. Interaction, to me, is the interplay between a person's action and the feedback they receive in response. Something provides a cue, I respond. That's interaction. I take an action (or fail to take an action), and something (or someone) provides a response. That's interaction.
However, I don't like to define this in terms of interface affordances (mouse and keyboard) because I think it cheapens the meaning of interaction. At its core, isn't action driven by some kind of deep or shallow internal process? Feedback needs to reach back to that initial trigger and connect with it to be valuable, in my opinion. I do something, something happens in response, and I recognize that response. It's symmetrical. Defining interaction as flicks and clicks, as the mechanics of motor interaction, doesn't account for all of the internal things that happen in a feedback loop. Striking a key or moving the mouse is a very shallow (and possibly completely hollow) activity. I think what's behind the physical actions matters more to the interaction than the physical action itself: the connections to meaning that the feedback loop drives.
Could a video itself be interactive? I think it can, but it takes some faith in abstraction to define an experience without physical interaction that way. Let's say a video plays, presents an engaging and enlightening sequence, and pauses to ask a question. You reflect on this question and draw a conclusion during the pause. The video continues and provides another perspective, causing you to reflect on your previous hypothesis. In the same situation, let's say the video asks you to write down a list of your concerns and then provides feedback. Same effect, but it doesn't depend on device input.
Is a non-physical interaction (all internal but still symmetrical) better or worse than an activity that requires you to click around with a mouse to identify a list of factors? I'd say it depends: it's likely far better in some situations and worse in others. Context and execution are everything. Does that make the situation above, which required no input from the user, non-interactive? I think it's open to interpretation.
In my mind, interaction isn't interaction if the feedback loop doesn't connect symmetrically with the thought that triggered or drove it. If there ain't thinking involved, there ain't interaction involved. Well-written copy (such as copy that produces an Aha! moment) can have a stronger effect than a physical interaction. If the action was forming my initial expectations and the copy (audio, video, text) provided a feedback loop that adjusted my perception, I'd call that interaction.
TL;DR - Summarizing my definitions:
Engagement - A measure of attention (intention) in terms of emotional involvement with a whole experience or moment (part) of an experience.
Interaction - The relationship between an action (including an inaction) and the response of an object or system, which together create a relatively symmetrical feedback loop. Not limited to actions expressed through a computer interface.