Example
An AI-Powered Knowledge Check in Storyline
I've been wrestling with this challenge for a while: How do we get learners to reflect without relying on quiz after quiz? How can we use open-ended questions to encourage deeper thought?
I've long considered AI for this, but there were hurdles... How do you integrate it into an Articulate Storyline course without paying for tokens or setting up contracts? And how do you do it without leaking credentials in the course itself? Can it be done without having to modify code after exporting the course?
I recently learned that Hugging Face Transformers provides a solution: you can now download an AI model to the learner's machine and run it locally in their browser. I've managed to get this running reliably in Storyline, and you don't have to modify the code after export!
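For the curious, here's a minimal sketch (not the demo's actual code) of what running a model locally looks like with the browser build of Transformers.js. The CDN URL, version, and the Xenova/all-MiniLM-L6-v2 checkpoint name are illustrative assumptions:

```javascript
// Minimal sketch: load Transformers.js in the browser and create an embedding
// pipeline. The model files are downloaded to the learner's machine on first
// use and cached, so there are no API keys, tokens, or server calls.
// (CDN URL and version are illustrative.)
import { pipeline } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers@2.17.2';

// Small sentence-embedding model; runs entirely client-side.
const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');
```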
In the final slide of the demo, your goal is to recall as much as possible from the podcast/summary. The AI will then check your response and give you a percentage score based on what you remembered.
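The scoring could look something like this (a hypothetical sketch reusing the extractor from the snippet above; the demo's exact scoring logic may differ): embed the learner's answer and the source text, then turn their cosine similarity into a percentage.

```javascript
// Hypothetical scoring sketch: with { normalize: true } the embeddings are
// unit vectors, so their dot product equals the cosine similarity.
async function scoreResponse(learnerText, sourceText) {
  const opts = { pooling: 'mean', normalize: true };
  const a = (await extractor(learnerText, opts)).data;
  const b = (await extractor(sourceText, opts)).data;
  let dot = 0;
  for (let i = 0; i < a.length; i++) dot += a[i] * b[i];
  return Math.round(Math.max(0, dot) * 100); // e.g. 57 -> "You remembered 57%"
}
```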
Live demo & tutorial here:
https://insertknowledge.com/building-an-ai-powered-knowledge-check-in-storyline/
If you want to learn how, I recommend starting with the sentiment analysis tutorial because it's an easier entry point. I've also provided a file to download in that tutorial if you want to reverse engineer it.
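To give a sense of why sentiment analysis is the easier starting point, the whole thing can be roughly this short (a sketch, not the tutorial's exact code; the sample text is a placeholder and the pipeline falls back to its default classifier model):

```javascript
// Sketch: classify free-text sentiment locally in the browser.
// (CDN URL and version are illustrative.)
import { pipeline } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers@2.17.2';

const classifier = await pipeline('sentiment-analysis');
const [result] = await classifier('I really enjoyed this module!');
console.log(result.label, result.score); // e.g. POSITIVE 0.9998
```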
5 Replies
- Akki1111 (Community Member)
This is so cool!
- AndrewNewell (Community Member)
I thought this was an interesting idea! Using AI for response evaluation is something I haven't seen before in an elearning module. The potential seems enormous.
A couple of notes... I tried answering three or four different ways with different amounts of information, but always got a similar score between 57% and 61%. I think having feedback from the evaluator would be really helpful, so I could understand how to improve my answer.
Thanks for sharing this!
- arron (Community Member)
Thanks for putting it through its paces! 57-61% is an epic score; I found most people hovered around 50%. I think this demo is actually particularly hard, and you've prompted me to update the article with some ideas on how it could be improved, including additional, specific feedback from the evaluator, which is a fantastic idea.
- Nathan_Hilliard (Community Member)
Great demo! I just started looking into this area myself and happened across your post. I see you used the MiniLM-L6-v2 model. When I was exploring various models, this one did seem to produce one of the broadest similarity ranges between good and bad responses. Have you looked much into other models or other comparison approaches? Even though this has been around for a while, the available details on all of it seem rather diffuse and a bit cryptic. I guess I've finally found a meaningful use for AI chat models. 😀
- arron (Community Member)
Hey Nathan! Glad you liked the demo 😀. These are some similar options:
- paraphrase-MiniLM-L3-v2 (uses less memory and is potentially less accurate)
- all-distilroberta-v1
- google/embeddinggemma-300M
Based on lessons learned since building this example, one thing that could be improved to achieve a better comparison is the source text. My memory is a bit fuzzy on this now, but I think I used the entire source paragraph to compare against. It may yield better results if the sourceText were a concise sentence containing the key points we want our learner to remember. You could even adapt it to use multiple examples:

```javascript
const idealAnswers = [
  "Passkeys replace passwords and are phishing-resistant.",
  "You use biometrics for passkeys, which are stored on your device.",
  "It's a more secure way to log in without a password, using your phone or computer."
];
```
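As a rough illustration of that idea (a sketch only, with hypothetical names like bestMatchScore, not code from the demo), you could embed each ideal answer, compare the learner's answer against all of them, and keep the best match:

```javascript
// Hypothetical sketch: score the learner's answer against each ideal answer
// and keep the best match, so any one good phrasing can earn full credit.
// (CDN URL and version are illustrative.)
import { pipeline } from 'https://cdn.jsdelivr.net/npm/@xenova/transformers@2.17.2';

const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2');

async function bestMatchScore(learnerText, idealAnswers) {
  const opts = { pooling: 'mean', normalize: true };
  const answer = (await extractor(learnerText, opts)).data;
  let best = 0;
  for (const ideal of idealAnswers) {
    const target = (await extractor(ideal, opts)).data;
    // Embeddings are normalised, so the dot product equals cosine similarity.
    let dot = 0;
    for (let i = 0; i < answer.length; i++) dot += answer[i] * target[i];
    best = Math.max(best, dot);
  }
  return Math.round(Math.max(0, best) * 100);
}

console.log(await bestMatchScore('Passkeys use your fingerprint instead of a password.', idealAnswers));
```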
One other approach, though it has a steep learning curve, is fine-tuning a model with good and bad examples so its output is closer to what you're looking for. I believe Google Colab offers free GPU usage, which is handy for that sort of thing.
