Scoring User-Drawn Images in Storyline

Huh, my whole previous post just vanished. Trying again...

This is a follow-up to a previous discussion post on drawing annotations on a Storyline slide.

In a previous post, I demonstrated an approach allowing the user to draw on an image to indicate some kind of response. It used a canvas element and intercepted mouse clicks and movements to draw paths. The next step, as pointed out by Math Notermans, was to score the user’s input in some way. There are several JavaScript libraries available that perform image comparisons, usually returning some kind of quantified pixel difference as a result. Resemble.js is one such option. It returns differences as a percentage of pixels compared to the entire image size. The question, then, is how to turn this into a usable score.

Demo: https://360.articulate.com/review/content/d96de9cf-2fd1-45a5-a41a-4a35bf5a1735/review

In this example, I made a few improvements to the annotation script that was posted previously. Most notably, I added a simple undo option that records and recreates user drawings. This also allows for the user’s drawing to maintain its sharpness after resize events, instead of the previous approach of straight scaling. I also changed it to allow drawing on a touch screen (limited testing).

I included a loader for Resemble.js, and some code connected to the Check button to evaluate what the user has drawn. While this example is really just meant to demonstrate the process and help you visualize the results, the idea could easily be applied to some kind of complex user interaction that is not better served by more traditional point-and-click or drag-and-drop selections. As this demo shows, it could be particularly well-suited for having users determine the proper pathway for something in a more free-response fashion, as opposed to just selecting things from a list, or dropping shapes.

After drawing a response to the prompt, clicking Check will generate a score. The score is based on the comparison of the user’s response to predetermined keys, which are images that you include when building the interaction. I used two keys here, one for the ‘correct’ answer, and one for a ‘close’ answer. You can set it up to include more key options if you need more complexity. Since all we get from Resemble is a difference score, we need to convert that into a similarity score. To do that, I followed these steps.

Copy the key images to individual canvases.
Create a blank canvas for comparisons.
Convert these and the user drawing canvas to blobs to send to Resemble.
Compare the user drawing to the blank (transparent) canvas to get its base difference.
Compare each of the keys in the same way to get their base difference scores.
These, along with the visualized differences, are shown on the top three inset images.
Then, compare each key with the user drawing to get the compared differences.
The comparison order needs to be consistent here.
These are shown on the lower two inset images.
Calculate the similarity scores (this will be slightly different between scenarios, so you need to customize it to create the score ranges you expect).
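
The order of comparisons in the steps above could be sketched roughly like this. It is only an outline, not the code from the demo: compare is a hypothetical function you would supply that takes two image blobs and resolves to a difference percentage (for example, a wrapper around Resemble), and the names collectDifferences, userBase, keyBase, and compared are made up for illustration.

```javascript
// Sketch only: `compare(a, b)` is a hypothetical async function that resolves
// to the percentage difference between two image blobs.
async function collectDifferences(userBlob, blankBlob, keyBlobs, compare) {
  // Base difference of the user drawing against the blank (transparent) canvas.
  const userBase = await compare(userBlob, blankBlob);

  const keys = {};
  for (const [name, keyBlob] of Object.entries(keyBlobs)) {
    keys[name] = {
      // Base difference of this key against the blank canvas.
      keyBase: await compare(keyBlob, blankBlob),
      // Key vs. user drawing; the key always goes first so the order stays consistent.
      compared: await compare(keyBlob, userBlob),
    };
  }
  return { userBase, keys };
}
```

Passing the comparison in as a function keeps this part independent of whichever image library you end up using.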

The similarity is essentially a score that ranges from 0 to 1, with 1 being the most similar. When creating your keys, you need to note what brush sizes and colors you are using. Those should be specified to the user, or preset for best results. Resemble has some comparison options, but you want to make the user’s expected response as similar to the key as you can.

For the ‘Correct’ answer:

The similarity is just:

1 - (compared difference) / (user base difference + key base difference)

To properly range this from 0 to 1, we also make some adjustments. We cap the (user + key) sum at 100%, and then set the Similarity floor to 0.

We also divide this result by an adjustment factor. This factor is essentially the best uncorrected score you could achieve by drawing the result on the slide. Here, I could not really get much over 85%, so we normalize this to become 100%.
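
As a rough sketch of the calculation so far (before the area adjustment), assuming all differences are the percentages that the comparison reports. The function name rawSimilarity and its parameter names are my own for illustration, and the zero-total guard is an assumption rather than something from the demo.

```javascript
// Sketch only: all differences are percentages (0-100).
// `bestRawScore` is the adjustment factor (0.85 in the example above).
function rawSimilarity(comparedDiff, userBase, keyBase, bestRawScore) {
  // Cap the (user + key) sum at 100%.
  const total = Math.min(userBase + keyBase, 100);
  // Assumption: with nothing drawn and an empty key, treat it as no match.
  if (total <= 0) return 0;
  // 1 - (compared difference) / (user base + key base), floored at 0.
  const similarity = Math.max(0, 1 - comparedDiff / total);
  // Normalize so the best achievable raw score maps to roughly 1.
  return similarity / bestRawScore;
}
```

For example, a compared difference of 3 with base differences of 5 and 5 gives 1 - 3/10 = 0.7, which becomes about 0.82 after dividing by 0.85.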

Next, we do an adjustment that weighs the total area of the ‘Correct’ key against the total area drawn by the user. If the user draws a lot more or less than the correct answer actually contains, we do not want the result to be unduly affected. This eliminates much of the influence caused by scribbling rough answers across the general correct location. Before this adjustment, scribbling could often increase the final score. Our adjustment is to multiply our current similarity score by:

(the lesser of the user or key drawing base differences) / (the square of the greater of the base differences)

We use the square in the denominator to ensure that drawing too much or too little will rapidly decrease the overall similarity score.

We again cap this final adjusted similarity score at 1, ensuring a working range of 0 to 1.
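
A minimal sketch of this area adjustment, following the formula exactly as written above. The function name areaAdjustedSimilarity and the zero guard are my own assumptions. Note that the factor is not a pure ratio, so its size depends on the units of the base differences; here I assume they are percentages, which is my reading rather than something stated outright.

```javascript
// Sketch only: follows the adjustment as described, with base differences in percent.
function areaAdjustedSimilarity(similarity, userBase, keyBase) {
  const lesser = Math.min(userBase, keyBase);
  const greater = Math.max(userBase, keyBase);
  // Assumption: if nothing was drawn at all, avoid dividing by zero and score 0.
  if (greater <= 0) return 0;
  // (lesser base difference) / (square of the greater base difference)
  const areaFactor = lesser / (greater * greater);
  // Cap the final adjusted similarity at 1.
  return Math.min(1, similarity * areaFactor);
}
```

With this form, a user drawing that covers twice the area of the key cuts the factor to a quarter of its matched-area value, which is the rapid drop-off described above.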

For the ‘Close’ answer:

The idea is similar, but may need adjustment. If your close answer is similar in size to the correct answer, then the same procedure will work. In our case, I used a region around the correct answer to give partial credit. This region is roughly 2 times the size of the correct answer. As a result, we only expect a reasonable answer to cover about 50% of the close answer at best, so our minimum compared difference should be about half of the key base difference value. To compensate, we add an additional adjustment factor for the ratio between ‘close’ and ‘correct’ answers (here 2). We set our other adjustment factor like we did before, with the highest achievable uncorrected score (which unsurprisingly is about 0.4 now instead of 0.85).

The Final Score is just the greater of the similarity scores times a weighting factor (1 for ‘correct’, 0.8 for ‘close’), converted to a percentage.
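
One way to sketch the final score, assuming each key’s similarity is weighted first and the largest weighted value is then taken (the sentence above could also be read the other way around). The finalScore name and the { similarity, weight } shape are made up for illustration.

```javascript
// Sketch only: `candidates` is a list of { similarity, weight } pairs, one per key.
function finalScore(candidates) {
  // Weight each similarity, take the largest, and express it as a percentage.
  const best = Math.max(0, ...candidates.map((c) => c.similarity * c.weight));
  return best * 100;
}
```

For instance, a ‘correct’ similarity of 0.5 (weight 1) and a ‘close’ similarity of 0.9 (weight 0.8) would give a final score of about 72%, since 0.9 × 0.8 = 0.72 beats 0.5.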

To improve

To make this more useful, you would probably want to load it on the master slide, and toggle triggers from your other slides to make comparisons.
To save some overhead, rearrange the code to process the keys and blank canvas only once per slide, or only after resizing, instead of each time Check is clicked.
Probably should actively remove the previous canvas elements and event handlers when replaced.
This uses a bunch of callback functions while processing blobs and comparing images, which requires several interval timers to know when each step is complete before starting the next. Might be able to do it better using promises, or restructuring the code a bit.
I think Resemble just works on files, blobs, and dataURIs (e.g., base64 encoded images). Haven’t checked if it can work directly from elements or src links, but I don’t think so.
Probably should load Resemble from static code to ensure functionality. Could also load key images from files instead of slide objects. That might make them easier for users to locate and view, however.
There are other library options for comparing images. Some may be faster or more suited to your needs. If they produce a difference score, then the same approach should mostly apply.
Fix the sliding aspect of the slide on mobile when drawing with touch.
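
On the promises idea above, a minimal sketch of what the wrappers might look like. This is not what the demo currently does. It assumes Resemble.js is already loaded and exposes a global resemble, and it uses Resemble’s chained compareTo/onComplete call and the standard canvas.toBlob method. The names canvasToBlob and compareImages are hypothetical.

```javascript
// Sketch only: assumes Resemble.js is already loaded as a global `resemble`.

// Wrap the callback-based canvas.toBlob in a promise.
function canvasToBlob(canvas) {
  return new Promise((resolve, reject) => {
    canvas.toBlob((blob) => {
      if (blob) resolve(blob);
      else reject(new Error('Canvas could not be converted to a blob'));
    }, 'image/png');
  });
}

// Wrap a Resemble comparison in a promise that resolves to the
// mismatch percentage as a number.
function compareImages(first, second) {
  return new Promise((resolve) => {
    resemble(first)
      .compareTo(second)
      .onComplete((data) => resolve(Number(data.misMatchPercentage)));
  });
}
```

Each step can then be awaited in turn (convert both canvases to blobs, then compare them), which would replace the interval timers that currently wait for each callback to finish.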

Image Similarity Scoring_1.6a.story (3 MB)