# Writing good quiz/test question answers/distractors

Sep 08, 2011

1.

2. You can vary the number of distractors. Three to five distractors is ideal. A smaller number of answers/distractors increases the probability that a guess will be correct, however.

3. ALL distractors must be plausible. These are the best types of plausible-but-incorrect distractors:

a. Common errors and commonly held myths or misconceptions (for those with less knowledge or skill)

b. Statements that are true, but do not answer this question

c. Content that is paraphrased incorrectly

4. If answers/distractors include best and not-as-good alternatives (“Select the best answer…”), make sure that there is an unambiguously correct answer or answers. Provide enough detail to differentiate the best from the not-as-good.

6. Avoid answers/distractors that combine distractors (“b and c”).

7. Avoid using “all of the above,” and “none of the above.”

8. Vary the placement of the correct answer. The most common placement of the correct answer is c, and test-wise learners know this.

NEVER include silly distractors.
###### 42 Replies

Very good advice, Patti, thanks for sharing! I do agree with all of the points you listed, though I had never thought to note this down.

Thanks,

Stefano

Great list Patti.

Wanted to highlight an issue on point 6: scoring bias. Let's say you have A, B, C, D as options and "B and C" in some manner is the answer (either as option D, or as a multiple-response format where more than one option can be selected).

The person who gets B and C is 100% correct. If someone gets only B or only C, they are half-correct. If anyone selects A, they are 100% incorrect. If combinations of A and B or A and C are possible, how is that scored? -1 for incorrect, 0 for a missed opportunity, and 1 for each correct reply? How does that impact overall weighting for the total assessment?
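To make the scoring-bias point concrete, here is a minimal sketch of two scoring schemes for a multiple-response item. The weights and the floor-at-zero rule are illustrative choices, not a standard:

```python
# Sketch of two scoring schemes for a multiple-response item.
# Weights and the floor-at-zero rule are illustrative, not a standard.

def all_or_nothing(selected, key):
    """Full credit only when the selection matches the key exactly."""
    return 1.0 if set(selected) == set(key) else 0.0

def partial_credit(selected, key):
    """+1 per correct pick, -1 per incorrect pick, 0 for misses;
    floored at zero and normalized by the number of keyed answers."""
    selected, key = set(selected), set(key)
    raw = len(selected & key) - len(selected - key)
    return max(raw, 0) / len(key)

key = {"B", "C"}
print(all_or_nothing({"B", "C"}, key))  # 1.0
print(partial_credit({"B"}, key))       # 0.5 -- half-correct
print(partial_credit({"A"}, key))       # 0.0 -- fully wrong
print(partial_credit({"A", "B"}, key))  # 0.0 -- the wrong pick cancels the right one
```

Notice that under all-or-nothing, the learner who picked only B scores the same as the learner who picked A, which is exactly the bias described above.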

Oh, and many times I see designers make "B and C" answers correct where either B or C is sufficient on its own and they are not required in conjunction (for example, two equivalent ways of saving a document, like CTRL+S or File > Save).

Designers really, really need to think through the design implications.

Minor detail on point 8: if the answers are ordinal, like dates (May 23, June 6, July 10) or amounts (4 ml, 5 ml, 6 ml), I don't recommend shuffling. There are exceptions, but as a general rule, ordinal items should be listed in ordinal fashion so you are testing the knowledge and not catching them in a shuffle trick.
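That rule can be sketched in a few lines. The `ordinal` flag here is a hypothetical per-item setting, not a feature of any particular authoring tool:

```python
import random

def arrange_options(options, ordinal=False, rng=random):
    """Shuffle the options unless they have a natural order (dates,
    amounts), in which case present them in that order instead."""
    if ordinal:
        return sorted(options)  # assumes the values are comparable
    shuffled = list(options)
    rng.shuffle(shuffled)
    return shuffled

print(arrange_options([6, 4, 5], ordinal=True))  # [4, 5, 6]
```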

Many good item-writing resources out there, like the Donath Study and the National Board of Medical Examiners Item Writing Guide. But my personal favorite remains Cathy Moore's Action Learning Hero: it ensures that before you get to the item-writing level, you are testing applicable skills vs. the enabling knowledge (both are acceptable to do, but the absence of one means you ended the journey short of the finish line).

David, let me see if I understand your concern about #6. It'd be better to use "Choose all that apply" because the language is less confusing. Also, research shows that when we use that "B and C" language, test-wise students know that it's the correct answer most of the time.

Good point about shuffling. If the answers have a natural order, it's a good idea to put them in that order (by date, amount, etc.) for clarity's sake. Good save... thanks! So that should be #9, right? How would we write that? #9: If the answers/distractors have a natural order (chronology/amount/etc.), place them in that natural order. Or would you write it differently?

Here's a primary guideline from our draft SOP for Self-Paced eLearning (Assessments, Test Items, and Questions section). I can't take credit for all of this, but I do think it's quite good. The section addresses feedback guidance, requirements for format, distracters, mastery/cut score, and packaging:

_________________________________________________

It is common for Self-Paced eLearning assessments to assess recall of facts associated with a principle or procedure. Frequently, this does not represent a one-to-one correlation with the actual performance of the procedure. By over-sanitizing or over-simplifying the variability presented by the task challenge in the real world, it becomes difficult or impossible to prove the training actually improved performance.

- The complexity of an assessment item shall match the complexity of the task being measured.

When writing test items for eLearning assessments, consider more test types than multiple choice questions. Test items should also consider multi-part decisions, approximations of the task, and authentic simulations of the task environment.

Research supports the value of questioning to learning. Questions and test items used to probe and facilitate elaborative responses can increase comprehension, critical thinking, and learning. Reflection exercises can also be used to encourage higher-order thinking and may significantly improve learning in some situations.

_________________________________________________

Here are a couple of elements from our requirements table:

- Mastery Score:

Default mastery (cut score) for Minimally Acceptable Competency (MAC) level shall be 100%. The default score requirement may be adjusted during project alignment. The cut score should not be an arbitrary number (e.g., 80%). The Angoff method can be used to determine a defensible cut score.
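For illustration, the Angoff computation itself is simple: each judge estimates, for every item, the probability that a minimally competent candidate answers it correctly, and the cut score is the judge-averaged estimate averaged over items. The ratings below are hypothetical:

```python
# Minimal Angoff sketch with hypothetical ratings for a 4-item test.
# Each value is a judge's estimate of the probability that a minimally
# competent candidate answers that item correctly.
ratings = {
    "judge_1": [0.90, 0.70, 0.60, 0.80],
    "judge_2": [0.85, 0.75, 0.55, 0.90],
    "judge_3": [0.95, 0.65, 0.50, 0.85],
}

# Average across judges per item, then across items for the cut score.
per_item = [sum(col) / len(ratings) for col in zip(*ratings.values())]
cut_score = 100 * sum(per_item) / len(per_item)
print(f"Defensible cut score: {cut_score:.0f}%")  # 75%
```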

- Distracters

The test item and all distracters must be consistent with the course objectives and be educationally sound. The item stem, correct response and all distracters must be reviewed for clarity, relevance, ambiguity, cueing, appropriateness, bias (sexual, race, geographical, etc.) and validity. Reference SOP 9, 5-1 through 5-22.

We have a whole SOP for test item generation. It's pretty comprehensive but we've included enough in the eLearning guide to address most common issues. Here's a checklist we include for assessment item validation (copied and pasted directly from the guide):

• Does the test item measure a learner’s ability to perform?
• Is the test item accurate?
• Is the test item clear and understandable?
• Does the test item have only one correct answer?
• Are all distracters non-ambiguous and within the realm of possibility? (no throw away distracters)
• Are the answer choices keyed accurately?
• Is the wording or terminology correct?
• Are the test item and all distracters free of clues that might indicate the correct answer?
• Are supporting materials (graphics) relevant to the question?
• Do supporting materials (graphics) provide sufficient information to answer the question?
• Are graphics and other supporting materials clear, readable, and realistic?
• Does the test item require the learner to use the information in any accompanying materials to get the correct answer (application) rather than just find the answer (reading)?

Patti,

My concern about #6 isn't wording or answer forecasting; it's scoring bias. If B and C are correct, wouldn't that mean someone choosing just B is partially correct and perhaps more knowledgeable than the user who responds with A? Whether or not "B and C" is listed as option D, any user selecting B or C is more correct than the user selecting A (because they have part of the answer).

As for multiple response, it gets extremely messy. Isn't the person who answers "B" more correct than someone who only selected "A"? Now, what if they selected A, B, D? A, D? B, C, D?

There are, of course, items where B and C in conjunction is the only absolutely correct answer and "only B" or "only C" is not considered partially correct. But having administered over 8,000 questions to date, I don't see this being the norm for items submitted in this format.

The same scoring bias is inherent in "all of the above" too. Any one answer is almost always partially correct.

Your wording for 9 is perfect; it's really just an addendum to 8. With my teams it's a simple "just don't shuffle everything; when you look, it's obvious." Most folks get it on sight, but your wording is much more eloquent (my cup runneth over with snarkiness).

This is a great summary of best practices, Patti – thanks!

I would add another to the list: ensure all answers/distractors have the same grammatical structure (tense, fragments vs. sentences, etc.). This is one of my pet peeves, as it can be confusing for the learner (and it looks sloppy).

And I think a key point is that in most cases, it’s much better to write a MC question from a scenario standpoint (“Here’s the situation…  What should Jane do?”) rather than from a straight facts perspective.

Sheila, Awesome addition to the list. Clarity of writing has to be at the top of the list when writing quiz/test questions. Grammar and parallel structure add to clarity. (And I agree that it just looks better! And like most people who write instructional content, I’m totally OCD about things looking right. Sounds like you are too. Shhhh, I won’t tell.)

I couldn’t agree more about scenario-based questions. I think we should start a thread on writing the stem (the part that asks the question) after we kill this part about answers/distractors. And hmmm, it really would have made more sense to talk about the stem first.

David, I see your point. That’s the reason I said that that type of question might be better asked as a “Select all that apply” question rather than trying to combine answers, which is just confusing. Agree about All of the above. That answer is usually correct when used and test-wise learners know this.

I think some testing systems do give credit for partially right answers. Looked into the logic of computer adaptive testing and it’s pretty interesting. Not that our typical e-learning testing systems can do these complex algorithms. (Someone tell me if I’m wrong.)

Psychometrics says to make questions as hard (or easy) as the real life challenge. So if a person really has to select from many options, selecting from many options on a test makes sense. And sometimes, we ARE trying to see if someone can differentiate between partially correct and completely correct answers (this is something people often have to do on the job). But making a question more complex by making the language more complex makes no sense unless you’re trying to test a learner’s understanding of that type of complex language.

Snarkiness is fine. Been known to exhibit some myself at times.

Patti:

100% agree. Cases do exist and some systems do support them. Medical scenarios are perfect examples (the first symptoms shared and initial diagnoses cannot narrow it down to "just one thing").

I think the advice is "know the rules and capabilities for scoring" and "the true testing context" before using the question format.

Seriously,

You guys have me speechless. I love this conversation and found it in perfect time. I am in the beginning stages of writing online certification tests for a client and went to pull out my old stand-by, Good Fair Tests by Odin Westgaard, and couldn't find it! You've given me what I was looking for.

Thanks for the life line,

Greg

Greg, here are some general guidelines for all test items:

### General Guidelines for all Test Items

1. Match test item to objective. If this isn’t possible, rethink the objective or the type of assessment (but DON’T change objective to match what is easy to assess!).
2. Concentrate on central, critical content rather than peripheral, trivial content.
3. Provide clear directions for the assessment and for groups of questions, including length and additional resources required.
4. Consider the reading level of learners. Question difficulty should come from content, not wording.
5. Avoid negatives and double negatives as well as complex, awkward, or tricky wordings.
6. Make sure items are precise, clear, and non-ambiguous. Include all necessary qualifiers but don’t provide unnecessary, superfluous information or irrelevant sources of difficulty (such as the need to do complex computations if that isn’t what is being tested).
7. Avoid words such as always, often, frequently, never, none, rarely, infrequently because they tend to trip up learners.
8. Make sure that each item has an unambiguous correct answer.
9. Make sure test items don’t include clues about the correct answer or about other test items (common mistake!). Make sure grammatical construction doesn’t give away the right answer or that question stems don’t provide answers to other questions.
10. Avoid double-barreled questions (that ask two things in one question).
11. Group questions with the same directions together.
12. Provide examples for complex questions.
13. Prepare an answer key at the same time as the assessment.
14. Avoid having a disproportionate number of correct answers in the same position (e.g., 50% of the answers are c).
15. Test the assessment before using it!
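Guideline 14 is easy to check mechanically against an answer key. A rough sketch, where the 50% threshold is an arbitrary choice for illustration:

```python
from collections import Counter

def position_bias(answer_key, threshold=0.5):
    """Flag any position holding a disproportionate share of correct
    answers (guideline 14). The 50% threshold is an arbitrary choice."""
    total = len(answer_key)
    counts = Counter(answer_key)
    return {pos: n / total for pos, n in counts.items() if n / total > threshold}

key = ["c", "a", "c", "c", "b", "c", "c", "d", "c", "c"]
print(position_bias(key))  # {'c': 0.7} -- 70% of the answers sit in slot c
```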

Listen to Patti.  She's real SMAHT!!  LOL.

But seriously, this is an excellent discussion and some great lists. I don't know if I totally agree about NEEEVVEER using silly distractors. I think that really depends on the context of the assessment. In some cases, it may provide a moment of levity or make a specific point. That's just my .02, though.

Robert, that's SMAHT-A&&.

The don’t-use-silly-distractors rule is really only aimed at tests where the grade counts and the question is important. (And on these kinds of tests, all questions should be important.) The rationale is that a silly distractor effectively reduces the number of "real" distractors, so it makes it easier to guess. Plus, if you are doing item analysis, it would also throw off your item analysis scores and you wouldn't be able to tell whether the question was good or needed to be reworked or thrown out.
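As a rough illustration of how a dead distractor shows up in item analysis, here is a minimal distractor-frequency sketch over hypothetical response data:

```python
from collections import Counter

def distractor_analysis(responses, options):
    """Share of test-takers choosing each option. A distractor nobody
    picks is dead weight: it shrinks the effective number of options
    and muddies the item statistics."""
    n = len(responses)
    counts = Counter(responses)
    return {opt: counts.get(opt, 0) / n for opt in options}

picks = ["c", "c", "b", "c", "a", "c", "b", "c", "c", "a"]  # hypothetical
print(distractor_analysis(picks, ["a", "b", "c", "d"]))
# 'd' drew 0% of responses -- rewrite or drop that distractor
```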

But on a self-check... sure, why not have something fun? Tom K has some great examples of fun/silly tests/quizzes and he's even smahter.

Steve: Don’t know how I missed your post. The guideline you posted puts an amazing amount of instructional design and assessment writing wisdom in one sentence. If everyone did this, our assessments would be SO much more valid and therefore so much more valuable/worthwhile. It is possible to write multi-choice questions that are at a fairly high level of application, even though most people do not write them.

The complexity of an assessment item shall match the complexity of the task being measured.

Example of a multi-choice question at a reasonably high level of application:

John is designing an assessment for a new course on listening skills for case managers. The most critical objective is the ability to listen effectively, with a good outcome, during difficult, emotional conversations. He has decided to build a rating scale with very clear descriptors. Which approach is best?

a)    This is a good approach but it would be best if it were used on the job (not just after instruction) to rate a variety of interactions between the case manager and patients.

b)    This is a good approach but John should validate the descriptors with content experts to make sure they are accurate and non-ambiguous.

c)    This is a good approach but it will be important that raters use the rating scale immediately after training.

d)    This is a good approach but because it is time consuming it would be best to use a multiple-choice assessment with questions written for complex, procedural objectives.

Another example of a multi-choice question at a reasonably high level of application:

A 57-year-old male who underwent surgery 6 months ago wants to donate blood. Donor information:

- Weight: 250 lb
- Temperature: 98.7 °F
- Pulse: 72 beats/min
- Blood pressure: 120/68 mm Hg
- Hgb: 12.1 g/dL

Based on the donor information, we should

a)    approve whole blood donation

b)    approve platelet aphaeresis only

c)    reject his donation due to recent surgery

d)    reject his donation due to low hemoglobin

We can do higher levels of application on MCQs using technology... any comments?

"We can do higher levels of application on MCQs using technology... any comments?"

I'd say yup: I posted this on the old forums. We built a set of assessments on heavy machine gun operation and maintenance. It was pretty hotsauce in my opinion and sooo much fun to work on. This was a "skill bomb," or rather a set of "skill bombs," in contrast with "content bombs," which tend to be more painful to design/develop.

In these assessments, each task was broken down into a set of sub-tasks and steps. Pretty straightforward. We used these definitions to build "task chaining" interactions that blended a couple of different mechanics. One of these was a multiple choice question driven by a pie menu. The pie menu expanded to expose another challenge when the previous challenge was met (or a mistake was made). Other steps were emulated or approximated using fairly simple interactions (drag/drop). This product was all about the cognitive tasks: reinforcing the stuff that happens from the neck up with procedural feedback.

http://www.articulate.com/forums/93529-post202.html

You'll see the rest of the explanation in the post. But, yeah, I definitely think we can test higher levels of application using MCQ's and technology based assessment. It's not always easy. But when it's needed, it's quite satisfying.

Task chaining addresses the "what do you do now" ordering in a relatively natural cognitive sequence. Performance is contextual. I think our assessments should be authentic whenever it's appropriate. One of the coolest things about this is the ability to model task progression and model it at the level of complexity that is appropriate to the learner. Task progression is so important to building clean mental models, in my opinion. The essence of practice.

@Steve, working on a course that does this - at least to a certain extent.  Working with another ID, so we're trying to help each other think things through & he's better at coding the navigation... it's getting quite complex!  But, I'm excited about how it will turn out  Still in storyboard-ish phase (I can't really think thru all that I need until we do some sophisticated navigation & branch planning), but it's fun! :)

Hey folks, I was going to post this to the other Elearning Heroes discussion where someone was talking about question pools; it's not as tricky as everyone thinks. Here's a Screenr vid I did for Learning Solutions 2011.
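For what it's worth, the core pool-drawing logic is simple: draw one item per objective so each attempt differs while coverage stays constant. A minimal sketch with made-up pool and item names:

```python
import random

def draw_quiz(pools, rng=random):
    """Draw one item per objective from its pool, so each attempt
    differs while coverage stays constant. Pool names are made up."""
    return {objective: rng.choice(items) for objective, items in pools.items()}

pools = {
    "objective_1": ["q1a", "q1b", "q1c"],
    "objective_2": ["q2a", "q2b"],
}
print(draw_quiz(pools))  # e.g. {'objective_1': 'q1b', 'objective_2': 'q2a'}
```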

Also, 100% agree that M/C questions can be higher-level learning. Although we can't get to 100% synthesis like we can with actual roleplay or speaking with a customer, we can create a customer interaction with key decision points that hit the major difficulties in judgement that our users hit.  Even though I agree it isn't quite like the real thing, it makes our users more literate in the process, and insightful regarding the challenges, so mentoring starts in an optimized spot (short version: students come prepared for class).

Also, wanted to share this from Clive (I just got the tweet):

Parts 1 and 2 are quite good too.

I recently attended an item writing class and got a great nugget of advice. After writing the question and answer and distractors, cover up the answer choices.

Looking at only the question, can you formulate the correct answer? If not, then your question should be reworded.

Very interesting.  Is it that the learner should need to see all of the options before identifying the "most" correct?