In doing research for two articles that appeared in Learning Solutions Magazine about a year and a half ago, I looked at a variety of tests that I found online or had run across in my job as an instructional designer. The number of questions that were giveaways surprised me, as did the number that were written at the knowledge level rather than at higher Bloom’s levels—for example, at the application level.

It occurred to me that in organizational training shops and in universities, those doing the teaching are usually hired for their subject-matter expertise and may or may not receive formal training in teaching skills, including writing tests (at least not at first). Over time these instructors probably attend workshops and other forms of training, but even so, judging by the samples I found, there is considerable variation in the quality of multiple-choice test items. I suspect we promulgate what we experienced as students ourselves.


So it also seemed to me that it would be useful to have a quality scorecard for multiple-choice items, at both the individual and training-organization levels. Such a scorecard could help the training manager or department head assess all multiple-choice tests against a common standard. The results could be prescriptive, pinpointing areas that each trainer, instructor, or instructional designer needs to improve. The manager could use the scorecard to assess specific tests or to assess a cross section of multiple-choice items collected from each person’s portfolio.

Table 1 (at the end of this article) shows the quality scorecard I am proposing.

Since multiple-choice items play such a pivotal role in online learning, both as embedded questions and in assessments, this quality scorecard could be a useful tool with which to identify and improve questions that may detract from what they purport to measure. For example, if 15 percent of a test is found to contain questions rated as “giveaways,” that means that a student is, in effect, only required to know 85 percent of the information in order to score 100 percent on the test. Said another way, an employee who achieves the minimum passing score on such a test may not actually have the required knowledge to perform a task or job correctly, and the test wouldn’t catch it.
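The giveaway arithmetic above can be carried one step further. A minimal sketch (the numbers are illustrative assumptions, not from any actual test) shows how giveaways inflate an examinee’s apparent mastery:

```python
# Illustrative sketch of the giveaway effect described above.
# All numbers are hypothetical assumptions for the sake of the example.
total_items = 100
giveaways = 15        # items the scorecard would rate as "give-aways"
passing_score = 80    # minimum passing score, in items answered correctly

# Worst case: the examinee gets every giveaway correct for free,
# so only (passing_score - giveaways) correct answers reflect real knowledge,
# out of (total_items - giveaways) substantive items.
real_knowledge = (passing_score - giveaways) / (total_items - giveaways)
print(f"Effective mastery of substantive content: {real_knowledge:.0%}")
```

Under these assumptions, an employee who "passed" at 80 percent may have demonstrated mastery of only about 76 percent of the substantive content.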

How could you use this quality scorecard?

First, you could use it to perform a quality-control check on an actual test or quiz. Second, you could use it to get a quantified index of the test-writing quality in your training department. For example, you could collect, say, 20 or 25 sample questions from each instructor or instructional designer and run those questions through this checklist. The result could give you not only a numerical index, somewhat like a GPA, but also specific question-by-question suggestions on what needs to be done to improve each question.

The scorecard could help you identify rather precise professional development needs for each person, in which case you could provide efficient, pinpointed on-the-job training (OJT). You could also flip the approach: provide broader professional development on multiple-choice items first, and then use the scorecard as a measure of improvement.
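The GPA-like index described above could be tallied mechanically once the scorecard is filled in. The sketch below is one possible way to do this, assuming a simple row structure that mirrors the Table 1 columns; the field names, the sample data, and the scoring rule (average Bloom’s level plus the fraction of flawed items) are my assumptions, not part of the article’s scorecard:

```python
# Hypothetical sketch: tallying a filled-in scorecard into a GPA-like index.
# Field names mirror the Table 1 columns; the data below is invented for illustration.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ScorecardRow:
    question: int       # question number from the test
    sound: bool         # "Sound Question?" column
    giveaway: bool      # "Give-away?" column
    trick: bool         # "Trick Question?" column
    blooms_level: int   # 1 = Remembering ... 6 = Creating

def score_test(rows: List[ScorecardRow]) -> Tuple[float, float]:
    """Return (average Bloom's level, fraction of flawed items).

    An item counts as flawed if it is unsound, a giveaway, or a trick question.
    """
    flawed = [r for r in rows if not r.sound or r.giveaway or r.trick]
    avg_bloom = sum(r.blooms_level for r in rows) / len(rows)
    return avg_bloom, len(flawed) / len(rows)

# Invented sample: four scored items from one instructor's portfolio.
sample = [
    ScorecardRow(1, sound=True,  giveaway=False, trick=False, blooms_level=3),
    ScorecardRow(2, sound=True,  giveaway=True,  trick=False, blooms_level=1),
    ScorecardRow(3, sound=True,  giveaway=False, trick=False, blooms_level=4),
    ScorecardRow(4, sound=False, giveaway=False, trick=True,  blooms_level=2),
]
avg, flawed = score_test(sample)
print(f"Average Bloom's level: {avg:.2f}; flawed items: {flawed:.0%}")
```

Run over each person’s 20–25 sample questions, a summary like this would give the manager both the numerical index and, via the per-row flags, the question-by-question detail.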

Table 1. Quality scorecard for multiple-choice items

Question | Correct Answer | Sound Question? | Give-away? | Trick Question? | Bloom’s Level | Notes

(The scorecard has one row per test item; each column is explained below.)
Question

Reference the question numbers from an attached or online test, or copy and paste them into this scorecard.

Correct Answer

For the evaluator’s convenience.

Sound Question?

Does the question appear to be a well-constructed, valid measure of the objective?

  • Does the question correspond to the verb in the objective (e.g., select, solve, identify)?
  • Does it ask an explicit question rather than using a fill-in-the-blank statement?
  • Is most of the verbiage in the stem?
  • Are the choices all about the same length?
  • Are the choices cognitively parallel?
  • Does the question use logical-sounding distractors?


Give-away?

Does the question give cues that hint at or give away the correct answer?

  • Is the correct answer choice much longer than the others?
  • Does the stem use words that only go with the correct answer?
  • Does the grammar give away the correct answer (e.g., use of “a” or “an”)?

Trick Question?

Is the question hard to interpret correctly?

  • Does it use double negatives unnecessarily?
  • Does it use complex construction that is hard to follow?
  • Are subtle falsehoods buried in the question choices?

Bloom’s Level

Record the Bloom’s taxonomy level at which each item is written. Note: in this list, a higher number indicates a higher Bloom’s level.

  1. Remembering
  2. Understanding
  3. Applying
  4. Analyzing
  5. Evaluating
  6. Creating


Notes

Use this column to record specific details and/or to offer suggested changes.


Revised Bloom’s Taxonomy, retrieved on March 4, 2013 from