How the CPA Exam Is Scored

A behind-the-scenes look at what test takers can expect in today's
computer-based environment.

May 2011
by Journal of Accountancy staff

Anyone who has taken the Uniform CPA Examination, prepared for it, or been involved in the CPA licensure process knows that the passing score is 75. But very few understand what that 75 means.

In January, the exam structure had its biggest overhaul since the exam switched from paper and pencil to computer-based testing in 2004.

New candidates are curious about how the changes will affect their scores, and many CPAs who sat for the exam before 2004 want to know what’s different from the pre-computer-scoring days. This article provides an overview of the scoring process and answers some frequently asked questions.

Computer vs. Pencil

In the paper-and-pencil days, scoring was done by hand, which took several weeks, according to John Mattar, the AICPA Examinations Team’s director–Psychometrics & Research. Now, the Examinations Team writes software that evaluates answers against an answer key a committee has agreed on, he said.

Essays were likewise scored by hand, by a room full of CPAs trained for the task. Now, the Examinations Team scores them with software.

“If the gold standard is what a trained human scorer would score, you gather a relatively large sample—around 1,000 to 1,200 responses scored by people—then you use a program to build a mathematical model that will take elements of those papers and predict human scores and validate that model using data from real candidates and show the software is scoring the way the humans would score it,” Mattar said. “Now you have an approved scoring model and can run responses electronically through that software almost instantly and get scores.” However, even with the automated scoring, a sample of responses is also scored by people as a continuing quality-control check, he said.

The software looks for elements a human would score on, such as organization, development, and usage of language.
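
The build-then-validate workflow Mattar describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the single numeric feature per essay, the straight-line model, the made-up data, and the names fit_line, predict, and mean_abs_error are all hypothetical and do not represent the Examinations Team's actual software.

```python
def fit_line(features, human_scores):
    """Least-squares fit of human scores against one essay feature."""
    n = len(features)
    mean_x = sum(features) / n
    mean_y = sum(human_scores) / n
    sxx = sum((x - mean_x) ** 2 for x in features)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(features, human_scores))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def predict(model, feature):
    """Predict the score a human would give, from the fitted line."""
    slope, intercept = model
    return slope * feature + intercept


def mean_abs_error(model, features, human_scores):
    """Average gap between predicted and human scores on held-out essays."""
    errors = [abs(predict(model, x) - y)
              for x, y in zip(features, human_scores)]
    return sum(errors) / len(errors)


# Made-up numbers: build the model on essays scored by people...
model = fit_line([1, 2, 3, 4], [2, 4, 6, 8])
# ...then validate it on other essays that people also scored.
validation_gap = mean_abs_error(model, [5, 6], [10, 11])
```

A small validation gap would suggest the software is scoring the way the human scorers would. A real model would draw on many features and on the roughly 1,000 to 1,200 human-scored responses mentioned above.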

Because essays were newly introduced into the Business Environment and Concepts (BEC) section this year, those essays will initially have to be scored by humans, Mattar said. The Examinations Team will build computer models once it has received enough sample responses.

If a test taker’s total score is close to the passing score, the candidate’s written responses will be automatically regraded by human graders. When there is more than one grader for a response, the average of the scores is used as the final grade, he said.
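
As a rough sketch, the regrade rule and the averaging of grader scores might look like this in Python. The passing score of 75 comes from the article; the REGRADE_MARGIN value is a hypothetical stand-in, since the article does not say how close counts as "close," and the function names are illustrative only.

```python
PASSING_SCORE = 75   # stated in the article
REGRADE_MARGIN = 2   # hypothetical: the article does not define "close"


def needs_human_regrade(total_score):
    """True when the total score is near enough to passing for a regrade."""
    return abs(total_score - PASSING_SCORE) <= REGRADE_MARGIN


def final_response_grade(grader_scores):
    """Average the scores when more than one grader scored a response."""
    return sum(grader_scores) / len(grader_scores)
```

With these assumed values, a total score of 74 would trigger a regrade, and two graders who gave a response 3 and 4 would produce a final grade of 3.5.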

How questions appear on tests is also different. In the past, there were different forms of the exam, but all Form A’s, for example, contained the same questions. Today’s system — multi-stage testing (MST) — allows the Examinations Team to target the exam to the ability of the candidate to get a more precise estimate of his or her proficiency, Mattar said.

When Are Easier or More Difficult Questions Given?

Candidates take three multiple-choice testlets (groups of multiple-choice questions) per exam section. The first testlet is always a medium testlet. Those who perform well get a more difficult second testlet, while those who do not perform well receive a second medium-difficulty testlet. Similarly, the third testlet can be a medium or a more difficult one and is based on performance on the first two testlets. Task-based simulation (TBS) questions are pre-assigned and are not chosen based on performance on the multiple-choice testlets. Exhibit 1 illustrates the process.

If you do poorly on the first testlet, you can still pass the exam, but you will need to do better on the second and third testlets.

You can get all medium testlets and still pass, but for this to happen, you would have to have good, but not excellent, performance on the first two testlets, and then excellent performance on the last testlet.
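
The routing described in this section can be sketched in Python as follows. This is a simplified illustration, not the exam's actual algorithm: it assumes performance is measured as the fraction of questions answered correctly, and the ROUTE_UP_CUTOFF value and function names are hypothetical, since the article does not say what counts as performing well.

```python
# Hypothetical cutoff: the article does not define "performing well".
ROUTE_UP_CUTOFF = 0.70


def next_testlet(fractions_correct):
    """Pick the difficulty of the next multiple-choice testlet.

    fractions_correct holds the fraction answered correctly on each
    testlet taken so far (empty before the first testlet).
    """
    if not fractions_correct:
        return "medium"  # the first testlet is always medium
    # Routing looks at performance on all testlets taken so far.
    average = sum(fractions_correct) / len(fractions_correct)
    return "difficult" if average >= ROUTE_UP_CUTOFF else "medium"


def route(fractions_correct):
    """Return the three testlet difficulties a candidate would see."""
    return [next_testlet(fractions_correct[:i]) for i in range(3)]
```

Under these assumptions, route([0.9, 0.8, 0.5]) yields medium, difficult, difficult, while route([0.5, 0.6, 0.9]) yields three medium testlets, the case in which a candidate can still pass by doing very well on the last one.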

How Do You Decide Which Questions Are Difficult and Which Are Medium?

The difficulty levels of the test questions (and other statistics that are used to describe each test question) are determined through statistical analysis of candidate responses. At the question level, difficulty is not quantified as a category (for example, moderate or difficult), but as a numeric value along a scale. Testlets are classified as either medium or difficult based on the average difficulty of the questions within that testlet.
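
A minimal Python sketch of that last step, classifying a testlet by the average difficulty of its questions, is shown below. The numeric scale, the DIFFICULT_CUTOFF value, and the function name are assumptions made for illustration; the article does not give the actual scale or cutoff.

```python
from statistics import mean

# Hypothetical cutoff on a hypothetical difficulty scale.
DIFFICULT_CUTOFF = 0.5


def classify_testlet(question_difficulties):
    """Label a testlet by the average difficulty of its questions."""
    average = mean(question_difficulties)
    return "difficult" if average >= DIFFICULT_CUTOFF else "medium"
```

Each question keeps its own numeric difficulty, and only the testlet as a whole receives a category. A testlet labeled medium can therefore still contain some hard questions.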

This article has been excerpted from the Journal of Accountancy.