No, the MBE was not "harder" than usual

I frequently read comments, on this site and others, claiming that the bar exam was simply harder than usual. Specifically, I read many people, often law faculty (who didn't take the exam this year) or recent graduates (the vast majority of whom are taking the bar exam for the first time), insisting that the bar, especially the Multistate Bar Exam ("MBE"), is "harder" than before.

Let's set aside, for now, (1) rampant speculation, (2) cognitive biases that make a multiple choice test that counts for something feel "harder" than ungraded practice, (3) erroneous comparisons between the MBE and bar prep companies' practice questions, (4) the retroactive fitting of negative bar experiences to negative bar results, and (5) the use of comparatives in the absence of a comparison.

Let's instead focus on whether the July 2015 bar exam was "harder" than usual. The answer is, in all likelihood, no--at least, almost assuredly, not in the way most are suggesting, i.e., that the MBE was harder in such a way that it resulted in lower bar passage rates.

I'll explain why this is the right question, and why I include the caveat "almost assuredly," below. First, it might be beneficial to take a moment to explain how the MBE is scored, and how that should factor into an analysis.

I. What the MBE scale is

Many are familiar with a "curved" exam, either from college or law school. The MBE is not curved. (For that matter, neither is the LSAT.)

In college, letter grades are commonly assigned based on converting numeric scores to a letter (e.g., 90-100 is an A, 80-89 is a B, etc., with some gradations for +'s and -'s). A common way of curving the exam is to add points to the top grade in the class to make it 100, and add the same number of points to everyone else's score. If the highest grade on the exam is a 92, then everyone gets an additional 8 points. If the highest grade is a 98, everyone gets an additional 2 points (and most classmates complain that this student "wrecked the curve"). This isn't really a "curve" in the typical use of the term, but it's a common way of distributing grades.
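To make that arithmetic concrete, here is a minimal sketch of the add-the-same-points approach; the scores are made up for illustration only.

```python
def add_points_curve(raw_scores, ceiling=100):
    """Add the same number of points to every score so the top score reaches the ceiling."""
    bump = ceiling - max(raw_scores)
    return [score + bump for score in raw_scores]

# Highest raw score is 92, so everyone gets 8 extra points.
print(add_points_curve([92, 85, 78, 61]))  # [100, 93, 86, 69]
```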

Instead, most law schools "curve" grades based on a pre-determined distribution of grades. Consider the University of California-Irvine. In a class of, say, 80 students, instructors are required to give 3 or 4 A+'s; the next 19%-23% of grades are A's; the next 19%-23% are A-'s; and so on.
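A crude sketch of that kind of mandatory distribution might look like the following. The cutoffs are only illustrative approximations of the percentages above, not any school's actual policy.

```python
def distribution_curve(raw_scores):
    """Rank the class and hand out grades by pre-set shares of the class,
    no matter how high or low the raw scores happen to be.
    Cutoffs are illustrative: ~5% A+, next ~21% A, next ~21% A-, rest below."""
    ranked = sorted(raw_scores, reverse=True)
    n = len(ranked)
    grades = []
    for rank, score in enumerate(ranked, start=1):
        pct = rank / n  # student's class-rank percentile
        if pct <= 0.05:
            grades.append((score, "A+"))
        elif pct <= 0.26:
            grades.append((score, "A"))
        elif pct <= 0.47:
            grades.append((score, "A-"))
        else:
            grades.append((score, "B+ or below"))
    return grades

# With 20 students, this yields 1 A+, 4 A's, 4 A-'s, and 11 grades below that.
print(distribution_curve(list(range(100, 80, -1))))
```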

But the MBE uses neither of these. The MBE uses a process known as "equating," and then it "scales" the test. These are technical statistical procedures, but here's what they're designed to do. (Let me introduce an important caveat here: the explanations that follow are grossly oversimplified, but they convey the most basic ideas of the measurement!)

Imagine we have two groups of students. They are taking a test, but on different days. And we don't want to give them the exact same test, because, well, that's a bad idea--the second group might get answers from the first group. But we want to be able to compare the two groups of students to each other.

It wouldn't really do to use our law school "curve" above. After all, what if the second group is much smarter than the first group? If we, say, passed a fixed 75% of each group, why should a test-taker in the second group be penalized for taking the test among a much smarter group, when her chances would have been better the first time around?

Standardized testing needs a way of accounting for this. So it does something called equating. It uses versions of questions from previous administrations of the exam, known as "anchor" questions or "equators." It then uses these anchor questions to compare the two different groups. One can tell whether the second group performed better, worse, or about the same on the anchor questions, which allows comparison of groups over time. The test-maker then examines how the second group did on the new questions, and it can better evaluate performance on those new questions by scaling the score based on performance on the anchor questions.

This is why the bar jealously guards its exam questions and why there is such tight security around the exam. It needs some of the questions to compare groups from year to year. But as the law changes, or simply to keep the test relatively fresh, there are always new questions introduced into the exam.

II. How the MBE scale works

It's one thing to read about the math--yes, you might think, standardized test administrators have some statistical magic--but it's still a challenge to understand. How does it work in practice?

Consider two groups of similarly-situated test-takers, Group A and Group B. They each achieve the same score, 15 correct, on a batch of "equators." But Group A scores 21 correct on the unique questions, while Group B scores just 17 right.

We can feel fairly confident that Groups A and B are of similar ability. That's because they achieved the same score on the anchor questions, the equators that help us compare groups across test administrations.

And we can also feel fairly confident that Group B had a harder test than Group A. (Subject to a caveat discussed later in this part.) Because the two groups are of similar ability, we would expect Group B's scores on the unique questions to look like Group A's. Because Group B instead performed worse on those questions, it looks like it received a harder batch of questions.

The solution? We scale the scores so that Group B's 17 correct answers are treated like Group A's 21 correct answers. That accounts for the harder questions. Bar pass rates for Group A and Group B should then look the same.

In short, then, it's irrelevant if Group B's test is harder. We'll adjust the results because we have a mechanism designed to account for variances in the difficulty of the test. Group B's pass rate will match Group A's pass rate because the equators establish that they are of similar ability.
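Here is a deliberately crude sketch of that logic in code, using the made-up Group A and Group B numbers above. It treats one anchor-question point as worth one unique-question point, which real equating does not do (the NCBE's process rests on Item Response Theory), but it shows why Group B's 17 ends up treated like Group A's 21.

```python
def crude_equate(anchor_ref, unique_ref, anchor_new, unique_new):
    """Adjust a new group's unique-question score onto the reference group's scale.

    anchor_* : average correct on the shared "equator" questions
    unique_* : average correct on the non-shared, unique questions
    """
    # Judged only by the anchors, how much stronger or weaker is the new group?
    ability_gap = anchor_new - anchor_ref
    # What we'd expect a group of that ability to score on uniques of equal difficulty.
    expected_unique = unique_ref + ability_gap
    # Any remaining shortfall is attributed to harder questions, and adjusted away.
    difficulty_adjustment = expected_unique - unique_new
    return unique_new + difficulty_adjustment

# Group A (reference): 15 on the equators, 21 on the uniques.
# Group B: the same 15 on the equators, but only 17 on the uniques.
print(crude_equate(anchor_ref=15, unique_ref=21, anchor_new=15, unique_new=17))  # 21
```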

When someone criticizes the MBE as being "harder," for that statement to have any relevance, that person must mean that it was "harder" in a way that caused lower scores; as this example demonstrates, typical equating and scaling prevent exactly that.

Now let's look at a new group, Group C.

On the unique questions, Group C did worse than Group A (16 right as opposed to 21 right), much like Group B (17 to 21). But on the equators, the measure for comparing performance across tests, Group C also performed worse, 13 right instead of Group A's 15.

We can feel fairly confident, then, that Group C is of lesser ability than Group A. Their performance on the equators shows as much.

That also suggests that when Group C performed worse on the unique questions than Group A, it was not because the questions were harder; it was because Group C was of lesser ability.
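Running the same crude arithmetic on Group C shows the difference from Group B: the 2-point gap Group C shows on the anchors is not adjusted away, so its adjusted score still sits below Group A's. Again, these are made-up numbers and a naive one-point-per-point assumption, not the NCBE's actual method.

```python
# Group A (reference) and Group C, from the example above.
anchor_A, unique_A = 15, 21
anchor_C, unique_C = 13, 16

ability_gap = anchor_C - anchor_A                    # -2: Group C looks weaker on the anchors
expected_unique = unique_A + ability_gap             # 19: what a group of C's ability would be expected to score
difficulty_adjustment = expected_unique - unique_C   # 3: portion still attributed to harder questions
adjusted_C = unique_C + difficulty_adjustment        # 19: still below Group A's 21

# Equating removes apparent differences in question difficulty, not differences in
# ability -- Group C's lower anchor performance carries through to a lower score.
# (On these particular made-up numbers, a small difficulty adjustment remains; the
# next paragraph's caveats about more nuanced measurement speak to that.)
print(adjusted_C)  # 19
```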

There are, of course, many more nuanced ways of measuring how different the groups are, examining the performance of individuals on each question, and so on. (For instance, what if Group C also got harder questions by an objective measure--that is, Group A would have scored the same as Group C if it had answered Group C's unique questions? How can we examine the unique questions independent of the equators, in the event that the uniques really are harder or easier?) But this is a very crude way of illustrating what the bar exam does. (For all of the sophisticated details, including how to weigh these things more precisely, read up on Item Response Theory.)

So, when MBE scores decline, they decline because the group, as a whole, has performed worse than the previous group. And we can measure that by comparing the groups' performance on the same anchor questions.

III. Did something change test-taker performance on the MBE?

The only way the failure rate could have increased because of the test is if there were a problem with the test itself. It might be, of course, that the NCBE erred by using the wrong questions or by scoring the exam incorrectly, but there has been no such allegation and we have no evidence of that. (Of course, we have little evidence of anything at all, which may be an independent problem.)

But this year, some note, is the first year that the MBE includes seven subjects instead of six. Civil Procedure was added as a seventh subject in the February 2015 exam.

Recall, however, how equating works. We equate using anchor questions that previous groups of test-takers have already answered. That means the Civil Procedure questions were not used to equate the scores. If the Civil Procedure questions were more challenging, or if students performed worse on those questions than on others, we can always go back and see how they did on the anchor questions. Consider Groups A & B above: if they are similarly skilled test-takers, but Group B suffered worse scores on the uniques because of some defect in the Civil Procedure questions, then scaling will cure those differences, and Group B's scores will be scaled to match Group A's.

Instead, a "change" in the bar pass rates derived from the exam itself must affect how one performs on both the equators and the uniques.

The more nuanced claim is this: Civil Procedure is a seventh subject on the bar exam. Law students must now learn an additional subject for the MBE. Students have limited time and limited mental capacity to learn and retain this information. The seventh subject leads them to perform worse than they would have on the other six subjects. That, in turn, causes a decline in their performance on the equators relative to previous cohorts. And that causes an artificial decline in the score.

Maybe. But there are several factors suggesting (note, not necessarily definitively!) this is not the case.

First, this is not the first time the MBE has added a subject. In the mid-1970s, the 200-question MBE consisted of just five subjects: Contracts, Criminal Law, Evidence, Property, and Torts. (As a brief note, some of these subjects may have become easier; indeed, the adoption of the Federal Rules of Evidence simplified the questions at a time when the bulk of the evidentiary questions were based on common law. But, of course, there is more law today in many of these areas, and perhaps more complexity in some of them as a result.)

By the mid-1970s, the NCBE considered adding some combination of Civil Procedure, Constitutional Law, and Corporations to the MBE. It ultimately settled on Constitutional Law, but not without significant opposition. Indeed, some even went so far as to suggest that it was not possible to draft objective-style multiple choice questions on Constitutional Law. (I've read through the archives of the Bar Examiner magazine from those days.) Nevertheless, the NCBE plunged ahead and added Constitutional Law as a sixth subject. There was no great outcry about changes in bar pass rates or about students' inability to handle a sixth subject; there was no dramatic decline in scores. Instead, Constitutional Law was, and remains, a perfectly ordinary part of the MBE, with no complaints that the addition of a sixth subject proved overwhelming. Adding a subject to the MBE, in short, is not unprecedented.

Furthermore, it's an overstatement to treat Civil Procedure as an entirely new subject for bar test-takers when all bar exams (to my knowledge) previously tested it. Yes, the testing occurred in the essay components rather than the multiple choice components, but students were already studying for it, at least somewhat, anyway. And in many jurisdictions (e.g., California), it was federal Civil Procedure that was tested, not state-specific procedure.

Finally, Civil Procedure is a substantial required course at (I believe) every law school in America--the same cannot be said, at the very least, of Evidence and of Constitutional Law. To the extent it's something students need to learn for the bar, they have, generally, already been required to learn it in law school. (Retention and comprehension, of course, are other matters.)

These arguments are not definitive. It may well be that they are wrong and that Civil Procedure is a uniquely disruptive subject, sui generis. But this points to a larger issue: such arguments are largely speculation, and they require more evidence before we can be confident that Civil Procedure is (or is not) responsible, in any meaningful measure, for the lower MBE scores.

We do, however, have two external factors that predict the decline in MBE scores, and they suggest that a decline in student quality, rather than a more challenging set of subjects, is responsible. First, law schools have been increasingly admitting--and, subsequently, increasingly graduating--students with lower credentials, including lower undergraduate grade point averages and lower LSAT scores. Jerry Organ has written extensively about this. We should expect declining bar pass rates as law schools continue to admit, and graduate, these students. (The degree of decline remains a subject of some debate, but a decline is to be expected.)

Second, the NCBE has observed a decline in MPRE scores. Early and more detailed responses from the NCBE revealed a relatively high correlation between MPRE and MBE scores. And because the MPRE is not subject to the same worries about changes in the subject matter tested, it is a useful predictor of whether one would expect a particular outcome on the MBE.

IV. Some states are making the bar harder to pass--by raising the score needed to pass

Illinois, for instance, has announced that it will increase the score needed to pass the exam. When it adopted the Uniform Bar Exam, Montana decided to increase the score needed to pass.

These factors are unrelated to the changes in MBE scores. We might expect pass rates in those states to decline, but we should attribute that decline to something other than MBE performance--namely, the higher passing score.

And these decisions actually raise a number of questions for those jurisdictions. Why is the pass score being increased? Why did the generation of lawyers who passed that bar--and who were, presumably, deemed competent to practice law--conclude that they needed to make it a greater challenge for Millennials? Is there evidence that a particular bar score is more or less effective at, say, excluding incompetent attorneys, or minimizing malpractice?

These are a few of the questions one might ask about why one may want a bar exam at all--its function, its role as gatekeeper, and so on. But raising the passing score is a question about the difficulty of passing the bar, which is a distinct inquiry from the difficulty of the MBE questions themselves.

V. Concluding thoughts

Despite the hesitations and tentative conclusions offered above, I'll restate something I began with: "Let's instead focus on whether the July 2015 bar exam was 'harder' than usual. The answer is, in all likelihood, no--at least, almost assuredly, not in the way most are suggesting, i.e., that the MBE was harder in such a way that it resulted in lower bar passage rates."

We can see that the MBE uses Item Response Theory to account for variances in test difficulty, and that the NCBE scales scores to ensure that harder or easier questions do not affect the outcome of the test. We can also see that merely adding a new subject, by itself, would not decrease scores. Instead, something would have to affect test-takers' ability to an extent that would make them perform worse on similar questions. And we have some good reasons to think (but, admittedly, not definitively, at least not yet) that Civil Procedure was not that cause; and some good reasons (from declining LSAT scores and UGPAs among admitted law students, and from declining MPRE scores) to think that the decline is more related to test-takers' ability. More evidence and study are surely needed to sharpen the issues, but this post should clear up several points about how the MBE operates (in, admittedly, deeply, perhaps overly, simple terms).

Law schools ignore this at their peril. Blaming the exam without an understanding of how it actually operates masks the major structural issues confronting schools in their admissions and graduation policies. And the problem is almost assuredly going to get worse over each of the next three July administrations of the bar exam.