When the Task Force on the New York Bar Examination plagiarizes your work without attribution

UPDATE: The chair of the Task Force reached out to me with apologies and intends to update the report with attribution. I’ll link to that updated report when it’s available.

My blog isn’t much. It makes no money. It garners little attention. I don’t earn consulting work from it. It contains my half-baked musings, the best of which might become an article, the worst of which I strike through and hope people forget.

But at the very least, it would be nice to see my work acknowledged if it’s useful.

Sadly, the Task Force on the New York Bar Examination found my work useful, but chose to copy it without attribution.

Its recent report on the state of the bar exam takes large chunks of my blog and treats them as its own work product. Several paragraphs are lifted from my 2015 post, “No, the MBE was not ‘harder’ than usual.”

Here’s a part of my post:

The MBE uses a process known as "equating," then "scales" the test. These are technical statistical measures, but here's what it's designed to do. (Let me introduce an important caveat here: the explanations are grossly oversimplified but contain the most basic explanations of measurement!)


Standardized testing needs a way of accounting for this. So it does something called equating. It uses versions of questions from previous administrations of the exam, known as "anchor" questions or "equators." It then uses these anchor questions to compare the two different groups. One can tell if the second group performed better, worse, or similarly on the anchor questions, which allows you to compare groups over time. It then examines how the second group did on the new questions. It can then better evaluate performance on those new questions by scaling the score based on the performance on the anchor questions.

This is from Page 46 of the Task Force report:

The MBE also uses a process known as “equating,” which “scales” the test to adjust for differences between exams and by different test takers over time. Equating uses versions of questions from previous administrations of the exam, known as “anchor” questions or “equators” to compare two different groups. This way, in theory, one can tell if the second group performed better, worse, or similarly on the anchor questions, which allows groups of test takers to be compared across test administrations. Then, how the second group did on the new questions is examined so that performance on the new questions can be evaluated based on performance on the anchor questions.

Here’s another part of my post:

Consider two groups of similarly-situated test-takers, Group A and Group B. They each achieve the same score, 15 correct, on a batch of "equators." But Group A scores 21 correct on the unique questions, while Group B scores just 17 right.

We can feel fairly confident that Groups A and B are of similar ability. That's because they achieved the same score on the anchor questions, the equators that help us compare groups across test administrations.

And we can also feel fairly confident that Group B had a harder test than Group A. (Subject to a caveat discussed later in this part.) That's because we would expect Group B's scores to look like Group A's scores because they are of a similar capability. Because Group B performed worse on unique questions, it looks like they received a harder batch of questions.

The solution? We scale the answers so that Group B's 17 correct answers look like Group A's 21 correct answers. That accounts for the harder questions. Bar pass rates between Group A and Group B should look the same.

In short, then, it's irrelevant if Group B's test is harder. We'll adjust the results because we have a mechanism designed to account for variances in the difficulty of the test. Group B's pass rate will match Group A's pass rate because the equators establish that they are of similar ability.

When someone criticizes the MBE as being "harder," in order for that statement to have any relevance, that person must mean that it is "harder" in a way that caused lower scores; that is not the case in typical equating and scaling, as demonstrated in this example.

Let's instead look at a new group, Group C.

On the unique questions, Group C did worse than Group A (16 right as opposed to 21 right), much like Group B (17 to 21). But on the equators, the measure for comparing performance across tests, Group C also performed worse, 13 right instead of Group A's 15.

We can feel fairly confident, then, that Group C is of lesser ability than Group A. Their performance on the equators shows as much.

That also suggests that when Group C performed worse on unique questions than Group A, it was not because the questions were harder; it was because they were of lesser ability.

This is from pages 46-47 of the report:

Consider two groups of similarly-situated test-takers, Group A and Group B. They each achieve the same score, 15 correct, on a set of the “equator” questions. But Group A scores 21 correct on the unique questions, while Group B scores just 17 of these questions right. Based on Groups A and B’s same score on the equator questions, we can feel fairly certain that Groups A and B are of similar ability. We can also feel fairly certain that Group B had a harder test than Group A. This is because we would expect Group B’s scores to look like Group A’s scores because they are of a similar capability. Because Group B performed worse on unique questions, it looks like they received a harder group of questions. Now we scale the answers so that Group B’s 17 correct answers look like Group A’s 21 correct answers, thus accounting for the harder questions. Bar pass rates between Group A and Group B should then look the same. In short, it is irrelevant if Group B’s test is harder because the results will be adjusted to account for variances in test difficulty. Group B’s pass rate will match Group A’s pass rate because the equators establish that they are of similar ability.

Now consider Group C. In the unique questions, Group C did worse than Group A (16 right as opposed to 21 right), much like Group B (17 to 21). But on the equators, the measure for comparing performance across tests, Group C also performed worse, 13 right instead of Group A’s 15. We can feel fairly certain, then, that Group C is of lesser ability than Group A. Their performance on the equators shows as much. That also suggests that when Group C performed worse on unique questions than Group A, it was not because the questions were harder; it was because they were of lesser ability.
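
(For anyone who wants to see that arithmetic worked out, here’s a quick Python sketch of my own. It is not the NCBE’s actual methodology, which relies on far more sophisticated psychometrics than this simple proportional adjustment; the numbers are just the Group A, B, and C figures from the example above.)

```python
# Toy illustration of the Group A / B / C example.
# Raw counts: anchor ("equator") questions correct, and unique questions correct.
groups = {
    "A": {"anchor": 15, "unique": 21},  # reference group
    "B": {"anchor": 15, "unique": 17},  # same ability on anchors, lower unique score
    "C": {"anchor": 13, "unique": 16},  # lower score on the anchors themselves
}

ref = groups["A"]  # treat Group A's administration as the reference

for name, g in groups.items():
    # Ability relative to Group A, as measured by the shared anchor questions.
    ability = g["anchor"] / ref["anchor"]
    # Score we'd expect on Group A's unique questions for a group of this ability.
    expected = ability * ref["unique"]
    # Any shortfall from that expectation is attributed to a harder set of
    # unique questions and handed back as the scaling adjustment.
    adjustment = expected - g["unique"]
    print(f"Group {name}: raw {g['unique']}, adjustment {adjustment:+.1f}, "
          f"scaled {g['unique'] + adjustment:.1f}")

# Group A: raw 21, adjustment +0.0, scaled 21.0
# Group B: raw 17, adjustment +4.0, scaled 21.0  (matches Group A, as in the example)
# Group C: raw 16, adjustment +2.2, scaled 18.2  (stays below A; the anchors show lesser ability)
```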

I don’t have particular comments on the rest of the report. I just highlight that my work was copied but never cited. I’m glad someone found it a little helpful. I’d be more glad if there were attribution.