Gaming out Hein citation metrics in a USNWR rankings system

Much has been said about the impending USNWR proposal to have a Hein-based citation ranking of law school faculties. I had some of my own thoughts here. But I wanted to focus on one of the most popular critiques, then move to gaming out how citations might look in a rankings system.

Many have noted how Hein might undercount individual faculty citations, either accidentally (e.g., misspellings of names) or intentionally (e.g., exclusion of many peer-reviewed journals from their database), along with intentional exclusion of certain faculty (e.g., excluding a very highly-cited legal research & writing faculty member who is not tenured or tenure-track).

These individual concerns are real, but not necessarily because they would make the USNWR citation metrics less accurate. There are two reasons to be worried, non-random biases and law school administrative reactions, which I’ll get to in a moment.

Suppose USNWR said it was going to do a ranking of all faculty by mean and median height of tenured and tenure-track faculty. “But wait,” you might protest, “my faculty just hired a terrific 5’0” scholar!” or, “we have this terrific 6’4” clinician who isn’t tenure-track!” All true. But, doesn’t every faculty have those problems? If so, then the methodology doesn’t have a particular weakness in measuring all the law schools as a whole against one another. Emphasis on weaknesses related to individuals inside or outside the rankings ought to wash out across schools.

Ah, you say, but I have a different concern—that is, let’s say, our school has a high percentage of female faculty (much higher than the typical law school), and women tend to be shorter than men, so this ranking does skew against us. This is a real bias we should be concerned about.

So let’s focus on this first problem. Suppose your school has a cohort of clinical or legal research & writing faculty on the tenure track, and they have lesser (if any) writing obligations; many schools do not have such faculty on the tenure track. Now we can identify a problem: some schools suffer under the methodology of these rankings.

Another: suppose your school has a disproportionate number of faculty whose work appears in books or peer-reviewed journals that don’t appear in Hein. There might be good reasons for this. Or, there might not be. That’s another problem. But, again, just because one faculty member has a lot of peer-reviewed publications not in Hein doesn’t mean it’s a bad metric across schools. (Believe me, I feel it! My most-cited piece is a peer-reviewed piece, and I have several placements in peer-reviewed journals.)

Importantly, my colleague Professor Rob Anderson notes one virtue of the Sisk rankings that is not currently present in Hein citation counts: “The key here is ensuring that Hein and US News take into account citations TO interdisciplinary work FROM law reviews, not just citations TO law reviews FROM law reviews as it appears they might do. That would be too narrow. Sisk currently captures these interdisciplinary citations FROM law reviews, and it is important for Hein to do the same. The same applies to books.”

We simply don’t know (yet) whether these institutional biases exist or how they’ll play out. But I have a few preliminary graphics on this front.

It’s not clear how Hein will measure things. Sisk-Leiter measures things using a formula of 2*mean citations plus median citations. The USNWR metric may use mean citations plus median citations, plus publications. Who knows at this rate. (We’ll know in a few weeks!)
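For concreteness, the Sisk-Leiter formula described above can be sketched in a few lines of Python. The function name and the five-person faculty below are hypothetical, made up only to show the arithmetic; they are not real citation counts.

```python
from statistics import mean, median


def sisk_leiter_score(citations):
    """Return 2 * mean + median for a list of per-faculty citation counts."""
    return 2 * mean(citations) + median(citations)


# Hypothetical citation counts for a five-person faculty (illustration only).
example_faculty = [10, 40, 70, 120, 400]

# The mean is 128 and the median is 70, so the score is 2 * 128 + 70 = 326.
example_score = sisk_leiter_score(example_faculty)
print(example_score)
```

Because the mean is counted twice, one very highly cited scholar moves the score far more than the median alone would suggest.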

In the future, if (big if!) USNWR chooses to incorporate Hein citations into the rankings, it would likely do so by diminishing the value of the peer review score, which currently sits at a hefty 25% of the ranking and has been extraordinarily sticky. So it may be valuable to consider how citations relate to the peer score and what material differences we might observe.

Understanding that Sisk-Leiter is an approximation for Hein at this point, we can show the relationship between the top 70 or so schools in the Sisk-Leiter score (and a few schools we have estimates for at the bottom of the range), and the relationship of those schools to their peer scores.

This is a remarkably incomplete portrait for a few reasons, not the least of which is that the trendline would change once we add 130 schools with scores lower than about 210 to the matrix. But very roughly we can see that peer score and Sisk-Leiter score correlate, with a few outliers: those outperforming their peer score via Sisk-Leiter sit above the line, those underperforming below it.

But this is also an incomplete portrait for another reason. USNWR standardizes each score, which means they place the scores in relationship with one another before totalling them. That’s how they can add a figure like $80,000 of direct expenditures per student with an incoming class median LSAT score of 165. Done this way, we can see just how much impact changes (either natural improvement or attempts to manipulate the rankings) can have. This is emphatically the most important way to think about the change. Law school deans that see that citations are a part of the rankings and reorient themselves accordingly may well be chasing after the wind if costly school-specific changes have, at best, a marginal relationship to improving one’s overall USNWR score.

UPDATE: A careful and helpful reader pointed out that USNWR standardizes each score, but "rescales" only at the end. So the analysis below is simplified: I take the standardized z-scores and rescale them myself. This still allows us to make relative comparisons between schools, but it isn't the most precise way of thinking about the numerical impact at the end of the day. It makes it more readable, but less precise. Forgive me for my oversimplification and conflation!
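A minimal Python sketch of standardizing and then rescaling may help here. The helper names, the four expenditure and LSAT figures, and the equal weights are all hypothetical, and the use of the population standard deviation is an assumption; nothing here is confirmed as USNWR's actual computation.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize values to mean 0 and standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    return [(v - mu) / sigma for v in values]


def rescale_0_100(values):
    """Linearly map values so the lowest becomes 0 and the highest 100."""
    low, high = min(values), max(values)
    return [100 * (v - low) / (high - low) for v in values]


# Hypothetical inputs on very different scales (illustration only).
expenditures = [40_000, 60_000, 80_000, 120_000]  # dollars per student
lsat_medians = [150, 155, 160, 165]

# Once standardized, both columns are unit-free and can be weighted and added.
z_spend = z_scores(expenditures)
z_lsat = z_scores(lsat_medians)
combined = [0.5 * s + 0.5 * q for s, q in zip(z_spend, z_lsat)]  # made-up weights

# Rescaling the z-scores yields the same 0-100 positions as rescaling the raw
# values, because both steps are linear transformations.
rescaled_from_z = rescale_0_100(z_spend)
rescaled_from_raw = rescale_0_100(expenditures)
```

That last equivalence is why rescaling a single standardized component still preserves the relative spacing between schools, even though it is not the same as rescaling only the final total.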

Let’s take a look at how scaling currently works with the USNWR peer scores.

I took the peer review scores from the rankings released in March 2018 and scaled them on a 0-100 scale—the top score (4.8) became 100, and the bottom score (1.1) became 0.

As you can see, the scaling spreads out the schools a bit at the top, and it starts to compress them fairly significantly as we move down the list. I did a visualization of the distribution not long ago, but here you can see that an improvement in your peer score of 0.1 can nudge you up, but it won’t make up a lot of ground. That said, the schools are pretty well spread apart, and if you grinded it out, you could make some headway by climbing if you improved your peer score by 0.5 points—a nearly impossible feat. Coupled with the fact that this factor is a whopping 25% of the rankings, it offers opportunity if someone could figure out how to move. (Most schools just don’t move. The ones that do, tend to do so because of name changes.)
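The peer score scaling above is a plain linear rescale, which can be sketched as follows. The function name and the starting score of 3.0 are hypothetical; the 1.1 and 4.8 endpoints come from the figures quoted above.

```python
PEER_LOW, PEER_HIGH = 1.1, 4.8  # bottom and top peer scores cited above


def scale_peer(score):
    """Map a peer score onto 0-100, with 1.1 at 0 and 4.8 at 100."""
    return 100 * (score - PEER_LOW) / (PEER_HIGH - PEER_LOW)


# Each 0.1 of peer score is worth about 2.7 scaled points,
gain_small = scale_peer(3.1) - scale_peer(3.0)
# so a 0.5 improvement is worth about 13.5 scaled points.
gain_large = scale_peer(3.5) - scale_peer(3.0)

print(round(gain_small, 1), round(gain_large, 1))
```

Because the mapping is linear, the gain from a given improvement is the same wherever a school starts on the peer score scale.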

Now let’s compare that to a scaling of the Sisk-Leiter scores. I had to estimate the bottom 120 or so scores, distributing them down to a low end Sisk-Leiter score of 75. There’s a lot of guesswork, so this is extremely rough. (The schools around 70 or so have a Sisk-Leiter score of around 210.)

Yale’s 1474 becomes 100; the 75 I created for the bottom becomes 0. Note what happens here. Yale’s extraordinarily high citation count stretches out the field. That means lots of schools are crammed together near the bottom—consider Michigan (35), Northwestern (34), and Virginia (32). Two-thirds of schools are down in the bottom 10% of the scoring.

If you’re looking to gain ground in the USNWR rankings, and assuming the Hein citation rankings look anything like the Sisk-Leiter citation rankings, “gaming” the citation rankings is a terrible way of doing it. A score of about 215 puts you around 11; a score of around 350 puts you at 20. That’s the kind of dramatic movement of a 0.5 peer score improvement.
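As a rough check on those two figures, here is the same kind of linear rescale applied to the Sisk-Leiter endpoints. The function name is hypothetical, and the floor of 75 is the estimate described above rather than a measured score.

```python
SL_LOW, SL_HIGH = 75, 1474  # estimated floor and Yale's score, as given above


def scale_sisk_leiter(score):
    """Map a Sisk-Leiter score onto 0-100, with 75 at 0 and 1474 at 100."""
    return 100 * (score - SL_LOW) / (SL_HIGH - SL_LOW)


scaled_215 = scale_sisk_leiter(215)  # roughly 10
scaled_350 = scale_sisk_leiter(350)  # roughly 20

# Raw Sisk-Leiter points needed to move one scaled point: about 14.
raw_points_per_scaled_point = (SL_HIGH - SL_LOW) / 100

print(round(scaled_215, 1), round(scaled_350, 1), raw_points_per_scaled_point)
```

A straight rescale lands close to the rough figures quoted above, and it shows why the field is so compressed: with Yale setting the ceiling, it takes about 14 raw points to gain a single scaled point.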

But let’s look at that 215 again. That’s about a 70 median citation count. Sisk-Leiter is over three years. So on a faculty of about 30, that’s about 700 faculty-wide citations per year. To get to 350, you’d need about a 90 median citation count, or increase to around 900 faculty-wide citations per year. Schools won’t be making marked improvements with the kinds of “gaming” strategies I outlined in an earlier post. They may do so through structural changes—hiring well-cited laterals and the like. But I am skeptical that any modest changes or gaming would have any impact.
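The back-of-the-envelope arithmetic above can be written out as a short sketch. It follows the same simplification as the text, treating the median count as a stand-in for a typical faculty member's citations; the function name and default parameters are hypothetical.

```python
def faculty_citations_per_year(per_faculty_count, faculty_size=30, window_years=3):
    """Approximate faculty-wide citations per year from a typical per-person count."""
    return per_faculty_count * faculty_size / window_years


current = faculty_citations_per_year(70)  # 70 * 30 / 3 = 700
target = faculty_citations_per_year(90)   # 90 * 30 / 3 = 900
extra_per_year = target - current         # 200 more citations every year

print(current, target, extra_per_year)
```

On these assumptions, the jump requires roughly 200 additional citations to the faculty every year, sustained across the whole window.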

There will undoubtedly be some modest advantages for schools that dramatically outperform their peer scores, and some modest injury for a few schools that dramatically underperform. But for most schools, the effect will be marginal at best.

That’s not to say that some schools won’t react inappropriately or with the wrong incentives to a new structure in the event that Hein citations are ultimately incorporated in the rankings.

But one more perspective. Let’s plug these Sisk-Leiter models into a USNWR model. Let’s suppose instead of peer scores being 25% of the rankings, peer scores become just 15% of the rankings and “citation rankings” become 10% of the rankings.

UPDATE: I have modified some of the figures in light of choosing to use the z-scores instead of adding the scaled components to each other.

When we do this, just about every school loses points relative to the peer score model—recall in the chart above, a lot of schools are bunched up in the 60-100 band of peer scores, but almost none are in the 60-100 band for Sisk-Leiter. Yale pushes all the schools downward in the citation rankings.

So, in a model of 15% peer score/10% citation rankings among the top 70 or so Sisk-Leiter schools, the typical school drops about 17 scaled points. That’s not important, however, for most of them. Recall that they’re being compared to one another, so if most drop 17 points, then most should remain unchanged. And again, that 17-point drop is in a component worth only 25% of the rankings, so it’s really about a 4-point change in the overall score.
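The weighting arithmetic can be sketched as below. This uses the simpler approach of blending 0-100 scaled scores rather than z-scores, and the function names and the example school (scaled peer score 60, scaled citation score 20) are hypothetical.

```python
PEER_WEIGHT_OLD = 0.25      # peer score's current share of the rankings
PEER_WEIGHT_NEW = 0.15      # supposed peer share in the new model
CITATION_WEIGHT_NEW = 0.10  # supposed citation share in the new model


def blended_component(peer_scaled, citation_scaled):
    """Weighted average of the two 0-100 scaled scores within the 25% band."""
    total = PEER_WEIGHT_NEW + CITATION_WEIGHT_NEW
    weighted = PEER_WEIGHT_NEW * peer_scaled + CITATION_WEIGHT_NEW * citation_scaled
    return weighted / total


def overall_point_change(peer_scaled, citation_scaled):
    """Change in overall score when the band moves from peer-only to blended."""
    drop_in_band = peer_scaled - blended_component(peer_scaled, citation_scaled)
    return -drop_in_band * PEER_WEIGHT_OLD


# Hypothetical school: scaled peer score of 60, scaled citation score of 20.
band_drop = 60 - blended_component(60, 20)  # about 16 scaled points in the band
overall = overall_point_change(60, 20)      # about -4 points overall

print(round(band_drop, 1), round(overall, 1))
```

The same multiplication is what turns a drop of around 17 scaled points within the band into a change of about 4 points overall.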

I then looked at the model to see which schools dropped 24 or more scaled points in this band (i.e., a material drop in score given that this only accounts for about 25% of the ranking): Boston College, Georgetown, Iowa, Michigan, Northwestern, Texas, Virginia, and Wisconsin. (More schools would likely fit this model once we had all 200 schools’ citation rankings. The University of Washington, for instance, would also probably be in this set.) But recall that some of these schools, like Michigan and Northwestern, are already much higher than many other schools, so even a drop like this would have little impact on the ordinal rankings of law schools.

I then looked at the model to see which schools dropped 10 points or fewer (or gained!) (i.e., a material improvement in score): Chicago, Drexel, Florida International, George Mason, Harvard, Hofstra, Irvine, San Francisco, St. Thomas (MN), Toledo, and Yale. Recall again, Yale cannot improve beyond #1, and Harvard and Chicago are, again, so high in the rankings that marginal relative improvements in this area are likely not going to affect the ordinal rankings.

And all this means is that for the vast majority of schools, we’ll see little change—perhaps some randomness in rounding or year-to-year variations, but I don’t project for most schools much change at all.

Someone with more sophistication than I could then try to game out how these fit into the overall rankings. But that’s enough gaming for now. We’ll wait to see how the USNWR Hein citation figures come out this year, then we might play with the figures to see how they might affect the rankings.

(Note: to emphasize once again, I just use Sisk-Leiter. Hein will include, among other things, different citations, it may weight them differently than Sisk-Leiter, it uses a different window, it may use different faculty, and the USNWR citation rankings may well include publications in addition to citations.)