Regression Alert: Week 7

By Adam Harstad, published 10/19/2022

Welcome to Regression Alert, your weekly guide to using regression to predict the future with uncanny accuracy.

For those who are new to the feature, here's the deal: every week, I dive into the topic of regression to the mean. Sometimes I'll explain what it really is, why you hear so much about it, and how you can harness its power for yourself. Sometimes I'll give some practical examples of regression at work.

In weeks where I'm giving practical examples, I will select a metric to focus on. I'll rank all players in the league according to that metric and separate the top players into Group A and the bottom players into Group B. I will verify that the players in Group A have outscored the players in Group B to that point in the season. And then I will predict that, by the magic of regression, Group B will outscore Group A going forward.

Crucially, I don't get to pick my samples (other than choosing which metric to focus on). If I'm looking at receivers and Cooper Kupp is one of the top performers in my sample, then Cooper Kupp goes into Group A and may the fantasy gods show mercy on my predictions.
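The selection procedure described above is mechanical enough to sketch in a few lines of Python. This is an illustration under stated assumptions, not the column's actual tooling. The `Player` record, the helper names `split_groups` and `mean_ppg`, the group size, and the sample players are all hypothetical, and a real sample may be filtered (for example, by workload) before ranking.

```python
from dataclasses import dataclass

@dataclass
class Player:
    name: str
    metric: float           # the stat being tested for regression (e.g. yards per carry)
    points_per_game: float  # fantasy production to date

def split_groups(players, group_size):
    """Rank every player by the metric and take the top and bottom `group_size`.

    No hand-picking: whoever lands at either extreme goes into that group.
    """
    if 2 * group_size > len(players):
        raise ValueError("groups would overlap")
    ranked = sorted(players, key=lambda p: p.metric, reverse=True)
    return ranked[:group_size], ranked[-group_size:]

def mean_ppg(group):
    """Average fantasy points per game across a group."""
    return sum(p.points_per_game for p in group) / len(group)

# Hypothetical sample: the metric leaders have also scored more so far.
sample = [
    Player("Player 1", 6.4, 12.0),
    Player("Player 2", 5.9, 10.0),
    Player("Player 3", 4.5, 9.0),
    Player("Player 4", 3.9, 8.0),
    Player("Player 5", 3.7, 7.5),
]
group_a, group_b = split_groups(sample, group_size=2)

# Before predicting regression, confirm Group A has outscored Group B to date.
assert mean_ppg(group_a) > mean_ppg(group_b)
```

Because the split is driven purely by rank, there is no step at which an inconvenient player can be swapped out.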

Most importantly, because predictions mean nothing without accountability, I track the results of my predictions over the course of the season and highlight when they prove correct and also when they prove incorrect. At the end of last season, I provided a recap of the first half-decade of Regression Alert's predictions. The executive summary is that we have a 32-7 lifetime record, an 82% success rate.

If you want even more details, here's a list of my predictions from 2020 and their final results. Here's the same list from 2019 and their final results, here's the list from 2018, and here's the list from 2017.


The Scorecard

In Week 2, I broke down what regression to the mean really is, what causes it, how we can benefit from it, and what the guiding philosophy of this column would be. No specific prediction was made.

In Week 3, I dove into the reasons why yards per carry is almost entirely noise, shared some research to that effect, and predicted that the sample of backs with lots of carries but a poor per-carry average would outrush the sample with fewer carries but more yards per carry.

In Week 4, I discussed the tendency for touchdowns to follow yards and predicted that players scoring a disproportionately high or low number of touchdowns relative to their yardage total would see significant regression going forward.

In Week 5, I revisited an old finding that preseason ADP tells us as much about rest-of-year outcomes as fantasy production to date does, even a quarter of the way through a new season. No specific prediction was made.

In Week 6, I explained the concept of "face validity" and taught the "leaderboard test", my favorite quick-and-dirty way to tell how much a statistic is likely to regress. No specific prediction was made.

| Statistic for Regression | Performance Before Prediction | Performance Since Prediction | Weeks Remaining |
|---|---|---|---|
| Yards per Carry | Group A had 24% more rushing yards per game | Group B has 25% more rushing yards per game | None (Win!) |
| Yards per Touchdown | Group A scored 3% more fantasy points per game | Group A has 6% more fantasy points per game | 1 |

We've covered the journey of our yards per carry prediction well enough that all that's left is to announce the final score. Our "high-ypc" backs fell from 6.41 to 4.28 yards per carry, while our "low-ypc" backs rose from 3.81 to 4.53 yards per carry. League average yards per carry among running backs, for context, is 4.47, so both groups have essentially been average (with our "bad" backs slightly above average and our "good" backs slightly below). Nor is this a small, fluky sample: Group B has logged 583 carries since our prediction, a strong two seasons' worth of work. (Due to injuries, Group A has "only" logged 320 carries, which would still be one massive single-season workload.) Yards per carry is not, in any sense that matters for our purposes here, "a thing".
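One way to quantify that reversal is to ask how much of the pre-prediction gap between the groups survived. A minimal Python sketch using only the figures quoted above; the "share of the gap retained" framing is an illustrative addition, not a statistic the column tracks.

```python
# Yards-per-carry figures quoted in the column: (before prediction, since prediction).
LEAGUE_AVG_YPC = 4.47
groups = {
    "Group A (high-ypc)": (6.41, 4.28),
    "Group B (low-ypc)": (3.81, 4.53),
}

(a_before, a_after), (b_before, b_after) = groups.values()
gap_before = a_before - b_before  # about +2.60 ypc in Group A's favor
gap_after = a_after - b_after     # about -0.25 ypc: the gap has flipped
share_retained = gap_after / gap_before

for label, (before, after) in groups.items():
    print(f"{label}: {before - LEAGUE_AVG_YPC:+.2f} -> {after - LEAGUE_AVG_YPC:+.2f} ypc vs. league average")
print(f"Share of the original gap still present: {share_retained:.0%}")
```

A negative share means the ordering reversed: none of Group A's original per-carry edge carried forward.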

Meanwhile, our yards per touchdown prediction fared a bit better this week; both groups had bad weeks, but Group B's week was marginally less bad, which moved us closer to a "flip". Again, Group B needs not just to outscore Group A but to do so by at least 10%. With one week to go, it's very unlikely that they'll manage to salvage a win.
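The win condition here (Group B must outscore Group A by at least 10%) fits in a one-line check. A minimal sketch: the function name is hypothetical, and the values are indexed rather than real point totals, since only the percentages from the scorecard are available.

```python
def group_b_wins(group_a_ppg: float, group_b_ppg: float, required_edge: float = 0.10) -> bool:
    """True if Group B's fantasy points per game beat Group A's by at least `required_edge`."""
    return group_b_ppg >= group_a_ppg * (1 + required_edge)

# Only the relative standing is known, so Group B is indexed to 1.00
# and Group A to 1.06 (Group A currently has 6% more points per game).
group_a_index, group_b_index = 1.06, 1.00
print(group_b_wins(group_a_index, group_b_index))

# Group B's current ratio to Group A versus the 1.10 it would need to win.
print(f"{group_b_index / group_a_index:.3f} vs. {1.10:.2f} needed")
```

Group B's current ratio works out to roughly 0.94, well short of the 1.10 it needs, which is why a one-week comeback is so unlikely.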


Testing Our Intuitions Regarding Regression

While preparing for the column this week, I saw a tweet that immediately sparked my interest.

Now, hopefully, I've taught enough about regression that your first thought on reading "this is the most extreme X in 90 years" should be, "I bet X is going to regress toward historical averages going forward". That was my first thought, too. But how much of its outlier status is just randomness being especially random this year, and how much is due to actual structural realities in the league today?

I'm fairly certain I've never considered "average margin of victory" as a standalone stat before, in much the same way that I've never considered "average temperature at kickoff" or "average miles traveled by the visiting team". As a factor in individual games, sure, I've given thought to games that were especially close or especially cold or where teams had to travel especially far. But as an aggregate, leaguewide measure, it was entirely new to me, so I didn't have any preconceived notions about what it "should" be or what factors might drive it higher or lower. That makes this a really good test of my intuitions, which are coming in entirely naive.


The first thing I wanted to determine was how "likely" it would be that we'd see data like this just from random variation. To that end, I recorded the margin of every single game since the beginning of 2021 so I could break it down in various ways. I confirmed that, yes, games are indeed significantly closer; the average game last year was decided by 12.16 points, while games this year have an average margin of 9.05 points. (I'm not sure why my number differs from the number in the tweet above; it's possible I've entered some data wrong, it's possible he's made a mistake, or it's possible that we're analyzing the data in different ways. Either way, my data is close enough that I'm still comfortable using it.)
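Computing average margin of victory from raw scores is straightforward: take the absolute point differential of each game and average it by season. A minimal sketch; the game records below are made up for illustration (the column's actual 2021-2022 dataset isn't reproduced here), and counting a tie as a 0-point margin is an assumption.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical records: (season, week, home_points, away_points).
games = [
    (2021, 1, 24, 17),
    (2021, 1, 31, 10),
    (2022, 1, 20, 20),
    (2022, 1, 27, 24),
]

def average_margin_by_season(games):
    """Mean absolute point differential per season (a tie counts as a 0-point margin)."""
    margins = defaultdict(list)
    for season, _week, home, away in games:
        margins[season].append(abs(home - away))
    return {season: mean(values) for season, values in margins.items()}

print(average_margin_by_season(games))
```

Choices like how ties or playoff games are handled are one plausible way two people can arrive at slightly different averages from the same seasons.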

But 2021 was a full season, and 2022 so far is just six weeks. How do this year's games compare to the "closest" six-week stretch from last year? That stretch was (perhaps coincidentally) also Weeks 1-6, with an average margin of 11.23 points, still substantially higher than this year's. In fact, even if you took the six lowest-margin weeks of last season, consecutive or not (Weeks 2, 4, 5, 8, 9, and 15), their average margin was still about 9.88 points, comfortably higher than this year's first six weeks (which are almost certainly not the six closest weeks this season will produce).
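The two comparisons in this paragraph are different computations: the closest consecutive six-week window, and the six lowest weeks in any order. Because the mean of the six smallest values can never exceed the mean of any other six, the second number is a floor for the first, which is what makes last year's 9.88 floor sitting above this year's 9.05 so telling. A sketch that weights each week equally (an assumption; weeks with byes have fewer games), run on hypothetical weekly averages with three-week windows for brevity:

```python
def closest_stretch(weekly_margins, length=6):
    """(first week, average) of the consecutive `length`-week window with the smallest average margin.

    Weeks are numbered from 1 and weighted equally.
    """
    best_start, best_avg = None, float("inf")
    for start in range(len(weekly_margins) - length + 1):
        window_avg = sum(weekly_margins[start:start + length]) / length
        if window_avg < best_avg:
            best_start, best_avg = start + 1, window_avg
    return best_start, best_avg

def lowest_weeks(weekly_margins, k=6):
    """(sorted week numbers, average) of the k lowest-margin weeks, consecutive or not."""
    indices = sorted(range(len(weekly_margins)), key=lambda i: weekly_margins[i])[:k]
    average = sum(weekly_margins[i] for i in indices) / k
    return sorted(i + 1 for i in indices), average

# Hypothetical eight-week season, for illustration only.
margins = [12.0, 10.0, 11.0, 9.0, 14.0, 13.0, 8.0, 15.0]
print(closest_stretch(margins, length=3))
print(lowest_weeks(margins, k=3))
```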

Average scoring margin is primarily driven by infrequent blowouts; if 90% of games were decided by 7 points and the other 10% of games were decided by 50 points, the average margin of victory would be 11.3 points. Since blowouts are rare, you'd expect some samples to contain more of them and others to contain fewer, which would have a big impact on margin of victory. If you had just 5% blowouts in a sample, the average margin would fall to 9.15 points; if you had 15% blowouts, the average margin would rise to 13.45.
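The blowout arithmetic is a simple weighted average. This sketch reproduces the three scenarios above with the same hypothetical 7-point and 50-point margins; it also shows that each percentage point of blowouts moves the average by (50 - 7) / 100 = 0.43 points.

```python
def mixture_margin(blowout_share, close_margin=7.0, blowout_margin=50.0):
    """Average margin when `blowout_share` of games are blowouts and the rest are close games."""
    return (1 - blowout_share) * close_margin + blowout_share * blowout_margin

for share in (0.05, 0.10, 0.15):
    print(f"{share:.0%} blowouts -> {mixture_margin(share):.2f}-point average margin")
```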

Is that what's going on here? One way to check is to compare medians instead of averages. The median is just the midpoint of the data, with 50% of games featuring a larger margin and 50% featuring a smaller margin. In outlier-skewed statistics like this, you'd expect the median to be smaller than the average (or mean), and that's what we see: the median game was decided by 10 points last year. But last year's median was still larger than this year's mean, so if this year's data is skewed the same way, its median should be smaller still. Indeed, the median game so far in 2022 has been decided by just 7 points.
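The toy season from the blowout example shows why the median is a useful check: the mean moves with every blowout, while the median barely notices them. A minimal sketch with Python's `statistics` module; the 20-game sample is hypothetical.

```python
from statistics import mean, median

# Hypothetical 20-game sample from the blowout example: 90% decided by 7, 10% by 50.
margins = [7] * 18 + [50] * 2
print(mean(margins), median(margins))

# Remove one blowout: the mean drops sharply while the median does not move.
fewer_blowouts = [7] * 18 + [50]
print(mean(fewer_blowouts), median(fewer_blowouts))
```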

(As an aside: it's not a coincidence that both medians, 7 and 10, are "significant" numbers in football: the value of a touchdown and of a touchdown plus a field goal. The "mode", or most common margin, in both seasons was 3 points, and given that our data is skewed towards high outliers, it's unsurprising that this value is lower still than the median and is again a "significant" number, in this case the value of a field goal.)
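`statistics.multimode` (Python 3.8+) returns the most common value or values. On a hypothetical set of margins with a long right tail, the three measures line up in the order described here, mode below median below mean, which is the usual (though not guaranteed) pattern for right-skewed data.

```python
from statistics import mean, median, multimode

# Hypothetical margins with a long right tail (a few lopsided games).
margins = [1, 3, 3, 3, 3, 7, 7, 7, 10, 10, 14, 17, 21, 28, 35]
print("mode:", multimode(margins))  # most common value(s)
print("median:", median(margins))
print("mean:", round(mean(margins), 2))
```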

Looking at each week individually, the lowest-margin week of 2021 was Week 9, with the average game decided by 9.42 points. Four of the six weeks so far this season have seen smaller average margins, and the two exceptions (10.93 points in Week 2 and 10.5 points in Week 4) were both still smaller than the average margin in 11 of last year's 18 weeks.
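The week-by-week comparison comes down to two questions per week: is it below last season's lowest week, and how many of last season's weeks had a larger average margin? A sketch; `compare_to_reference` is a hypothetical helper, and the numbers in the example are placeholders, not the actual 2021 or 2022 weekly averages.

```python
def compare_to_reference(current_weeks, reference_margins):
    """Map each current week to (below reference floor?, number of reference weeks with a larger margin).

    `current_weeks` maps week number -> average margin; `reference_margins` lists
    the reference season's weekly average margins.
    """
    floor = min(reference_margins)
    return {
        week: (margin < floor, sum(1 for ref in reference_margins if ref > margin))
        for week, margin in current_weeks.items()
    }

# Placeholder numbers for illustration only.
last_season = [12.0, 9.5, 11.0, 10.0, 13.0]
this_season = {1: 9.0, 2: 10.5}
print(compare_to_reference(this_season, last_season))
```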

So while I would expect margins to regress going forward (simply because anything that is "historically small" is almost certainly unsustainable), a cursory look at the data so far makes it abundantly clear that the small margins in 2022 are not a fluke. They're not the result of a few outlier games (or the lack of a few outlier games); they're not being driven by one or two outlier weeks either this year or last. The shift towards closer games, while massive, has been consistent and is likely "real".

So now it's time for the prediction. Again, I know nothing about "average margin of victory", a stat I had never even considered until a day or two ago. But just based on what I've seen, I would expect it to regress in the direction of last year's values. Usually, when I make an "X will regress in the direction of Y" prediction, I additionally predict that it will finish closer to Y than to X, but in this case, I think otherwise. I think there's a lot of very real signal in the close games through six weeks, and I expect that going forward, margins will remain closer to their 2022 values than their 2021 values.

In other words, I predict that over the next four weeks, the average margin of victory in NFL games will be between 9.0 and 10.5 points. I'm not especially comfortable with this prediction. I think in an outlier-driven stat like average margin, "low" predictions are especially vulnerable in small samples. There were two weeks with 17-point average margins last year, and if even one of the next four weeks matches that, the remaining three weeks will need to average 8.3-point margins just to keep the four-week average at 10.5 points. This prediction would be much better tracked over the entire rest of the season, which would help protect it from outliers.
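The 8.3-point figure comes from solving for the average the remaining weeks need to keep the four-week average at the top of the predicted range: (4 × 10.5 - 17) / 3 = 25 / 3 ≈ 8.33. A sketch of that calculation and of the pass/fail check, assuming every week carries equal weight (in practice byes change the number of games per week, so pooling all games would shift these numbers slightly). The helper names are hypothetical.

```python
PREDICTED_LOW, PREDICTED_HIGH = 9.0, 10.5  # predicted range for the four-week average margin
TOTAL_WEEKS = 4

def required_remaining_average(known_weekly_margins, target=PREDICTED_HIGH, total_weeks=TOTAL_WEEKS):
    """Average weekly margin the unplayed weeks need for the overall average to equal `target`.

    Treats every week as equally weighted (an assumption; byes change games per week).
    """
    remaining = total_weeks - len(known_weekly_margins)
    if remaining <= 0:
        raise ValueError("no weeks left to offset")
    return (target * total_weeks - sum(known_weekly_margins)) / remaining

def prediction_holds(weekly_margins):
    """True if the equally weighted average of the weekly margins lands inside the predicted range."""
    average = sum(weekly_margins) / len(weekly_margins)
    return PREDICTED_LOW <= average <= PREDICTED_HIGH

# One 17-point week, like the two 17-point weeks of 2021, leaves little room.
print(f"{required_remaining_average([17.0]):.2f}")  # 8.33
```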

But the format of this column is clear; I follow the data and make the prediction it leads me to, no matter how uncomfortable it may be. So that's the prediction we're rolling with, and may the Football Gods continue to reward us with close matchups over the coming month, not just for the sake of our prediction, but because close games make for compelling football.
