Welcome to Regression Alert, your weekly guide to using regression to predict the future with uncanny accuracy.
For those who are new to the feature, here's the deal: every week, I dive into the topic of regression to the mean. Sometimes, I'll explain what it really is, why you hear so much about it, and how you can harness its power for yourself. Sometimes, I'll give some practical examples of regression at work.
In weeks where I'm giving practical examples, I will select a metric to focus on. I'll rank all players in the league according to that metric and separate the top players into Group A and the bottom players into Group B. I will verify that the players in Group A have outscored the players in Group B to that point in the season. And then I will predict that, by the magic of regression, Group B will outscore Group A going forward.
Crucially, I don't get to pick my samples (other than choosing which metric to focus on). If I'm looking at receivers and Justin Jefferson is one of the top performers in my sample, then Justin Jefferson goes into Group A, and may the fantasy gods show mercy on my predictions.
Most importantly, because predictions mean nothing without accountability, I report on all my results in real time and end each season with a summary. Here's a recap from last year detailing every prediction I made in 2022, along with all results from this column's six-year history (my predictions have gone 36-10, a 78% success rate). And here are similar roundups from 2021, 2020, 2019, 2018, and 2017.
The Scorecard
In Week 2, I broke down what regression to the mean really is, what causes it, how we can benefit from it, and what the guiding philosophy of this column would be. No specific prediction was made.
In Week 3, I dove into the reasons why yards per carry is almost entirely noise, shared some research to that effect, and predicted that the sample of backs with lots of carries but a poor per-carry average would outrush the sample with fewer carries but more yards per carry.
In Week 4, I explained that touchdowns follow yards, but yards don't follow touchdowns, and predicted that high-yardage, low-touchdown receivers were going to start scoring a lot more going forward.
In Week 5, we revisited one of my favorite findings. We know that early-season overperformers and early-season underperformers tend to regress, but every year, I test the data and confirm that preseason ADP is still as predictive as early-season results even through four weeks of the season. I sliced the sample in several new ways to see if we could find some split where early-season performance was more predictive than ADP, but I failed in all instances.
In Week 6, I talked about how when we're confronted with an unfamiliar statistic, checking the leaderboard can be a quick and easy way to guess how prone that statistic will be to regression.
STATISTIC FOR REGRESSION | PERFORMANCE BEFORE PREDICTION | PERFORMANCE SINCE PREDICTION | WEEKS REMAINING |
---|---|---|---|
Yards per Carry | Group A had 42% more rushing yards per game | Group A has 10% more rushing yards per game | None (Loss) |
Yard-to-TD Ratio | Group A had 7% more points per game | Group B has 48% more points per game | 1 |
It was looking for a moment like we might manage to salvage our yards per carry prediction. Group A had its worst week of the sample, averaging just 50 rushing yards on just 3.85 yards per carry. If Group B could simply maintain their average over the last three weeks (60.5 yards, 3.93 yards per carry) they would have completed an unlikely come-from-behind victory. Unfortunately, Group B also had their worst week of the season so far (42.2 yards, 3.72 yards per carry), and our perfect record on this prediction has finally come to an end.
I've always said that the streak would end eventually, and looking back, the things that brought it down aren't the least bit surprising. If yards per carry from one sample to the next is really random, eventually we'd expect the high-YPC group to maintain a high YPC and the low-YPC group to maintain a low YPC just by chance alone, and that's what we saw (4.66 YPC for Group A, 3.89 YPC for Group B).
Despite that, we still saw a 32% swing from Group A toward Group B, which is close to the median value we see on this prediction (39%). But there we bump into the second problem: 42% is the second-largest lead Group A has had over Group B in our ten attempts at this prediction. Given a more typical edge (20-25%), a 32% swing would have been enough to flip the results. But a larger-than-typical starting gap paired with a smaller-than-typical yards per carry regression means we come up a hair short. All good things come to an end, but we'll revisit this later in the season and see if we can start a new winning streak.
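For anyone who wants to check that arithmetic, here is a minimal Python sketch. It treats the swing as a simple subtraction of percentage points, which is how the scorecard figures relate (a 42% lead minus a 32% swing leaves a 10% lead). The function name `lead_after_swing` is hypothetical and exists only for illustration.

```python
def lead_after_swing(starting_lead_pct: float, swing_pct: float) -> float:
    """Group A's remaining lead, in percentage points, after a swing toward Group B.

    A negative result means Group B finished ahead.
    """
    return starting_lead_pct - swing_pct


# This year's prediction: the 42% lead from the scorecard, 32% swing.
actual = lead_after_swing(42, 32)

# The same 32% swing applied to a more typical 20-25% starting edge.
typical = [lead_after_swing(lead, 32) for lead in (20, 25)]

print(f"Actual lead remaining: {actual}")
print(f"Lead remaining with a typical edge: {typical}")
```

With the actual starting gap, Group A keeps a positive lead; with either typical starting edge, the result goes negative, meaning Group B would have come out ahead.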
At least our Yards Per Touchdown prediction keeps chugging along. After scoring a ridiculous 14 touchdowns in the first two weeks, our "low-touchdown" receivers were shut out of the end zone in Week 6... but outscored Group A on the week anyway, thanks to an 85.0 to 51.9 yard per game advantage.
At the time of the prediction Group A averaged 57.8 yards per game and Group B averaged 85.3. Since the prediction, Group A averages 53.0 yards per game and Group B averages 75.5. Yardage, as you can see, is fairly stable across samples. Touchdowns? Not so much. Group A has gone from one touchdown for every 75.6 yards to one touchdown for every 182.6 yards. Group B has gone from one touchdown for every 469 yards to one touchdown for every 161.7 yards. So not only is Group B gaining more yards, but they're also converting those yards into touchdowns at a slightly better rate (though both groups' averages are within the "sustainable band" I mentioned at the outset).
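To see how those yardage and touchdown rates translate into the points-per-game gaps on the scorecard, here is a rough Python sketch. It assumes standard scoring of 1 point per 10 receiving yards and 6 points per touchdown, with nothing for receptions; that scoring system is an assumption for illustration, so the results only approximate the scorecard figures. The function name `points_per_game` is hypothetical.

```python
YARD_POINTS = 0.1  # assumed: 1 point per 10 receiving yards
TD_POINTS = 6.0    # assumed: 6 points per receiving touchdown


def points_per_game(yards_per_game: float, yards_per_td: float) -> float:
    """Estimate fantasy points per game from yardage and a yards-per-touchdown rate."""
    tds_per_game = yards_per_game / yards_per_td
    return yards_per_game * YARD_POINTS + tds_per_game * TD_POINTS


# At the time of the prediction.
before_a = points_per_game(57.8, 75.6)
before_b = points_per_game(85.3, 469.0)

# Since the prediction.
since_a = points_per_game(53.0, 182.6)
since_b = points_per_game(75.5, 161.7)

print(f"Before: Group A ahead by {before_a / before_b - 1:.0%}")
print(f"Since:  Group B ahead by {since_b / since_a - 1:.0%}")
```

Under these assumptions the estimated gaps land within about a percentage point of the scorecard values, which suggests that the yardage edge plus a normalized touchdown rate accounts for nearly all of the reversal.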
Must Historical Outliers Regress?
Last week, I wrote about how, when confronted with an unfamiliar statistic, we can often use our intuitions to make a fairly accurate guess of how much it will regress. This time last year, I followed that lesson up with a practical example, writing about how NFL games were closer than at nearly any point in history (as measured by margin of victory, at the time just 9.05 points per game). I then discussed how I had no idea what a "normal" margin of victory "should" look like and walked through the process of taking a completely new statistic and making estimates about how it was going to behave going forward. (I predicted margin of victory would settle between 9.0 and 10.5 over the ensuing four weeks, which was substantially below the 2021 average of 12.2 points per game. It wound up being 9.9.)
That was a fun experiment, so when I saw another example from the "This Season's Statistics Are Wildly Out Of Line With Recent Historical Averages" genre, I figured we could run it back.
Average Passing Yards Per Team, Per Game, By Year
(Through Week 6 of each year)
2018: 272.6
2019: 258.5
2020: 259.2
2021: 263.4
2022: 240.9
2023: 236.3
Hard to believe these are even real numbers.
— Russell Clay (@RussellJClay) October 18, 2023
This is very cool! (My idea of "cool" might differ slightly from yours.) Unlike last year's example, I've spent quite a lot of time looking at historical offensive trends, so this one doesn't come as a surprise to me. I knew passing yards per game was down significantly last year, and it certainly felt like it had gotten even lower this year. But is this a sign of things to come or just a six-week fluke?
Whenever I want to guess how much something will regress, I always want to start with historical norms. Fortunately, Pro Football Reference is a handy one-stop shop for over 100 years of football statistics. We can navigate over to the page for the 2023 season and scroll down to the section on passing offenses to see that teams are averaging 218.6 yards per game.
(Why does this value differ from the numbers in the tweet above? Russell Clay is listing gross passing yardage, while Pro Football Reference lists net passing yardage, meaning yards lost on sacks are subtracted back out at the end. You can add the sack yardage per game from Pro Football Reference to the passing yardage per game and you'll get values closely approximating those in the tweet above; without knowing the source of Clay's numbers, I can't be sure why some discrepancy persists.)
Here are the same values since the turn of the century:
- 2022: 218.5
- 2021: 228.3
- 2020: 240.2
- 2019: 235.0
- 2018: 237.8
- 2017: 224.4
- 2016: 241.5
- 2015: 243.8
- 2014: 236.8
- 2013: 235.6
- 2012: 231.3
- 2011: 229.7
- 2010: 221.5
- 2009: 218.5
- 2008: 211.3
- 2007: 214.3
- 2006: 204.8
- 2005: 203.5
- 2004: 210.5
- 2003: 200.4
- 2002: 212.2
- 2001: 205.8
- 2000: 206.9
There's so much going on in this data; I could devote three columns to the history of the passing average over the last 25 years and still feel like I hadn't covered it all. But I want to call your attention to two things.
First, the averages from the last two years look like huge outliers... if we only compare to the past decade. From 2012-2021, teams averaged 235.5 passing yards per game, with more seasons over 240 (three) than under 230 (two, with a low of 224.4). 218.6 would be nearly 17 yards per game below that average. (Maybe 17 yards doesn't sound like that much, but when we're looking at a nearly 300-game sample, it's massive.)
But those "outliers" look pretty normal once we look back over a longer span. It's possible to imagine an NFL where teams average fewer than 220 passing yards per game; we need only remember any season from the 2000s.
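The 2012-2021 comparison can be reproduced from the season list above with a few lines of Python. This is only a sketch for checking the arithmetic; the variable names are hypothetical, and the values are copied from the list.

```python
from statistics import mean

# Net passing yards per team game, 2012-2021, copied from the list above.
net_pass_ypg = {
    2012: 231.3, 2013: 235.6, 2014: 236.8, 2015: 243.8, 2016: 241.5,
    2017: 224.4, 2018: 237.8, 2019: 235.0, 2020: 240.2, 2021: 228.3,
}

decade_avg = mean(net_pass_ypg.values())
over_240 = [year for year, ypg in net_pass_ypg.items() if ypg > 240]
under_230 = [year for year, ypg in net_pass_ypg.items() if ypg < 230]
gap_2023 = decade_avg - 218.6  # 218.6 is the 2023 figure quoted above

print(f"2012-2021 average: {decade_avg:.1f}")
print(f"Seasons over 240: {over_240}")
print(f"Seasons under 230: {under_230}")
print(f"2023 shortfall vs. that average: {gap_2023:.1f}")
```

Swapping in the 2000-2009 values from the same list is a quick way to see how ordinary a sub-220 season looked in that decade.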
It's very easy to take recent NFL trends and extrapolate out over a long distance into the future or past using a simple linear trend line. Passing had been rising over the last fifteen years, so we assume it will continue rising going forward. Likewise, we assume that the further back we go, the further passing must have fallen.
But that's not how the league operates. It tends to see very slow, methodical progress over a long timeline that breaks down into a series of trends and counter-trends over a short timeline. Here's passing yards per game for each decade of the modern era:
- 1960s: 182.3
- 1970s: 155.3
- 1980s: 201.8
- 1990s: 203.0
- 2000s: 207.0
- 2010s: 233.7
The long-term trend is clear... but it's also slow. There are two major deviations from that trend. In the 1970s, passing yards per game dropped precipitously, and offenses ground to a halt. (If you never knew why it was called the "dead ball era"... well, now you know.) In the 2010s, the opposite happened and passing offenses took a giant leap forward.
(On paper, the leap from the 2000s to 2010s-- 26.7 yards per game-- is very close to the fall from the 1960s to 1970s-- 27.0 yards per game. The values aren't quite as close as they first appear, though, since the large increase was in the same direction as the general trend, while the large decline cut against the trend. The 2010s might have overperformed the expected average by around 20 yards while the 1970s underperformed by closer to 35.)
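One way to put rough numbers on that trend adjustment is sketched below in Python. The baseline here is an assumption made purely for illustration, not necessarily how the estimates above were derived: it draws a straight line from the 1960s average to the 2000s average and measures how far the 1970s and 2010s sit from that line. The helper name `expected_ypg` is hypothetical.

```python
# Passing yards per game by decade, copied from the list above.
decade_ypg = {
    1960: 182.3, 1970: 155.3, 1980: 201.8,
    1990: 203.0, 2000: 207.0, 2010: 233.7,
}

# Raw decade-over-decade changes.
rise_2010s = decade_ypg[2010] - decade_ypg[2000]
fall_1970s = decade_ypg[1960] - decade_ypg[1970]

# Assumed baseline: a straight line through the 1960s and 2000s averages.
slope_per_decade = (decade_ypg[2000] - decade_ypg[1960]) / 4


def expected_ypg(decade: int) -> float:
    """Trend-line value for a decade under the assumed straight-line baseline."""
    return decade_ypg[1960] + slope_per_decade * (decade - 1960) / 10


over_2010s = decade_ypg[2010] - expected_ypg(2010)
under_1970s = expected_ypg(1970) - decade_ypg[1970]

print(f"Raw rise, 2000s to 2010s: {rise_2010s:.1f}")
print(f"Raw fall, 1960s to 1970s: {fall_1970s:.1f}")
print(f"2010s above trend line: {over_2010s:.1f}")
print(f"1970s below trend line: {under_1970s:.1f}")
```

Under this particular baseline the 2010s sit a little over 20 yards above trend and the 1970s a little over 33 yards below it, in the same neighborhood as the rough figures above. A different baseline would shift both numbers somewhat.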
Ordinarily, any statistic that falls well outside of recent historical averages would be a prime regression candidate. According to what we saw from 2010 to 2020, this year's passing yardage totals seem due to regress and regress hard. But knowing what I know about the path of progress, I think there's a very real chance that this is instead the early stages of a new and meaningful trend. (Aiding that interpretation: rushing efficiency has been on the rise over the last five years, painting the picture of a league that is finding new success running the ball against defenses built to stop the pass.)
I mentioned that there were two things I wanted to call your attention to, and here is the second. Notice that in the tweet that inspired this investigation, passing yards per game were lower in 2023 than in 2022 (236.3 vs. 240.9). Now notice that, according to the full-season statistics, the opposite is true: 2022 averaged fewer passing yards per game than 2023 has so far (218.5 vs. 218.6). What accounts for this discrepancy? Is it merely the sacks?
No, it's not; the tweet (smartly) only compared the first six weeks of the season, while I was looking at full-season data. This is relevant because passing yardage naturally declines in the second half of the season as the weather turns cold and winter arrives. Meaning even if passing offenses improved slightly going forward, those gains might still get swallowed up by worsening weather effects. (Passing offenses averaged 225.0 net yards per game over the first six weeks of 2022; that fell nearly ten yards to 215.1 from Week 7 on.)
We'll extend the usual four-week window and track this over the remainder of the year. Because of the two aforementioned factors (recent passing success is a historical aberration and passing offenses tend to naturally decline in the second half of the year), I'm going to go out on a limb and predict that 2023 will finish as the worst season for passing offense in over a decade. Formally: I predict that by the time we recap the season in Week 18, NFL offenses will be averaging 218.4 or fewer passing yards per game.
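To get a feel for what the prediction requires, here is a back-of-the-envelope Python sketch. It assumes an 18-week regular season in which every week carries equal weight (ignoring bye weeks) and that the 218.6 figure for 2023 covers the first six weeks. Both are simplifying assumptions, and the function names are hypothetical.

```python
TOTAL_WEEKS = 18  # assumed: each regular-season week weighted equally (ignores byes)
EARLY_WEEKS = 6
LATE_WEEKS = TOTAL_WEEKS - EARLY_WEEKS


def full_season_avg(early_avg: float, late_avg: float) -> float:
    """Blend the Weeks 1-6 average and the Week 7+ average into a season average."""
    return (early_avg * EARLY_WEEKS + late_avg * LATE_WEEKS) / TOTAL_WEEKS


def required_late_avg(early_avg: float, target: float) -> float:
    """Week 7+ average needed for the full season to finish exactly at the target."""
    return (target * TOTAL_WEEKS - early_avg * EARLY_WEEKS) / LATE_WEEKS


# Sanity check against 2022: 225.0 early, 215.1 late, 218.5 for the full season.
check_2022 = full_season_avg(225.0, 215.1)

# 2023: 218.6 so far, and the prediction needs 218.4 or fewer by season's end.
needed_2023 = required_late_avg(218.6, 218.4)

print(f"Estimated 2022 full-season average: {check_2022:.1f}")
print(f"2023 Week 7+ average needed to hit 218.4: {needed_2023:.1f}")
```

Under those assumptions the 2022 check lands within a fraction of a yard of the published full-season figure, and the prediction comes in if the rest of 2023 averages roughly 218.3 or lower. In other words, passing offenses don't need to decline the way they did in late 2022; they mostly need to fail to improve.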
It doesn't feel good betting so heavily against the last decade, but it'd be cool if all that time studying offensive trends over the league's history actually proved useful for a change, so I'm excited to see how this one turns out.