Welcome to Regression Alert, your weekly guide to using regression to predict the future with uncanny accuracy.
For those who are new to the feature, here's the deal: every week, I dive into the topic of regression to the mean. Sometimes I'll explain what it really is, why you hear so much about it, and how you can harness its power for yourself. Sometimes, I'll give some practical examples of regression at work.
In weeks where I'm giving practical examples, I will select a metric to focus on. I'll rank all players in the league according to that metric and separate the top players into Group A and the bottom players into Group B. I will verify that the players in Group A have outscored the players in Group B to that point in the season. And then I will predict that, by the magic of regression, Group B will outscore Group A going forward.
Crucially, I don't get to pick my samples (other than choosing which metric to focus on). If I'm looking at receivers and Justin Jefferson is one of the top performers in my sample, then Justin Jefferson goes into Group A, and may the fantasy gods show mercy on my predictions.
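For readers who like to see the procedure spelled out, here is a minimal sketch of the Group A / Group B split described above. The player names, the numbers, and the helper names (`split_groups`, `average`) are all made up for illustration; the point is only that the sort decides who lands in which group, not me.

```python
def split_groups(players, metric, group_size):
    """Rank players by `metric` (high to low); return (top group, bottom group)."""
    ranked = sorted(players, key=lambda p: p[metric], reverse=True)
    if 2 * group_size > len(ranked):
        raise ValueError("groups would overlap")
    return ranked[:group_size], ranked[-group_size:]


def average(players, field):
    """Simple mean of one field across a list of player records."""
    return sum(p[field] for p in players) / len(players)


# Hypothetical receivers with invented numbers, purely for illustration.
players = [
    {"name": "WR1", "yards_per_target": 12.0, "ppg": 18.0},
    {"name": "WR2", "yards_per_target": 11.0, "ppg": 16.5},
    {"name": "WR3", "yards_per_target": 9.0, "ppg": 12.0},
    {"name": "WR4", "yards_per_target": 8.5, "ppg": 11.0},
    {"name": "WR5", "yards_per_target": 6.5, "ppg": 9.0},
    {"name": "WR6", "yards_per_target": 6.0, "ppg": 8.0},
]

group_a, group_b = split_groups(players, "yards_per_target", group_size=2)

# Sanity check before making a prediction: Group A should lead to date.
a_leads_so_far = average(group_a, "ppg") > average(group_b, "ppg")
print([p["name"] for p in group_a], [p["name"] for p in group_b], a_leads_so_far)
```

The prediction itself is then just a claim that the comparison in the last step will flip over the following weeks.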
Most importantly, because predictions mean nothing without accountability, I report on all my results in real time and end each season with a summary. Here's a recap from last year detailing every prediction I made in 2022 along with all results from this column's six-year history (my predictions have gone 36-10, a 78% success rate). And here are similar roundups from 2021, 2020, 2019, 2018, and 2017.
What Is Regression To The Mean
For our first article of the year, I think it's important to nail down exactly what regression to the mean is and why it is so powerful. I'd like to illustrate it with an example from basketball.
The free throw attempt might be the purest act in all of sports. There's no defense. There's no weather. The distance and angle never change. It is exactly the same every time: one player, one ball, one hoop, one shot.
For his career, Steph Curry shoots nearly 91% on free throws, but on a game-to-game level, there's a little bit of variance. Imagine that in Week 1 of the 2022-2023 season, Curry makes 3 of 6 free throws. (This would be a wildly uncharacteristic game, but it's not impossible; Curry once shot 1-of-4 and has twice gone 4-of-7.)
Nobody in their right mind would look at this game and conclude that Curry was suddenly a 50% free throw shooter, right? Instead, we'd think this game was an outlier and expect him to go back to hitting 91% the rest of the way. Because 91% is Curry's long-term mean (or average), and we expect him to regress (or return) to it.
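To put a rough number on how rare that hypothetical game would be, here is a quick back-of-the-envelope sketch. It assumes every free throw is an independent 91% shot, which is a simplification of mine rather than anything established about Curry, and the function name `prob_at_most` is just a label I picked.

```python
from math import comb


def prob_at_most(makes, attempts, p):
    """Binomial probability of making `makes` or fewer shots out of `attempts`."""
    return sum(
        comb(attempts, k) * p**k * (1 - p) ** (attempts - k)
        for k in range(makes + 1)
    )


# Chance that a true 91% shooter makes 3 or fewer of 6 attempts,
# assuming independent shots (a simplifying assumption).
p_bad_game = prob_at_most(makes=3, attempts=6, p=0.91)
print(f"{p_bad_game:.1%}")
```

Under those assumptions the answer comes out to a little over 1%. That is rare for any single game, but across hundreds of games it is exactly the kind of outlier we should expect to see now and then.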
Just like Steph Curry has an innate average free throw percentage, so does every player have an innate average talent level. And just as Curry's game-by-game results can deviate from that average, so can every player's results deviate from their own true mean. And just like we'd expect Curry to return to his average, we should expect all players to return to theirs, as well.
That's regression to the mean in a nutshell. It's a concept we all intuitively understand, even if we don't talk about it in so many words.
And if we're going to take advantage of regression to the mean, there are four guiding principles we need to keep in mind.
Principle #1: Everyone regresses to the mean.
Principle #2: Everyone's mean is different.
Eight players topped 100 receiving yards in Week 1: Tyreek Hill (215), Justin Jefferson (150), Brandon Aiyuk (129), Tutu Atwell (119), Puka Nacua (119), Chris Olave (112), Stefon Diggs (102), and Calvin Ridley (101).
All eight of these players are near-locks to average fewer yards per game going forward; there have only been 45 seasons in history where a player appeared in at least 10 games and averaged at least 100 yards per game, only 12 seasons where a player has topped 110 yards per game, and no one has ever averaged more than 125.
But just because all eight players will almost certainly regress doesn't mean all eight players will regress the same amount. Jefferson, Hill, and Diggs are All-Pros who already have a season averaging 106, 100, and 95 yards per game, respectively. Ridley has been out of the league for a while but averaged 91 yards per game back in 2020. Aiyuk and Olave are young former 1st-round picks who have been trending up. Atwell and Nacua were later draft picks who entered the weekend with 298 career yards between them.
It wouldn't be surprising if Jefferson finished the year averaging 100 yards per game (though it'd be a shock if he averaged 150), but for Atwell, even 50 yards per game would represent a dramatic improvement over preseason expectations. We can recognize that all of these players will regress while still having different expectations for them all going forward.
Principle #3: Regression by itself doesn't change player order.
Let's say I have two mystery running backs. Player A is averaging 20 points per game (or ppg), and Player B is averaging 18 ppg. I tell you that you can have your pick between them. Who do you choose?
Player A is certainly the bigger outlier. He's almost certainly going to regress more than Player B. But Player B is going to regress, as well, and unless we know something else about them, we have to assume that Player A will still be ahead afterward. Maybe they average 13 and 12 ppg going forward, but you still want the player who is scoring more today. Keep this in mind the next time you see someone merely point to a player's high fantasy point total and cry "regression".
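One simple way to picture this is to pull each player's average part of the way back toward a shared baseline. In the sketch below, the 6 ppg baseline and the 50% weight are numbers I picked so the result lines up with the 13 and 12 ppg example; they are not estimates of anything real, and `regress_toward_mean` is a hypothetical helper.

```python
def regress_toward_mean(observed, mean, weight):
    """Keep `weight` (0 to 1) of the gap between `observed` and `mean`."""
    return mean + weight * (observed - mean)


BASELINE_PPG = 6.0  # hypothetical shared mean, chosen for illustration
KEEP = 0.5          # hypothetical share of the gap that persists

player_a = regress_toward_mean(20.0, BASELINE_PPG, KEEP)  # 13.0
player_b = regress_toward_mean(18.0, BASELINE_PPG, KEEP)  # 12.0

# Player A falls further (7 points vs. 6) but still finishes ahead.
print(player_a, player_b, player_a > player_b)
```

As long as both players are pulled toward the same baseline by the same positive weight, the player who starts ahead stays ahead. The order only changes once we know something that gives the two players different baselines.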
Principle #4: Regression operates on multiple dimensions.
Our third principle tells us we can't just look at a statistic we care about (in this case, fantasy points) and apply the concept of regression directly. All that does is tell us that good players are likely to remain good (if slightly less so) and bad players are likely to remain bad (if also slightly less so).
But players are going to regress in several ways all at the same time. Matthew Stafford threw for 334 yards last weekend; since joining the Rams, he averages 270 yards per game, so that value is likely to come down over the rest of the year. On the other hand, Stafford didn't throw a single touchdown; he averages 1.9 per game with the Rams, so that value will almost certainly come up. As Stafford's yardage comes down and his touchdowns come up, his overall production level will change relative to his peers.
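Here is the Stafford example worked through as a rough sketch. The scoring values (1 point per 25 passing yards, 4 points per passing touchdown) are a common default that I am assuming, not something stated above, and the calculation ignores interceptions and rushing entirely.

```python
# Assumed scoring; league settings vary.
POINTS_PER_PASS_YARD = 1 / 25
POINTS_PER_PASS_TD = 4


def passing_points(yards, touchdowns):
    """Fantasy points from passing yards and passing touchdowns only."""
    return yards * POINTS_PER_PASS_YARD + touchdowns * POINTS_PER_PASS_TD


week1_points = passing_points(yards=334, touchdowns=0)      # about 13.4
typical_points = passing_points(yards=270, touchdowns=1.9)  # about 18.4

print(round(week1_points, 1), round(typical_points, 1))
```

Under those assumptions, a game at his Rams averages is worth about five points more than his Week 1 line, even though the yardage is lower. Two components moving in opposite directions do not have to cancel out.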
Some dimensions are more stable than others. Rush attempts are much more predictable from week to week than yards-per-carry. Yardage totals vary a lot less than touchdown totals. By focusing on the secondary elements of a player's production that are most likely to regress, we can predict the ways that the overall list will change.
So, for instance, if we want to find players who will score fewer fantasy points, we might look at players who are scoring a lot of touchdowns right now. And if we want to find players who will score more fantasy points, maybe we look at players who have lots of targets but a low yards-per-target average.
By combining these principles, we can get one step ahead of our leaguemates. We can buy and sell tomorrow's production at today's prices and consistently reap a profit. All by simply understanding regression, how it works, and how we can put it to work for us.
Right now, all of this is in the abstract. Starting next week, I'll show you how to put it into practice. I'll show you how a simple list of players sorted from high to low can, with a little bit of discretion, become one of the most powerful buy-low, sell-high tools you'll ever find.