Welcome to Regression Alert, your weekly guide to using regression to predict the future with uncanny accuracy.
For those who are new to the feature, here's the deal: every week, I dive into the topic of regression to the mean. Sometimes I'll explain what it really is, why you hear so much about it, and how you can harness its power for yourself. Sometimes I'll give some practical examples of regression at work.
In weeks where I'm giving practical examples, I will select a metric to focus on. I'll rank all players in the league according to that metric, and separate the top players into Group A and the bottom players into Group B. I will verify that the players in Group A have outscored the players in Group B to that point in the season. And then I will predict that, by the magic of regression, Group B will outscore Group A going forward.
Crucially, I don't get to pick my samples (other than choosing which metric to focus on). If the metric I'm focusing on is touchdown rate, and Christian McCaffrey is one of the high outliers in touchdown rate, then Christian McCaffrey goes into Group A and may the fantasy gods show mercy on my predictions.
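The selection procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the player names and numbers are invented, and the `Player` type, the `split_groups` and `average_points` helpers, and the group size are hypothetical choices, not the tooling actually used for this column.

```python
from typing import NamedTuple


class Player(NamedTuple):
    name: str
    metric: float           # the statistic being tested, e.g. touchdown rate
    points_per_game: float  # production to date


def split_groups(players: list[Player], size: int) -> tuple[list[Player], list[Player]]:
    """Rank by the metric alone: the top `size` form Group A, the bottom `size` Group B."""
    ranked = sorted(players, key=lambda p: p.metric, reverse=True)
    return ranked[:size], ranked[-size:]


def average_points(group: list[Player]) -> float:
    """Average points per game across a group."""
    return sum(p.points_per_game for p in group) / len(group)


# Invented sample data, purely for illustration.
players = [
    Player("Player 1", 0.09, 18.0),
    Player("Player 2", 0.08, 16.5),
    Player("Player 3", 0.05, 14.0),
    Player("Player 4", 0.04, 13.5),
    Player("Player 5", 0.02, 12.0),
    Player("Player 6", 0.01, 11.0),
]

group_a, group_b = split_groups(players, size=2)

# Step 1: verify that Group A has outscored Group B so far.
assert average_points(group_a) > average_points(group_b)
# Step 2 is the prediction itself: Group B outscores Group A going forward.
```

Note that nothing in `split_groups` looks at who the players are; membership is decided by the metric and nothing else.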
Most importantly, because predictions mean nothing without accountability, I track the results of my predictions over the course of the season and highlight when they prove correct and also when they prove incorrect. Here's a list of my predictions from 2020 and their final results. Here's the same list from 2019 and their final results, here's the list from 2018, and here's the list from 2017. Over four seasons, I have made 30 specific predictions and 24 of them have proven correct, a hit rate of 80%.
The Scorecard
In Week 2, I broke down what regression to the mean really is, what causes it, how we can benefit from it, and what the guiding philosophy of this column would be. No specific prediction was made.
In Week 3, I dove into the reasons why yards per carry is almost entirely noise, shared some research to that effect, and predicted that the sample of backs with lots of carries but a poor per-carry average would outrush the sample with fewer carries but more yards per carry.
In Week 4, I talked about yard-to-touchdown ratios and why they were the most powerful regression target in football that absolutely no one talks about, then predicted that touchdowns were going to follow yards going forward (but the yards wouldn't follow back).
In Week 5, we looked at ten years' worth of data to see whether early-season results predicted rest-of-year performance better than preseason ADP did, and we found that, while the exact details fluctuated from year to year, overall they did not. No specific prediction was made.
In Week 6, I taught a quick trick to tell how well a new statistic actually measures what you think it measures. No specific prediction was made.
In Week 7, I went over the process of finding a good statistic for regression and used team rushing vs. passing touchdowns as an example.
In Week 8, I talked about how interceptions are an unstable statistic not only for quarterbacks but also for defenses.
Statistic for regression | Performance before prediction | Performance since prediction | Weeks remaining |
---|---|---|---|
Yards per Carry | Group A had 10% more rushing yards per game | Group B has 4% more rushing yards per game | None (Win!) |
Yards per Touchdown | Group A scored 9% more fantasy points per game | Group B scored 13% more fantasy points per game | None (Win!) |
Passing vs. Rushing TDs | Group A scored 42% more RUSHING TDs | Group A is scoring 71% more PASSING TDs | 2 |
Defensive Interceptions | Group A had 33% more interceptions | Group B has 30% more interceptions | 3 |
News that Derrick Henry will likely miss the rest of the season puts a damper on our prediction, since the whole point of our passing vs. rushing touchdown prediction was betting against Henry continuing his absurd touchdown pace. But the prediction has already paid off regardless of what happens with Tennessee the rest of the way. Two weeks ago, the Titans had scored 12 rushing touchdowns against just 6 passing touchdowns, but in the two weeks since they've scored 5 passing touchdowns and just 1 rushing touchdown (which came from Ryan Tannehill, anyway). In fact, most of the edge in passing touchdowns comes from the Titans; the rest of the sample combined has 7 passing touchdowns against 6 rushing touchdowns. It all goes to show that even the most dominant runner in the NFL isn't immune to the forces of regression.
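For anyone who wants to check the arithmetic, here is a short sketch using only the touchdown counts quoted above. It assumes Tennessee's totals are counted as part of the sample, which is consistent with the 71% figure in the table; the variable names are just illustrative.

```python
# Touchdowns scored in the two weeks since the prediction, as (passing, rushing).
titans = (5, 1)
rest_of_sample = (7, 6)

passing = titans[0] + rest_of_sample[0]
rushing = titans[1] + rest_of_sample[1]

# Percentage by which passing touchdowns exceed rushing touchdowns.
edge_pct = (passing / rushing - 1) * 100
rest_edge_pct = (rest_of_sample[0] / rest_of_sample[1] - 1) * 100

print(f"Whole sample: {passing} passing vs. {rushing} rushing ({edge_pct:.0f}% more passing)")
print(f"Without Tennessee: {rest_edge_pct:.0f}% more passing")
```

With the Titans included, the sample has 12 passing touchdowns against 7 rushing; without them the edge shrinks to roughly 17%.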
As for our interception prediction, in the first week, our high-interception teams kept a higher interception rate than our low-interception teams, but since both groups regressed toward league average our low-interception teams won on total volume. Group A saw its interceptions per game fall from 1.29 to 1.00 while Group B's rose from 0.50 to 0.76.
Ja'Marr Chase and the First Two Rules of Regression to the Mean
Ja'Marr Chase is going to regress.
I know this because the first rule of regression to the mean is that everyone regresses to the mean. Ja'Marr Chase is coming off arguably the best first 8 games of any rookie in history, but his underlying metrics suggest he's been getting a bit lucky and performing over his head. He's currently 5th in fantasy points per game but just 27th in expected fantasy points per game, which is how many points a typical player would score given the same workload.
Leaders in xFP/g: pic.twitter.com/Cq1Gwtei1G
— Cooper Adams (@Cooper_DFF) November 2, 2021
Now, perhaps you think, "But that's what a typical player would score on that workload, and Ja'Marr Chase is not a typical player." Except overperformance and underperformance relative to expected points have repeatedly been shown to be unsustainable. Special players distinguish themselves by getting very valuable workloads, not by doing more with the workloads they get. For the best illustration, consider the three first-team AP All-Pro receivers from last year: Tyreek Hill has scored almost exactly as expected, Davante Adams has overperformed expectations by a very small amount, and Stefon Diggs has underperformed by a good bit. Even great players don't typically outperform their workload.
Or consider the Top 10 receivers in terms of overperforming expectations.
Largest Over and Underperformers Relative to xFP: pic.twitter.com/MMQ18QdSmr
— Cooper Adams (@Cooper_DFF) November 2, 2021
You have Ja'Marr Chase, DK Metcalf, Cooper Kupp, DeAndre Hopkins, and CeeDee Lamb, five of the best receivers in the league. You also have Deebo Samuel and Marquise Brown, who are generally considered good-but-not-great receivers, and Deonte Harris and Donovan Peoples-Jones, who might not be among the 50 best receivers in the league right now. That's about what you'd expect if overperformance and underperformance were largely random.
Over his first eight games, Chase has 38 receptions for 786 yards and 7 touchdowns, which means he's off to the best start of any receiver since Harlon Hill (who had 27/802/9 over his first eight games back in 1954). Because of this, many have moved Chase up to their #1 ranked dynasty receiver. Some have even made him the #1 ranked player in dynasty, regardless of position. Because these rankings are based in large part on a level of overperformance that is completely unsustainable, they must be overreactions, right?
Except, well, we can't discuss the first rule of regression to the mean without considering it in the context of the second rule.
- Everyone regresses to the mean.
- Everyone's mean is different.
Consider the receiver with the second-best 8-game start of the last sixty years: Marques Colston. Colston had 44 receptions for 700 yards and 7 touchdowns over his first eight games. His metrics were much more sustainable, too; he had 73 targets compared to 59 for Chase. His catch% was 60.3% compared to Chase's 64.4%, his yards per reception was 15.9 vs. Chase's 20.7, and his yards per target was a good-but-reasonable 9.6 compared to Chase's 13.3 (currently the 5th-highest value of any receiver with at least 50 targets since 1992).
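The per-target and per-reception figures above all fall out of three raw numbers for each player. A minimal sketch, using only the receptions, yards, and targets quoted in this column (the `efficiency` helper is a hypothetical name for illustration):

```python
def efficiency(receptions: int, yards: int, targets: int) -> dict[str, float]:
    """Derive the three efficiency metrics from a raw receiving line."""
    return {
        "catch_pct": round(100 * receptions / targets, 1),
        "yards_per_reception": round(yards / receptions, 1),
        "yards_per_target": round(yards / targets, 1),
    }


# First eight games of each player's career, as quoted above.
chase = efficiency(receptions=38, yards=786, targets=59)
colston = efficiency(receptions=44, yards=700, targets=73)

for name, line in (("Chase", chase), ("Colston", colston)):
    print(name, line)
```

Chase leads in every efficiency column while trailing in targets, which is exactly the profile that tends to regress.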
Statistically, we had much more reason to expect that Marques Colston would be able to sustain his production than we do to expect that Ja'Marr Chase can. And yet, as someone who was around back then, I can tell you Colston was never seriously considered as the #1 dynasty receiver, let alone the #1 overall player.
Why? Because everyone's mean is different. Colston's production might have been superficially sustainable, but he was a 7th-round rookie from Hofstra. Ja'Marr Chase, on the other hand, was hailed by many as the best receiver prospect since Calvin Johnson in 2007. In other words, we have plenty of reason to expect that Chase's "true mean" after 8 games is much higher than Colston's was.
This isn't meant as a knock on Colston, either. His scorching-hot start should have rightly caused us to reevaluate his prospects, and indeed he turned in a terrific career. We should be constantly updating our beliefs in the face of new evidence. How much we update must depend both on the strength of our beliefs and the strength of the new evidence.
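One simple way to picture that kind of updating is a weighted average of what we believed beforehand and what we have seen since, with each side weighted by how much evidence stands behind it. The sketch below is purely illustrative: the prior values, the `prior_games` weight, and the shared 20-points-per-game hot start are invented numbers, not estimates for Chase or Colston, and the `updated_estimate` helper is a hypothetical name rather than a description of how any projection system actually works.

```python
def updated_estimate(prior_mean: float, prior_games: float,
                     observed_mean: float, observed_games: int) -> float:
    """Blend prior and observed per-game production, weighting each by its
    effective sample size (a simple shrinkage estimate)."""
    total_games = prior_games + observed_games
    return (prior_mean * prior_games + observed_mean * observed_games) / total_games


# Hypothetical numbers: both receivers average 20 points over 8 games,
# but we started with very different expectations for each of them.
elite_prospect = updated_estimate(prior_mean=16.0, prior_games=16,
                                  observed_mean=20.0, observed_games=8)
late_round_pick = updated_estimate(prior_mean=8.0, prior_games=16,
                                   observed_mean=20.0, observed_games=8)

print(f"Elite prospect:  {elite_prospect:.1f}")
print(f"Late-round pick: {late_round_pick:.1f}")
```

Both estimates move up from where they started and both land below the hot start, but they settle in different places. That is the second rule in miniature.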
Given two receivers off to scorching-hot starts, it's important to note which statistics are more or less sustainable, but it's also important to incorporate any other information we have to make the best guess as to just how high a player's true performance level might be. Ja'Marr Chase isn't going to break 6 tackles en route to an 82-yard touchdown again. He's probably not going to score like a Top 5 wide receiver the rest of the way.
But every indicator points to him being a star in the making, so it's unlikely he's going to finish as low as his volume to date would suggest. Footballguys' rest-of-year projectors have plenty of experience predicting regression, and yet they have Chase 9th going forward, far closer to his "unsustainable" 5th-place start than his much more pedestrian 27th-place workload.
Essentially, Chase perfectly illustrates that it's sometimes better to be lucky than good, but it's always better still to be both lucky and good.