Welcome to Regression Alert, your weekly guide to using regression to predict the future with uncanny accuracy.
For those who are new to the feature, here's the deal: every week, I dive into the topic of regression to the mean. Sometimes I'll explain what it really is, why you hear so much about it, and how you can harness its power for yourself. Sometimes I'll give some practical examples of regression at work.
In weeks where I'm giving practical examples, I will select a metric to focus on. I'll rank all players in the league according to that metric and separate the top players into Group A and the bottom players into Group B. I will verify that the players in Group A have outscored the players in Group B to that point in the season. And then I will predict that, by the magic of regression, Group B will outscore Group A going forward.
Crucially, I don't get to pick my samples (other than choosing which metric to focus on). If I'm looking at receivers and Cooper Kupp is one of the top performers in my sample, then Cooper Kupp goes into Group A and may the fantasy gods show mercy on my predictions.
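For readers who like seeing those mechanics spelled out, here's a minimal sketch of the construction in Python. Everything in it (the player records, the field names, the yards-per-target metric) is made up for illustration; the point is the sort-split-verify structure, and note that the sort does the picking, not me.

```python
# A minimal sketch of the Group A / Group B construction.
# Player records, field names, and the metric are hypothetical.

def build_groups(players, metric, group_size):
    """Rank players by a metric; the top is Group A, the bottom is Group B."""
    ranked = sorted(players, key=lambda p: p[metric], reverse=True)
    return ranked[:group_size], ranked[-group_size:]

def points_per_game(group):
    return sum(p["points"] for p in group) / sum(p["games"] for p in group)

players = [
    {"name": "WR1", "ypt": 11.2, "points": 120, "games": 8},
    {"name": "WR2", "ypt": 9.8,  "points": 105, "games": 8},
    {"name": "WR3", "ypt": 6.1,  "points": 70,  "games": 8},
    {"name": "WR4", "ypt": 5.4,  "points": 66,  "games": 8},
]

group_a, group_b = build_groups(players, metric="ypt", group_size=2)

# Verify Group A has outscored Group B to date...
assert points_per_game(group_a) > points_per_game(group_b)
# ...then predict that Group B outscores Group A going forward.
print("Prediction: Group B outscores Group A from here on out")
```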
Most importantly, because predictions mean nothing without accountability, I track the results of my predictions over the course of the season and highlight when they prove correct and also when they prove incorrect. At the end of last season, I provided a recap of the first half-decade of Regression Alert's predictions. The executive summary is that we have a 32-7 lifetime record, which works out to an 82% success rate.
If you want even more details, here's a list of my predictions from 2020 and their final results. Here's the same list from 2019, here's the list from 2018, and here's the list from 2017.
The Scorecard
In Week 2, I broke down what regression to the mean really is, what causes it, how we can benefit from it, and what the guiding philosophy of this column would be. No specific prediction was made.
In Week 3, I dove into the reasons why yards per carry is almost entirely noise, shared some research to that effect, and predicted that the sample of backs with lots of carries but a poor per-carry average would outrush the sample with fewer carries but more yards per carry.
In Week 4, I discussed the tendency for touchdowns to follow yards and predicted that players scoring a disproportionately high or low amount relative to their yardage total would see significant regression going forward.
In Week 5, I revisited an old finding that preseason ADP tells us as much about rest-of-year outcomes as fantasy production to date does, even a quarter of the way through a new season. No specific prediction was made.
In Week 6, I explained the concept of "face validity" and taught the "leaderboard test", my favorite quick-and-dirty way to tell how much a statistic is likely to regress. No specific prediction was made.
In Week 7, I talked about trends in average margin of victory and tried my hand at applying the concepts of regression to a statistic I'd never considered before, predicting that teams would win games by an average of between 9.0 and 10.5 points per game.
In Week 8, I lamented that interceptions weren't a bigger deal in fantasy football, given that they're a tremendously good regression target, and then I predicted interceptions would regress.
In Week 9, I explained why the single greatest weapon for regression to the mean is large sample sizes. For individual players, individual games, or individual weeks, regression might only be a 55/45 bet, but if you aggregate enough of those bets, it becomes a statistical certainty (see the short demonstration just after this recap). No specific prediction was made.
In Week 10, I explored the link between regression and luck, noting that the more something was dependent on luck, the more it would regress, and predicted that "schedule luck" in the Scott Fish Bowl would therefore regress completely going forward.
In Week 11, I broke down the very important distinction between "mean reversion" (the tendency of players to perform around their "true talent level" going forward, regardless of how they have performed to date) and "gambler's fallacy" (the idea that overperformers or underperformers are "due" for a correction).
In Week 12, I talked about how much of a team's identity was really just random noise and small samples and projected that some of the most rush-heavy teams would skew substantially more pass-heavy going forward.
In Week 13, I explained why the optimal "hit rate" isn't anywhere close to 100% and suggested that fantasy players should be willing to press even marginal edges if they want to win in the long run.
In Week 14, I sympathized with how tempting it is to assume that players on a hot streak can maintain that level of play but discussed how larger (full-season) samples were almost always more accurate. I predicted that the hottest players in fantasy would all cool down substantially toward their full-season averages.
In Week 15, I discussed several methods of estimating your championship odds and explained why virtually every team is more likely to lose than to win.
In Week 16, I examined what happened to some of our failed predictions if you looked at them over longer timespans and found that while regression could be deferred, the bill eventually came due.
In Week 17, we finally threw our dynasty friends a bone, looking at how situation tends to regress over time, which makes betting on underlying talent a much safer play.
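As promised in the Week 9 recap, here's that aggregation point made concrete. This is a quick illustration rather than anything from the column's actual data: it computes the exact binomial odds that you win the majority of n independent 55/45 bets.

```python
# A single regression bet might only be 55/45, but the exact binomial
# odds of winning the majority of n independent 55/45 bets climb
# toward certainty as n grows.
from math import comb

def p_majority_wins(n, p=0.55):
    """P(winning more than half of n independent bets)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n // 2 + 1, n + 1))

for n in (1, 25, 101, 1001):
    print(f"{n:>4} bets: win the majority {p_majority_wins(n):.1%} of the time")
# roughly 55%, 69%, 84%, and 99.9%
```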
STATISTIC FOR REGRESSION | PERFORMANCE BEFORE PREDICTION | PERFORMANCE SINCE PREDICTION | WEEKS REMAINING |
---|---|---|---|
Yards per Carry | Group A had 24% more rushing yards per game | Group B had 25% more rushing yards per game | None (Win!) |
Yards per Touchdown | Group A scored 3% more fantasy points per game | Group A had 12% more fantasy points per game | None (Loss) |
Margin of Victory | Average margins were 9.0 points per game | Average margins were 9.9 points per game | None (Win!) |
Defensive INTs | Group A had 65% more interceptions | Group B had 50% more interceptions | None (Win!) |
Schedule Luck | Group A had 38% more wins | Group A had 4% more wins | None (Loss*) |
Offensive Identity | Group A had 12% more rushing TDs | Group A had 4% more rushing TDs | None (Loss) |
"Hot" Players Regress | Players were performing at an elevated level | Players regressed 133% to season avg. | None (Win!) |
Don't read too much into the fact that our sample of "hot" players slightly underperformed their season average over the last four weeks. In the three times I've run this prediction, they've regressed 108%, 75%, and 133% of the way back. Ignoring the small fluctuations from prediction to prediction, they've been scoring pretty much right on their full-season average across the larger sample.
But absolutely notice that their hot streak disappeared immediately and dramatically. Of the 23 players who remained in the sample, 12 (more than half) averaged below their full-season average, to say nothing of their "hot streak" average. Three more regressed at least 90% of the way back to their full-season average. Only two players improved upon their recent hot stretch (Dalton Schultz and Juwan Johnson-- certainly not the two we'd have expected!), and only four more (CeeDee Lamb, Isiah Pacheco, Chris Godwin, and Rachaad White) finished closer to their hot average than their full-season average.
You could remove the ten players who underperformed their "hot" average by the most, and the sample still would have regressed more than 66% toward their season average.
"Hot streaks" aren't really a thing. If a player has a few good games in a row, that should raise our expectations for him because it raises his average over the entire season, but we shouldn't expect them to maintain their strong play beyond that. Full-season data is almost always significantly more useful than any recent sub-samples.
Our Final Report Card
To wrap up the season, I wanted to look back not just at this year's predictions, but at every prediction since Regression Alert launched in 2017. Remember: I'm not picking individual players; I'm just identifying unstable statistics and predicting that the best and the worst players in those statistics will regress toward the mean, no matter who those best and worst players might be.
Sometimes this feels a bit scary. Predicting that stars like Davante Adams or Josh Jacobs, in the middle of league-winning seasons, are going to start falling off is an uncomfortable position. But looking back at our hit rate over time makes it a bit easier to swallow.
Top-line Record
- 2017: 6-2
- 2018: 5-1
- 2019: 7-2
- 2020: 6-1
- 2021: 8-1
- 2022: 4-3
- Overall: 36-10 (78%)
The Misses
2017 Passing Yards per Touchdown Part 1
2017 Passing Yards per Touchdown Part 2
In our first prediction, Group A was outscoring Group B by 13%. I picked a bad four-week span to make the prediction: they outscored Group B by 17% over our prediction window, though over the full season that edge fell to just 3%; solid regression, but not enough to count the prediction as a win. When I repeated the prediction later in the season, it once again went poorly. My takeaway from this experience was that quarterback yard-to-touchdown ratios are much more skill-based than running back or receiver ratios (an idea that's backed up by looking at the leaderboard in the statistic), so I don't make this prediction anymore.
2018 Yards per Target
Just like with the last miss, I tried to make a prediction out of a statistic that has a large skill element to it. Over the full season, Group A's edge fell from 16% to 7%, which was at least movement in the right direction, but not enough to qualify as a win. Once again, I've stopped trying to find clever ways to make this prediction work; the skill signal is just too strong, which means the movement going forward tends to be far less dramatic and the prediction a bit less reliable. (We did log one hit to offset this one miss before I discontinued the prediction.)
2019 Patrick Mahomes II Touchdown Regression
I knew going into this prediction that it wasn't a great bet; in fact, I preceded it with 18 paragraphs and 4 charts detailing the three biggest issues with the prediction I was about to make. Then I compounded those issues by breaking best practices twice: first by predicting on a single player rather than a large sample (where the ups and downs would have more chance to even out), and second by hand-picking my player rather than sticking with whoever happened to be most extreme in the statistic I was betting would regress. When the original prediction lost, in part because Mahomes was injured during the sample, I doubled down and ran it again when he returned from injury; this prediction was responsible for both of my losses that year. Really just a disaster from start to finish, with a pair of humbling and well-deserved losses to show for it.
2020 Point Differential vs. Record
I paired teams who had the same record despite wildly different point differentials and predicted that the teams that were winning by bigger margins would win more games going forward than the teams that were winning by smaller margins. Not only did that prediction fail over the four-week sample, but extending it out through the entire season didn't help any; our Group A teams actually won one more game than our Group B teams after the prediction. The lesson I take away from this failure is... nothing. Sometimes predictions fail because I got greedy or made an ill-advised design choice. But sometimes we just get unlucky. I'd be happy to make this bet again in the future.
2021 Kickers (Offense vs. Defense)
The point I wanted to make is that a team's own offense predicts its future performance better than the opposing team's defense does. It's a point I've made in the past using offensive and defensive production directly, but this time I wanted to add a twist by focusing on "matchups". I think it's a sound point and I'd be happy to make the prediction again; my mistake was focusing on kickers. Matchups aren't a big deal in fantasy football, but the positions where they matter most are kicker and fantasy defense (which actually reinforces the underlying point that offense is more predictable than defense). If and when I run this back, I'll pick a different position to focus on.
2022 Yards per Touchdown
Nothing to learn from this loss; this was my 12th time making this particular prediction, and it was bound to lose eventually. Extending the sample largely resolves any issues; since the prediction ended, Group B has easily surpassed Group A, as originally predicted.
2022 Schedule Luck
This is a loss with a good lesson behind it. I think the prediction was actually correct, but when I made it I noticed discrepancies in the data that shouldn't have been there. I even made a note of it at the time, writing "There are currently 35 teams with a winning record (10-8 or better) and an all-play percentage of 50% or worse. This is our Group A. On the other end, there are 269 teams with a losing record (8-10 or worse) and an all-play percentage greater than 50%. This is our Group B. I don't know why the second sample is so much larger than the first; this may again be a function of the Victory Point screwing with our data."
In theory, luck should be relatively symmetric, and Group A should have been approximately the same size as Group B. The fact that it wasn't was a red flag for me, but it should have been a bigger one. I think luck did regress, but my data source was feeding me bad numbers, so the prediction couldn't capture it. In the future, I'll work harder to make sure Regression Alert respects the principle of GIGO-- "Garbage In, Garbage Out".
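That's exactly the kind of check that's easy to automate before making a luck-based prediction. Here's a minimal sketch of such a guard, using the 35- and 269-team group sizes from the note quoted above; the 2x threshold is an arbitrary choice for illustration, not a rule.

```python
# If "luck" were truly symmetric, the lucky and unlucky groups should be
# roughly the same size, so a lopsided split is a warning sign. The sizes
# below are the 35- and 269-team groups from the note quoted above.

def check_group_symmetry(size_a, size_b, max_ratio=2.0):
    """Flag the data source if one group dwarfs the other."""
    ratio = max(size_a, size_b) / min(size_a, size_b)
    if ratio > max_ratio:
        raise ValueError(f"groups are {ratio:.1f}x apart ({size_a} vs. {size_b}); "
                         "suspect the data source before betting on it")

check_group_symmetry(35, 269)  # raises: groups are 7.7x apart (35 vs. 269)
```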
2022 Offensive Identity
This was a prediction I'd made once before with great results (a 75% swing in offensive identity!), but this time, for one reason or another, it just didn't hit (a mere 8% swing). I don't think it was a terrible prediction; I think we probably just got unlucky. I might even try it again in the future. But if I do, I'll do a bit more research first to see whether this is really something that regresses as much as I think it should, or whether we just got lucky the first time we tried it.
The Hits
Here's the outcome of all of my "Yards per Carry" predictions over the years, showing which group led (and by how much) at the time of the prediction, which group led over the four weeks after the prediction, and the total swing.
- Group A had a 60% lead, Group B had a 16% lead, +76% total swing
- Group A had a 25% lead, Group B had a 16% lead, +41% total swing
- Group A had a 24% lead, Group B had a 4% lead, +28% total swing
- Group A had a 9% lead, Group B had a 23% lead, +32% total swing
- Group A had a 20% lead, Group B had a 30% lead, +50% total swing
- Group A had a 22% lead, Group B had a 23% lead, +45% total swing
- Group A had a 3% lead, Group B had a 36% lead, +39% total swing
- Group A had a 10% lead, Group B had a 4% lead, +14% total swing
- Group A had a 24% lead, Group B had a 25% lead, +49% total swing
We can't directly compare the total swings since the sample sizes vary so much (a 30% swing over a large sample might be more impressive than a 50% swing over a small one), but this prediction has gone 9-0 for me over the years with a median swing from Group A to Group B of 41%. The minimum swing was 14%, but that was mostly just bad luck with the selected sample; over the full season, the swing would have been 25%.

I've made a lot of jokes about yards per carry over the years. I've called it "pseudoscience" and said it's "not a thing" or even "maximally not a thing". Some people find these statements provocative, but they're not intended to provoke. Yards per carry genuinely is almost entirely noise, especially over the kinds of samples we're dealing with inside a single season. Here are the receipts.
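And those receipts are easy to audit. A few lines of Python over the nine results listed above (transcribed as pairs of leads, in percentage points) confirm the record, the median swing, and the minimum swing:

```python
from statistics import median

# The nine yards-per-carry predictions above, recorded as
# (Group A's lead before, Group B's lead after) in percentage points.
results = [(60, 16), (25, 16), (24, 4), (9, 23), (20, 30),
           (22, 23), (3, 36), (10, 4), (24, 25)]

wins = sum(b_after > 0 for _, b_after in results)  # Group B led after the prediction
swings = [a + b for a, b in results]               # total swing from Group A to Group B

print(f"record: {wins}-{len(results) - wins}")     # record: 9-0
print(f"median swing: {median(swings)}%")          # median swing: 41%
print(f"smallest swing: {min(swings)}%")           # smallest swing: 14%
```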
Here's the outcome of all of my "Yard to Touchdown Ratio" predictions over the years. (Where necessary, I've reworked some of the predictions to adhere to our traditional "Group A vs. Group B" format. This is a purely cosmetic change for comparison; the underlying data remains untouched.)
- Group A had a 28% lead, Group B had a 1% lead, +29% total swing
- Group A had a 21% lead, Group B had an 8% lead, +29% total swing
- Group A had a 7% lead, Group B had a 20% lead, +27% total swing
- Group A had a 28% lead, Group B had a 23% lead, +51% total swing
- Group A had a 26% lead, Group B had a 4% lead, +30% total swing
- Group A had a 23% lead, Group B had a 47% lead, +70% total swing
- Group A had a 22% lead, Group B had a 23% lead, +45% total swing
- Group A had a 2% lead, Group B had a 40% lead, +42% total swing
- Group A had a 15% lead, Group B had an 11% lead, +26% total swing
- Group A had a 9% lead, Group B had a 13% lead, +22% total swing
- Group A had a 10% lead, Group B had a 19% lead, +29% total swing
- Group A had a 3% lead, Group A had a 12% lead, -9% total swing
I've said for years that yard-to-touchdown ratio is my favorite statistic precisely because it's such a reliable regression target, and no one else pays any attention to it. It's also the reason we're here today; this entire column was inspired by a pair of articles I wrote on this ratio back in 2015. Statistically-minded writers have known about the issues with yards per carry for decades, but when I started writing this column, virtually none of the discussion of players scoring "too many" or "too few" touchdowns linked that judgment to their yardage profile. But as you can see, that's exactly the link we should be making.
Our perfect record may have finally been brought low, but predictions that touchdowns will follow yards are still 11-1 with a median swing of 29%. 11 out of 12 predictions have produced a swing of at least 22%.
Here are the other miscellaneous (successful) predictions from the past four seasons:
- Group A had 16% more yards per target, Group B had 11% more yards per target, +27% total swing
- Group A had 17% fewer interceptions, Group B had 57% fewer interceptions, +74% total swing
- Group A had 13% fewer interceptions, Group B had 17% fewer interceptions, +30% total swing
- Group A had 20% more kicker points per game, Group B had 36% more kicker points per game, +56% total swing
- Group A had 42% more rushing TDs per game, Group A had 33% more passing TDs per game, +75% total swing
- Group A recorded 33% more interceptions, Group B recorded 24% more interceptions, +57% total swing
- Group A recorded 65% more interceptions, Group B recorded 50% more interceptions, +115% total swing
- Group A won 4% more fantasy matchups, Group B won 19% more fantasy matchups, +23% total swing
- Group A averaged 93% more yards per route run, Group B averaged 49% more yards per route run, +142% total swing
And here are the general regression predictions that didn't follow the typical "Group A vs. Group B" format, instead predicting unidirectional regression for a single group:
- "Extreme" offenses and defenses regressed 11% toward the league average performance, as predicted.
- Defenses regressed 12% more than offenses, as predicted.
- Group A averaged 14% more passing yards per game, Group A continued to average 28% more passing yards per game, as predicted.
- "Hot" players regressed 108% of the way back to their full-season averages, as predicted.
- "Hot" players regressed 75% of the way back to their full-season averages, as predicted.
- "Hot" players regressed 133% of the way back to their full-season averages, as predicted.
- Margin of Victory settled between 9.0 and 10.5 points per game, as predicted.
That last prediction is a personal favorite of mine. I heard an anecdote about a statistic I had never looked at before (league-wide average margin of victory) and had absolutely no intuitions about. I spent 30 minutes looking up values from the last two years, and then, based on nothing more than my knowledge of how regression operates, I was able to predict within a narrow window where margin of victory would settle. (Lest you think the four-week sample was a fluke, margin of victory is at 10.1 points since my prediction.)
I didn't even know at the time how bold the prediction really was; in the league's 102-year history, there had never been a season with a margin of victory below 10.5, and there had only been four (1974, 1994, 1995, and 2016) with a margin below 11. Without even knowing why this year's margins had been so low (I still don't really know for sure!) I managed to predict a genuine outlier performance simply by understanding what kinds of patterns we tend to see in data. If you follow this column long enough, that's my goal for you, too; I want you to reach a point where you can look at data you've never seen before and make intuitive, accurate guesses about what kinds of values will follow.
Anyway, the whole point of this column is to convince you that regression to the mean is real, it's implacable, and it's actionable with very little effort on our part. Accountability is crucial to making that point, which is why I go to such great lengths to track and report my results. You don't have to take my word on the subject; you can go back and check my track record for yourself. You can see why I'm such a big believer in the power of regression, and hopefully you'll become something of a believer yourself.
As always, I appreciate you reading along this season, and look forward to doing it all over again in 2023.