Welcome to Regression Alert, your weekly guide to using regression to predict the future with uncanny accuracy.
For those who are new to the feature, here's the deal: every week, I dive into the topic of regression to the mean. Sometimes I'll explain what it really is, why you hear so much about it, and how you can harness its power for yourself. Sometimes I'll give some practical examples of regression at work.
In weeks where I'm giving practical examples, I will select a metric to focus on. I'll rank all players in the league according to that metric, and separate the top players into Group A and the bottom players into Group B. I will verify that the players in Group A have outscored the players in Group B to that point in the season. And then I will predict that, by the magic of regression, Group B will outscore Group A going forward.
Crucially, I don't get to pick my samples (other than choosing which metric to focus on). If the metric I'm focusing on is touchdown rate, and Christian McCaffrey is one of the high outliers in touchdown rate, then Christian McCaffrey goes into Group A and may the fantasy gods show mercy on my predictions.
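The Group A / Group B setup described above can be sketched in a few lines. This is a minimal illustration with made-up players and numbers, not real league data; the field names are assumptions for the example only.

```python
# A minimal sketch of the column's Group A / Group B setup, using
# hypothetical player data (names and stats are invented, not real).

def split_groups(players, metric, top_n):
    """Rank players by a metric; the top_n outliers form Group A,
    the bottom top_n form Group B."""
    ranked = sorted(players, key=lambda p: p[metric], reverse=True)
    return ranked[:top_n], ranked[-top_n:]

players = [
    {"name": "RB1", "ypc": 5.8, "carries": 60},
    {"name": "RB2", "ypc": 5.1, "carries": 80},
    {"name": "RB3", "ypc": 4.2, "carries": 150},
    {"name": "RB4", "ypc": 3.6, "carries": 170},
]

group_a, group_b = split_groups(players, "ypc", 2)
print([p["name"] for p in group_a])  # high yards-per-carry outliers
print([p["name"] for p in group_b])  # low yards-per-carry, high-volume backs
```

The key constraint is that the split is purely mechanical: whoever ranks at the top goes in Group A, no cherry-picking allowed.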
Most importantly, because predictions mean nothing without accountability, I track the results of my predictions over the course of the season and highlight when they prove correct and also when they prove incorrect. Here's a list of my predictions from 2020 and their final results. Here's the same list from 2019 and their final results, here's the list from 2018, and here's the list from 2017. Over four seasons, I have made 30 specific predictions and 24 of them have proven correct, a hit rate of 80%.
The Scorecard
In Week 2, I broke down what regression to the mean really is, what causes it, how we can benefit from it, and what the guiding philosophy of this column would be. No specific prediction was made.
In Week 3, I dove into the reasons why yards per carry is almost entirely noise, shared some research to that effect, and predicted that the sample of backs with lots of carries but a poor per-carry average would outrush the sample with fewer carries but more yards per carry.
In Week 4, I talked about yard-to-touchdown ratios and why they were the most powerful regression target in football that absolutely no one talks about, then predicted that touchdowns were going to follow yards going forward (but the yards wouldn't follow back).
In Week 5, we looked at ten years' worth of data to see whether early-season results better predicted rest-of-year performance than preseason ADP and we found that, while the exact details fluctuated from year to year, overall they did not. No specific prediction was made.
In Week 6, I taught a quick trick to tell how well a new statistic actually measures what you think it measures. No specific prediction was made.
In Week 7, I went over the process of finding a good statistic for regression and used team rushing vs. passing touchdowns as an example.
In Week 8, I talked about how interceptions were an unstable statistic for quarterbacks, but also for defenses.
In Week 9, we took a look at Ja'Marr Chase's season so far. He was outperforming his opportunities, which is not sustainable in the long term, but I offered a reminder that everyone regresses to a different mean, and the "true performance level" that Chase will trend towards over a long timeline is likely a lot higher than for most other receivers. No specific prediction was made.
In Week 10, I talked about how schedule luck in fantasy football was entirely driven by chance and, as such, should be completely random from one sample to the next. Then I checked Footballguys' staff leagues and predicted that the teams with the worst schedule luck would outperform the teams with the best schedule luck once that random element was removed from their favor.
In Week 11, I walked through how to tell the difference between regression to the mean and gambler's fallacy (which is essentially a belief in regression past the mean). No specific prediction was made.
In Week 12, I showed how to use the concept of regression to the mean to make predictions about the past and explained why the average fantasy teams were close but the average fantasy games were not. As a bonus, I threw in another quick prediction on touchdown over- and underachievers (based on yardage gained).
In Week 13, I went down the rabbit hole and investigated how performance in Regression Alert was also subject to regression to the mean, and how our current winning streak was unsustainable and destined to end sometime.
| Statistic for regression | Performance before prediction | Performance since prediction | Weeks remaining |
| --- | --- | --- | --- |
| Yards per Carry | Group A had 10% more rushing yards per game | Group B has 4% more rushing yards per game | None (Win!) |
| Yards per Touchdown | Group A scored 9% more fantasy points per game | Group B scored 13% more fantasy points per game | None (Win!) |
| Passing vs. Rushing TDs | Group A scored 42% more RUSHING TDs | Group A scored 33% more PASSING TDs | None (Win!) |
| Defensive Interceptions | Group A had 33% more interceptions | Group B had 24% more interceptions | None (Win!) |
| Schedule Luck | Group A had a 3.7% better win% | Group B has an 18.5% better win% | None (Win!) |
| Yards per Touchdown | Group A scored 10% more fantasy points per game | Group B has 8% more fantasy points per game | 2 |
Before our prediction, Group A had an all-play winning percentage of 40.3% and Group B had an all-play winning percentage of 60.4%. Since the prediction, Group A has an all-play winning percentage of 43.6% and Group B has an all-play winning percentage of 54.6%. So both groups regressed a little bit, but the bad teams largely stayed bad and the good teams largely stayed good.
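For readers unfamiliar with the metric: a team's all-play winning percentage treats every week as if the team played every other team in the league, which strips schedule luck out entirely. Here is a small sketch on a hypothetical four-team league (the scores are invented for illustration):

```python
# All-play winning percentage on a made-up four-team, two-week league:
# each week, every team is credited with a "win" over every team it
# outscored, regardless of actual matchups.

def all_play_pct(weekly_scores):
    """weekly_scores: list of weeks, each a dict of team -> points."""
    wins = {team: 0 for team in weekly_scores[0]}
    games = 0
    for week in weekly_scores:
        for t in week:
            for u in week:
                if t != u and week[t] > week[u]:
                    wins[t] += 1
        games += len(week) - 1  # games each team "played" this week
    return {team: wins[team] / games for team in wins}

scores = [{"A": 120, "B": 95, "C": 110, "D": 80},
          {"A": 90, "B": 130, "C": 85, "D": 100}]
print(all_play_pct(scores))
```

Because every team faces every score every week, a high all-play percentage reflects underlying quality rather than a soft schedule.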
One thing that didn't stay constant? Schedule luck. At the time of our prediction, Group A had won about 7 games more than they "should" have thanks to lucky scheduling. Group B had won about 6.5 games fewer than they "should" have. Since the prediction, Group A gained about 0.5 wins due to schedule luck, while Group B gained about 2.5. And since the luck wasn't diametrically opposed, Group B's underlying quality carried the day.
As for our yards per touchdown ratio, the elephant in the room here is this week's Patriots/Bills game. The Patriots attempted the fewest passes of any team since 1974, completing 2 of 3 for 19 yards passing. This... was sort of unfortunate since more than a quarter of the receivers in Group B right now play for the Patriots. Together, Jakobi Meyers and Kendrick Bourne accumulated 0 receptions on 0 targets. (They did manage 0.3 fantasy points thanks to a single 3-yard carry.)
If I were to mark this game as an outlier and remove it from the sample, Group B would lead Group A in fantasy points by 25% right now. But if the shoe were on the other foot and it were Group A that got hosed by a historically bizarre game, I'd... well, I'd mark it as an outlier and remove it from the sample. But here at Regression Alert we play on hard mode. If massive outliers hurt our prediction, they stay in. If they help our prediction, they get pulled out.
Do Players Get Hot?
It's widely acknowledged that succeeding in the fantasy playoffs is largely about securing players who all "get hot" at the right time. But is "getting hot" a real, predictable phenomenon? Certainly, some players outscore other players in any given sample, but any time performance is randomly distributed you'd expect clusters of good games or clusters of bad games to occur by chance alone.
If a player has been putting up better games recently, does that indicate that he's "heating up" and will likely sustain that performance going forward? Or does it just mean that he happened to string together a couple of good games and is no more likely to do so again than he was before? The fantasy community often believes the former, but I'll venture that the truth is much closer to the latter.
Indeed, looking at how a player has performed over the last three, four, or five games is almost always worse than looking at how he's performed over the last nine, ten, or eleven games. As I keep saying around here, large samples are more predictable than small samples. Ignoring half or more of a player's games doesn't give you a better idea of how well that player will perform in the near future; it gives you a worse idea.
This is one of my favorite observations and I knew in advance that I'd be making predictions on it this week in preparation for the fantasy football playoffs, when all of the "hot" teams are riding high while the "cold" ones are starting to fret. And while I was preparing, one particular split caught my eye.
Yards per Route Run (full season):

2.30 - Ja'Marr Chase
1.75 - Elijah Moore
1.75 - Jaylen Waddle
1.67 - DeVonta Smith

Yards per Route Run (since Week 8):

2.59 - Elijah Moore
2.15 - Jaylen Waddle
1.97 - DeVonta Smith
1.07 - Ja'Marr Chase

— Adam Harstad (@AdamHarstad) December 9, 2021
That's dramatic. I'm sure there are a lot of GMs with Elijah Moore or Jaylen Waddle who are feeling like they just found a $100 bill lying in the middle of the street right now. And likewise, there are some teams with Ja'Marr Chase who are wondering if they should even bother starting him anymore. But history tells us that the Week 1-13 sample tells us more about who a player is than the Week 8-13 sample.
Oh sure, it's easy to craft a story to the contrary, especially with rookies (who, it should be noted, are the only class of players who genuinely average more points per game over the last half of the season than they did over the first half). But plausible stories are a drug that lull us into complacency and cause us to accept without questioning. So let's put this idea to the test.
For starters, let's take the Top 200 PPR scorers this season and strip away anyone who's played in fewer than 10 games over the full season or fewer than 3 games since Week 10. Finally, let's drop anyone who is scoring fewer than 10 points per game over the last four weeks. (Robby Anderson's production is up 27% recently, but I doubt anyone who has him on their roster would consider him "hot" by any stretch of the imagination.) That leaves us with 84 names.
Of those 84 names, 24 are averaging at least 25% more points per game over the last four weeks than they are over the season as a whole. These 24 players are: A.J. Dillon, Mark Ingram, Sony Michel, Javonte Williams, Jake Elliott, Elijah Moore, Devonta Freeman, Evan McPherson, Brandon Aiyuk, Amon-Ra St. Brown, Justin Jefferson, Zach Ertz, Leonard Fournette, Harrison Butker, Antonio Gibson, Van Jefferson, Darnell Mooney, Jonathan Taylor, Darrel Williams, Tee Higgins, Tony Pollard, Jaylen Waddle, and Kendrick Bourne.
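The filtering steps above reduce to a few mechanical checks. Here is a sketch on hypothetical records; the field names (`games`, `recent_games`, `ppg_season`, `ppg_last4`) and the sample values are assumptions for illustration, not the column's actual data:

```python
# A sketch of the "hot player" filter described above, on invented records.

def hot_players(players):
    # Filter: at least 10 games on the season, at least 3 games since
    # Week 10, and at least 10 points per game over the last four weeks.
    qualified = [
        p for p in players
        if p["games"] >= 10
        and p["recent_games"] >= 3
        and p["ppg_last4"] >= 10
    ]
    # "Hot" = at least 25% more points per game recently than on the season.
    return [p for p in qualified if p["ppg_last4"] >= 1.25 * p["ppg_season"]]

sample = [
    {"name": "WR1", "games": 12, "recent_games": 4, "ppg_season": 12.0, "ppg_last4": 16.0},
    {"name": "WR2", "games": 12, "recent_games": 4, "ppg_season": 14.0, "ppg_last4": 15.0},
    {"name": "RB1", "games": 9,  "recent_games": 4, "ppg_season": 11.0, "ppg_last4": 18.0},
]
print([p["name"] for p in hot_players(sample)])  # only WR1 clears every bar
```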
Someone like A.J. Dillon seems like a safe bet for regression. His "hot streak" coincided with Aaron Jones missing time to injury, and now Aaron Jones is (presumably) back. On the other hand, someone like Javonte Williams seems like a safe bet to maintain his recent production with Melvin Gordon banged up. Collectively I'm assuming injuries to teammates wash out on average.
Anyway, if you for some reason played in a league that let you start 24 players a week (including three kickers and no quarterbacks), your team would be averaging 275.8 points per game over the full season, but a scorching 379.5 points per game over the last four weeks, an improvement of 37.6%. As you head into the playoffs, though, would you expect to score closer to 275.8 or 379.5? Are recent weeks a sign of things to come or just randomness being random again?
I'll wager that not only will that collection of players score closer to 275.8, but that their average over the next four weeks will be at least twice as close to their full-season average as to their last-four-games average. To reduce any unnecessary wonkiness, I'll exclude any player who doesn't play at least three games in the next four weeks (so that if, for example, Jonathan Taylor gets hurt on the first play of the game next week I don't get the benefit of counting his average as 0 points per game).
If every player meets the 3-game minimum, this means anything below 310 points per game will register as a win for me, while anything over that total counts as a loss.
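For anyone checking my math, the 310-point cutoff falls directly out of the "twice as close" condition: the break-even point sits one-third of the way up the gap between the two averages.

```python
# Break-even for "at least twice as close to the full-season average as to
# the last-four-games average": solve x - 275.8 = (379.5 - x) / 2, which
# puts x one-third of the way from the season average to the recent average.

season_avg, recent_avg = 275.8, 379.5
threshold = season_avg + (recent_avg - season_avg) / 3
print(round(threshold, 1))  # 310.4
```

Anything below that line means the group's playoff scoring hewed closer to the full-season baseline, and the prediction wins.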
And since the rookie wide receivers animated this prediction in the first place, and since Ja'Marr Chase is the "coldest" player in the sample, scoring just 60% of his full-season average over recent weeks, and since we're on a winning streak and I resolved to take more risks, we'll do one last bonus prediction. I predict that Ja'Marr Chase will average more yards per route run over the next four weeks than Waddle, Moore, and Smith combined. (With the caveat that the prediction becomes null and void if Chase doesn't play at least three games.)