Welcome to Regression Alert, your weekly guide to using regression to predict the future with uncanny accuracy.
For those who are new to the feature, here's the deal: every week, I dive into the topic of regression to the mean. Sometimes I'll explain what it really is, why you hear so much about it, and how you can harness its power for yourself. Sometimes I'll give some practical examples of regression at work.
In weeks where I'm giving practical examples, I will select a metric to focus on. I'll rank all players in the league according to that metric, and separate the top players into Group A and the bottom players into Group B. I will verify that the players in Group A have outscored the players in Group B to that point in the season. And then I will predict that, by the magic of regression, Group B will outscore Group A going forward.
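For anyone who likes to see the bookkeeping spelled out, here's a minimal sketch of what that split-and-compare process might look like in code. Everything in it (names, touchdown rates, points per game) is invented for illustration; it shows the shape of the procedure, not actual data.

```python
# Hypothetical illustration of the Group A / Group B setup described above.
# Player names and numbers are invented; nothing here is real league data.

players = {
    # name: (touchdown rate so far, fantasy points per game so far)
    "Player 1": (0.091, 19.2),
    "Player 2": (0.085, 17.8),
    "Player 3": (0.072, 16.5),
    "Player 4": (0.031, 14.1),
    "Player 5": (0.024, 13.3),
    "Player 6": (0.019, 12.6),
}

# Rank everyone by the metric under the microscope (touchdown rate here) ...
ranked = sorted(players, key=lambda p: players[p][0], reverse=True)

# ... and split into the high outliers (Group A) and the low outliers (Group B).
group_a = ranked[: len(ranked) // 2]
group_b = ranked[len(ranked) // 2 :]

def points_per_game(group):
    return sum(players[p][1] for p in group) / len(group)

# Step 1: verify that Group A has outscored Group B to this point in the season.
assert points_per_game(group_a) > points_per_game(group_b)

# Step 2: predict that Group B outscores Group A going forward; that part can
# only be checked against the games still to come.
print(f"Group A so far: {points_per_game(group_a):.1f} ppg")
print(f"Group B so far: {points_per_game(group_b):.1f} ppg")
```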
Crucially, I don't get to pick my samples (other than choosing which metric to focus on). If the metric I'm focusing on is touchdown rate, and Christian McCaffrey is one of the high outliers in touchdown rate, then Christian McCaffrey goes into Group A and may the fantasy gods show mercy on my predictions.
Most importantly, because predictions mean nothing without accountability, I track the results of my predictions over the course of the season and highlight when they prove correct and also when they prove incorrect. Here's a list of my predictions from 2019 and their final results, here's the list from 2018, and here's the list from 2017.
THE SCORECARD
In Week 2, I opened with a primer on what regression to the mean was, how it worked, and how we would use it to our advantage. No specific prediction was made.
In Week 3, I dove into the reasons why yards per carry is almost entirely noise, shared some research to that effect, and predicted that the sample of backs with lots of carries but a poor per-carry average would outrush the sample with fewer carries but more yards per carry.
In Week 4, I talked about how the ability to convert yards into touchdowns was most certainly a skill, but it was a skill that operated within a fairly narrow and clearly-defined range, and any values outside of that range were probably just random noise and therefore due to regress. I predicted that high-yardage, low-touchdown receivers would outscore low-yardage, high-touchdown receivers going forward.
In Week 5, I talked about how historical patterns suggested we had just reached the informational tipping point, the time when performance to this point in the season carried as much predictive power as ADP. In broad strokes, I suggested that players whose early performance differed substantially from their ADP would tend to move toward a point between their early production and their draft position, but no specific prediction was made.
| Statistic for regression | Performance before prediction | Performance since prediction | Weeks remaining |
| --- | --- | --- | --- |
| Yards per Carry | Group A had 3% more rushing yards per game | Group B has 32% more rushing yards per game | 1 |
| Yard-to-Touchdown Ratio | Group A averaged 2% more fantasy points per game | Group B averages 31% more fantasy points per game | 2 |
I typically use per-game averages for these predictions rather than group totals because it permits me to compare groups of different sizes, and it protects me from situations where one group or the other suffers a different number of injuries or has a different number of bye weeks. But per-game averages aren't perfect, either; last week Leonard Fournette was active as an emergency running back. He played a single snap and failed to record a single statistic. Technically that was a game, and his group average should reflect the zero yards he gained during it. And because Fournette is in our Group A, counting it would be an advantage to our prediction.
But I already have enough advantages in this game; simply getting to pick the statistic I focus on is advantage enough. That's why I don't actively select my groups. The deck is already stacked in my favor. If Fournette were in Group B, I would count this week as bad luck for me and include it in the average, but since he's in Group A, I'll discard it. It's not like we need the help; Group B outrushed Group A for the third straight week and once again averaged more yards per carry in the process. To this point, our "low-ypc backs" are averaging 4.21 yards per carry, while our "high-ypc backs" are getting just 3.74.
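If you're curious how this per-game bookkeeping works mechanically, here's a rough sketch with invented game logs: byes never count, and a below-threshold appearance (like a single emergency snap) can be excluded by judgment. The snap-count filter here is just one way to encode that call, not the column's official rule.

```python
# Rough sketch of per-game averaging with invented game logs.
# Each entry is (rushing yards, snaps played); None marks a bye or inactive week.

game_logs = {
    "Back 1": [(82, 40), (65, 38), (None, 0), (71, 42)],   # week 3 bye
    "Back 2": [(54, 35), (0, 1), (93, 44), (88, 41)],      # week 2: one emergency snap
}

def yards_per_game(log, min_snaps=2):
    """Average rushing yards over games that actually count.

    Byes and inactive weeks never count; games under a minimal snap threshold
    can be excluded by judgment, as with the one-snap appearance above.
    """
    counted = [yards for yards, snaps in log if yards is not None and snaps >= min_snaps]
    return sum(counted) / len(counted)

for back, log in game_logs.items():
    print(back, round(yards_per_game(log), 1))
```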
As for our yard-to-touchdown prediction, a two-touchdown game by Adam Thielen helped Group A match Group B in end zone trips per game this week, but Group A still trailed by 15 yards per game and was outscored once again as a result.
The Science of Intuition
Obviously, one goal of this column is to convince you that regression to the mean is real, powerful, and everywhere, and to explain what it is and how (and why) it works. Another goal is to give you lists of players who are underperforming and players who are overperforming so you can make informed decisions about what to do with them going forward.
But another goal is to equip you with the tools to spot regression in the wild on your own, to help you develop intuitions about what kinds of performances are sustainable and what kinds are unsustainable. Obviously, I'll highlight certain stats and give you my opinions on them. Yards per carry: bad. Yards per touchdown: sustainable, but only within a narrow range of about 100 to 200. Interception rate: bad. (Sorry, spoiler alert.)
But as the years go on, one fact of life in fantasy football is a steady stream of new statistics. If you listen to football commentary these days, you might hear about things like Air Yards, Completion Percentage over Expectation (or CPOE), or Expected Points Added (or EPA). Some of these stats didn't even exist until a few years ago. Are they good? Are they bad?
The gold standard measure of how much a stat might regress is something called stability testing. By comparing performance in one sample to performance in another, we can determine how similar those performances are: how much of a player's performance carries over from one game to the next, or from one season to the next. Something like broken tackles, it turns out, is pretty stable. The backs who break a lot of tackles in one year also tend to break a lot of tackles in the next year.
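For the curious, here's roughly what a stability test looks like in code: pair each player's value for a stat in one sample with the same player's value in a later sample, then measure the correlation. The values below are invented; a real test would use actual year-over-year or split-half data.

```python
# Bare-bones stability test: correlate a stat for the same players across two samples.
# The values are invented for illustration.
from statistics import correlation  # available in Python 3.10+

sample_1 = {"A": 4.8, "B": 4.5, "C": 4.2, "D": 4.0, "E": 3.7, "F": 3.4}
sample_2 = {"A": 4.3, "B": 4.6, "C": 3.9, "D": 4.4, "E": 3.8, "F": 4.1}

players = sorted(sample_1)
r = correlation([sample_1[p] for p in players], [sample_2[p] for p in players])

# A mostly-skill stat (broken tackles) produces a high correlation;
# a mostly-noise stat (yards per carry) produces one near zero.
print(f"correlation between samples: {r:.2f}")
```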
Something like yards per carry, on the other hand, is not stable at all. I've already run down some of the studies, but just look at the predictions in this column: we had a group of backs with 330 carries (a full season's worth!) and another group with 417 carries (a full season and then some). The first group was outproducing the second by more than two yards per carry. And yet in the three weeks since, the second group is actually gaining more yards per carry than the first. Yards per carry is not the slightest bit stable from one sample to the next, which is why it's such a great candidate for regression.
But running their own stability testing is probably going to be beyond the abilities (or the inclinations) of most fantasy football players. (Additionally, just because a statistic is stable doesn't necessarily mean it's useful. Sack rate is one of the most stable quarterback stats, but it's also useless for fantasy football purposes unless you're in the rare league that penalizes quarterbacks for sacks.)
So when you encounter a brand new stat, what can you do to tell if it's a useful stat or not? I'm a big fan of a concept that I call "the leaderboard test", statisticians call "face validity", and the rest of us call "the smell test". Basically, just from looking at a list, how well does it match our intuitions of what that list should look like?
I like a statistic called Adjusted Net Yards per Attempt, or ANY/A. It's a quarterback's yards per attempt, but with a 20-yard bonus for every touchdown, a 45-yard penalty for every interception, and sacks (and the yards lost to them) included. Why do I like it? Because I think the face validity is really high. Here are the top 10 quarterbacks since the merger in era-adjusted ANY/A (with a 2,000-attempt minimum):
- Steve Young
- Joe Montana
- Roger Staubach
- Peyton Manning
- Dan Marino
- Aaron Rodgers
- Tom Brady
- Dan Fouts
- Drew Brees
- Tony Romo
Maybe that's not a perfect list. Maybe you'd have Tom Brady higher or Tony Romo lower. But seven quarterbacks on the NFL's 100th-anniversary team played the bulk of their careers after the merger; five of them are on that list, and four of the remaining names (Young, Brees, Rodgers, Fouts) either were or will be first-ballot Hall of Famers. This list has a very high degree of face validity.
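To make the definition above concrete, here's a tiny sketch of the calculation. The stat line is invented, and this is the plain version of the formula; the era adjustment behind the career list above involves extra steps not shown here.

```python
# Adjusted Net Yards per Attempt, as described above: yards per attempt with a
# 20-yard bonus per touchdown, a 45-yard penalty per interception, and sacks
# (plus the yardage lost to them) folded in. The stat line below is invented.

def adjusted_net_yards_per_attempt(pass_yards, pass_tds, interceptions,
                                   attempts, sacks, sack_yards):
    adjusted_yards = pass_yards + 20 * pass_tds - 45 * interceptions - sack_yards
    return adjusted_yards / (attempts + sacks)

# A hypothetical 4,500-yard, 35-TD, 10-INT season on 560 attempts with 30 sacks:
print(round(adjusted_net_yards_per_attempt(4500, 35, 10, 560, 30, 200), 2))  # -> 7.71
```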
Here's the leaderboard for 2020 so far:
- Aaron Rodgers
- Russell Wilson
- Derek Carr
- Jared Goff
- Josh Allen
- Patrick Mahomes II
- Ryan Tannehill
- Justin Herbert
- Dak Prescott
- Teddy Bridgewater
Again, is it perfect? Probably not. But all of your MVP front-runners show up in the Top 6. Dak Prescott was setting NFL records before getting hurt, and Justin Herbert's play has the entire league buzzing. The quarterbacks here are mostly really good quarterbacks, or at least decent ones who happen to be playing really well.
And because most of the list makes pretty intuitive sense, we should pay extra attention to the surprise entries. Maybe you didn't expect to see Derek Carr so high, but seeing him sitting between Rodgers/Wilson and Allen/Mahomes probably raises your opinion of his play so far this year. Maybe it makes you a bit more impressed with Justin Herbert's hot start or appreciative of the difference Teddy Bridgewater has made on the Panthers.
Let's compare this to another stat. The NFL has been using its player tracking data to create a suite of "Next Gen Stats" to help fans evaluate the game. One stat they created is a measure of the average separation a receiver gets. Here's the Top 10 so far this year:
- Demarcus Robinson
- Jordan Akins
- George Kittle
- Gabriel Davis
- Noah Fant
- JuJu Smith-Schuster
- Tyler Lockett
- Robert Woods
- Keelan Cole
- Drew Sample
Here's the same list from the 2019 season:
- Calvin Ridley
- Robert Woods
- Corey Davis
- Ted Ginn Jr
- Albert Wilson
- Jacob Hollister
- Steven Sims Jr
- Hunter Renfrow
- Chris Godwin
- Alex Erickson
Do these lists have face validity? Do they pass the smell test? Not really. There are some really good players here. There are some really bad players here. There are some one-dimensional deep threats, but there are also short-area separators and yards-after-the-catch specialists. I can probably invent a story to tie all of these guys together. Maybe the good players are here because they're really good at getting wide open. And maybe the bad players are here because the quarterback only looks their way when they're wide open. Maybe.
If you see that a quarterback is having a great season as measured by ANY/A, that should serve as compelling evidence to you that the quarterback is playing really well and you should be predisposed to believe that he'll be able to sustain his production to some extent or another. If you see a receiver is having a great season as measured by average separation, that... shouldn't really move the needle for you at all. That's not really evidence that the receiver is any good or that his level of play is in any way sustainable.
Intuitions are fallible, and at the end of the day, they're not as good as rigorous statistical analysis. But rigorous statistical analysis is hard and boring and a lot of work, and most of us have better things to do. There's no need to let the perfect be the enemy of the perfectly fine. Raw intuition is an underrated tool for separating the wheat from the chaff and, in a world with an ever-increasing number of new statistics to navigate, quickly sorting what deserves our attention from what's just noise.