Welcome to Regression Alert, your weekly guide to using regression to predict the future with uncanny accuracy.
For those who are new to the feature, here's the deal: every week, I dive into the topic of regression to the mean. Sometimes I'll explain what it really is, why you hear so much about it, and how you can harness its power for yourself. Sometimes I'll give some practical examples of regression at work.
In weeks where I'm giving practical examples, I will select a metric to focus on. I'll rank all players in the league according to that metric and separate the top players into Group A and the bottom players into Group B. I will verify that the players in Group A have outscored the players in Group B to that point in the season. And then I will predict that, by the magic of regression, Group B will outscore Group A going forward.
Crucially, I don't get to pick my samples (other than choosing which metric to focus on). If I'm looking at receivers and Cooper Kupp is one of the top performers in my sample, then Cooper Kupp goes into Group A and may the fantasy gods show mercy on my predictions.
Most importantly, because predictions mean nothing without accountability, I track the results of my predictions over the course of the season and highlight when they prove correct and also when they prove incorrect. At the end of last season, I provided a recap of the first half-decade of Regression Alert's predictions. The executive summary is we have a 32-7 lifetime record, which is an 82% success rate.
If you want even more details, here's a list of my predictions from 2020 and their final results. Here's the same list from 2019 and their final results, here's the list from 2018, and here's the list from 2017.
The Scorecard
In Week 2, I broke down what regression to the mean really is, what causes it, how we can benefit from it, and what the guiding philosophy of this column would be. No specific prediction was made.
In Week 3, I dove into the reasons why yards per carry is almost entirely noise, shared some research to that effect, and predicted that the sample of backs with lots of carries but a poor per-carry average would outrush the sample with fewer carries but more yards per carry.
STATISTIC FOR REGRESSION | PERFORMANCE BEFORE PREDICTION | PERFORMANCE SINCE PREDICTION | WEEKS REMAINING |
---|---|---|---|
Yards per Carry | Group A had 10% more rushing yards per game | Group B has 16% more rushing yards per game | 3 |
When I made last week's prediction, our "high-YPC" group was averaging 6.41 yards per carry and our "low-YPC" group was averaging 3.81 yards per carry. As a point of comparison, league average among RBs is 4.38. In our first week, the "high-YPC" running backs averaged 4.23 yards per carry and the "low-YPC" running backs averaged 4.44.
Is this the result of a lone outlier performance? Quite the opposite: the "high-YPC" group is the one with a lone outlier dragging their average up. Cordarrelle Patterson had 17 rushes for 141 yards for the "high-YPC" cohort, an average of 8.29 yards per carry; no one else in the group topped 4.5. Meanwhile, half of the "low-YPC" backs topped 4.5, with a median value for the group of 4.56 compared to 3.80 for Group A.
Given that Group B backs were higher-volume to begin with, the moment Group A's yards-per-carry advantage disappeared, it took their rushing yardage advantage with it. Group B has outgained Group A by 16% through one week, though there's a lot of time left in the prediction.
PLAYING THE HITS
If you go see Lynyrd Skynyrd live, you know they're playing Sweet Home Alabama and Freebird. The Stones are going to play (I Can't Get No) Satisfaction. KISS is going to play Rock and Roll All Nite and Detroit Rock City, and of course, Ozzy is eventually going to get around to Crazy Train.
Similarly, Regression Alert loves delving into the back catalog for obscure stats and deep cuts from time to time, but we know where our bread is buttered and we aren't shy about serving up the hits, either. Last week we played our old classic "Yards Per Carry is Pseudoscience". This week we have our seminal work "Touchdowns Follow Yards (But Yards Don't Follow Back)". Next week we're going to really drive the crowd nuts with our smash "Revisiting Preseason Expectations". But that's getting ahead of ourselves.
First, let's talk about touchdowns. Actually, before we talk about touchdowns, let's talk about vocabulary.
sto·chas·tic
adjective
randomly determined; having a random probability distribution or pattern that may be analyzed statistically but may not be predicted precisely.
Touchdowns are stochastic. Over his career, Cam Newton rushed for 70 touchdowns in 140 games, an average of 0.5 touchdowns per game. We could say that's his "true production level", and over a sufficiently long timeline, we'd probably expect him to conform to that, averaging 0.5 touchdowns per game.
Despite that being his true production level, though, guess how many times Cam Newton rushed for half a touchdown in a game? As far as I can tell (and I have researched this topic extensively), it has never happened. Instead, he either scored zero touchdowns... or he scored one touchdown. (Sometimes he scored two touchdowns, and once he even rushed for three touchdowns.) Because touchdowns are all-or-nothing events that only come in whole numbers, we can analyze Cam Newton's rushing touchdowns statistically, but we cannot predict them precisely.
Yards don't really behave like that. Over his career, Cam Newton averaged 38.6 rushing yards per game. But it's not like every week he's either getting you 0 yards or else he's getting you 75 yards. Instead, more games than not, he's getting you somewhere between 20 and 60 yards. His yardage total is much more consistent from game to game than his touchdown total.
One way to measure consistency is something called standard deviation, which measures how much something varies around the average. The standard deviation of Newton's rushing yardage is 24.5 yards. The standard deviation of Newton's rushing touchdowns is 0.65 touchdowns.
Now, these numbers are not directly comparable. Standard deviations for large values are naturally bigger than standard deviations for small values. (Consider: if you switched to "feet rushing per game" rather than "yards rushing per game", the standard deviation would triple despite the underlying game-to-game variation remaining unchanged.)
But if you divide a player's standard deviation by that player's average, you get something called the coefficient of variation, or CV. CV is a way to compare how volatile different statistics are. The CV of Newton's yards is 64%, meaning it tends to vary by about 64% of his overall average. The CV of Newton's touchdowns is 130%. Touchdowns are much more random from week to week than yards are; in Newton's case, about twice as random according to CV. (For those curious, the CV of Newton's rush attempts was 42%; "usage" stats like attempts tend to be even more stable from week to week than yards.)
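For anyone who wants to check the arithmetic, here is a minimal Python sketch of the CV calculation. It uses only the rounded averages and standard deviations quoted above; the function name is my own and purely illustrative.

```python
def coefficient_of_variation(std_dev, mean):
    """Return the standard deviation expressed as a fraction of the mean."""
    return std_dev / mean


# Rounded figures quoted in the text for Cam Newton's rushing.
yards_cv = coefficient_of_variation(24.5, 38.6)
touchdowns_cv = coefficient_of_variation(0.65, 0.5)

# Switching from yards to feet triples both the spread and the average,
# so the ratio between them does not move.
feet_cv = coefficient_of_variation(24.5 * 3, 38.6 * 3)

print(f"Yards CV: {yards_cv:.1%}")
print(f"Feet CV: {feet_cv:.1%}")
print(f"Touchdowns CV: {touchdowns_cv:.1%}")
print(f"Touchdowns vs. yards: {touchdowns_cv / yards_cv:.1f}x as volatile")
```

With these rounded inputs, the yards CV lands a shade under the 64% quoted above, presumably because that figure was computed from unrounded numbers. The point survives either way: the unit change leaves CV untouched, and touchdowns come out about twice as volatile as yards.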
Not only are they more unstable, but touchdowns are also much more valuable than yards. In most scoring systems, one extra touchdown is worth the equivalent of 60 extra yards. Which means if Newton caught the high side of variance and scored a few extra touchdowns early in the year, it could dramatically inflate his fantasy production to date. And if he caught the low side of variance and failed to reach the end zone, it could leave him far lower than we'd otherwise expect.
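To put numbers on that, here is a rough sketch assuming the common scoring of 1 point per 10 yards and 6 points per touchdown (your league's settings may differ). The 38.6-yard game is just Newton's career average from above used as a hypothetical stat line.

```python
POINTS_PER_YARD = 0.1  # assumed: 1 point per 10 yards
POINTS_PER_TD = 6.0    # assumed: 6 points per touchdown


def fantasy_points(yards, touchdowns):
    """Fantasy points from yardage and touchdowns under the assumed scoring."""
    return yards * POINTS_PER_YARD + touchdowns * POINTS_PER_TD


# How many yards it takes to match a single touchdown.
yards_per_td_equivalent = POINTS_PER_TD / POINTS_PER_YARD
print(f"One touchdown is worth about {yards_per_td_equivalent:.0f} yards")

# Same hypothetical yardage, different touchdown luck.
for touchdowns in (0, 1, 2):
    points = fantasy_points(38.6, touchdowns)
    print(f"38.6 yards and {touchdowns} TD: {points:.2f} points")
```

An average yardage day with no touchdown is worth less than four points; the identical day with one touchdown is worth nearly ten.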
Which gives rise to my favorite statistic for regression: yard-to-touchdown ratios. Some players are really, really good at getting yards and/or not quite as good at scoring touchdowns. For years, Julio Jones has been the most famous example of this; he has gained 220 receiving yards in his career for every touchdown he has scored. This is a very high average, but there are other wide receivers in this general range; Andre Johnson averaged 203 yards for every touchdown, Henry Ellard averaged 212, etc.
Other players are really, really good at getting touchdowns but typically aren't commensurately good at getting yards. For his career, Davante Adams scores a touchdown for every 109 yards he gains receiving. Again, this is a very low average, but not historically implausible; Dez Bryant averaged 102 yards for every touchdown, while Randy Moss was all the way down at 98 yards per touchdown.
Importantly: the yard-to-touchdown ratio is not a measure of player quality. Over 2016 and 2017, Davante Adams averaged 940 yards and 11 touchdowns. Last year, Davante Adams had 1553 yards and 11 touchdowns. It should go without saying that Adams played much, much better in 2021 than he did in 2016 and 2017 despite averaging a "worse" yard-to-touchdown ratio. All else being equal, a guy who gains 1500 yards and scores 10 touchdowns is better than a guy who gains 1000 yards and scores 10 touchdowns.
If you asked who was the best receiver in the NFL at various points over the last five years, you might plausibly have heard Jones (216 yards per touchdown), Michael Thomas (186 yards per touchdown), DeAndre Hopkins (156 yards per touchdown), Stefon Diggs (150 yards per touchdown), Antonio Brown (148 yards per touchdown), Odell Beckham (132 yards per touchdown), Tyreek Hill (120 yards per touchdown), or Adams (109 yards per touchdown). (Similarly, I could easily find mediocre or even bad receivers who span the whole yard-to-touchdown spectrum; Devin Funchess averages 108 yards per touchdown, but he's no Davante Adams.)
With that in mind, over the long term, receivers tend to average between 100 and 200 yards per touchdown, with the majority of the league clustered between 120 and 180. Any rate that falls in that range is plausibly sustainable and perhaps a true representation of a player's relative skill at scoring touchdowns. But because touchdowns are stochastic, in the short run, we see yard-to-touchdown ratios that are wildly outside of that "sustainable" zone. And because touchdowns count for so many points in fantasy football, this gives us a ton of targets for regression.
So let's pit the receivers with a lot of yards but very few touchdowns against the receivers with a lot of touchdowns but very few yards and see what happens. There are ten receivers in the NFL right now who have 200 or fewer yards and 2 or more touchdowns (guaranteeing a yard-to-touchdown ratio of 100 or lower). Similarly, there are nine receivers in the NFL right now who have 200 or more yards and 1 or fewer touchdowns (resulting in a yard-to-touchdown ratio of 200 or higher). Here's the full list:
NAME | RECYD | RECTD | FANT PT | Yds/TD |
---|---|---|---|---|
River Cracraft | 13 | 2 | 13.3 | 7 |
Allen Lazard | 58 | 2 | 17.8 | 29 |
Jahan Dotson | 109 | 3 | 27.9 | 36 |
Devin Duvernay | 121 | 3 | 29.9 | 40 |
Michael Thomas | 171 | 3 | 35.1 | 57 |
Davante Adams | 189 | 3 | 36.5 | 63 |
Isaiah McKenzie | 132 | 2 | 25.8 | 66 |
Mike Williams | 138 | 2 | 25.8 | 69 |
Tyler Boyd | 155 | 2 | 27.5 | 78 |
Curtis Samuel | 181 | 2 | 35.2 | 91 |
Noah Brown | 213 | 1 | 27.3 | 213 |
Terry McLaurin | 235 | 1 | 30.2 | 235 |
Mack Hollins | 240 | 1 | 30.2 | 240 |
DeVonta Smith | 249 | 1 | 30.9 | 249 |
Marquise Brown | 251 | 1 | 31.1 | 251 |
A.J. Brown | 309 | 1 | 36.9 | 309 |
Tyler Lockett | 211 | 0 | 21.1 | Undefined |
Chris Olave | 268 | 0 | 26.8 | Undefined |
Courtland Sutton | 291 | 0 | 29.6 | Undefined |
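If you'd like to reproduce the Yds/TD column and the group cutoffs yourself, here is a small Python sketch. The rows are copied from the table above; the function names are my own, and the cutoffs are the 200-yard and touchdown thresholds described before the table.

```python
from typing import Optional


def yards_per_touchdown(yards: int, touchdowns: int) -> Optional[float]:
    """Yards gained per touchdown, or None when no touchdown has been scored."""
    if touchdowns == 0:
        return None
    return yards / touchdowns


def assign_group(yards: int, touchdowns: int) -> Optional[str]:
    """Apply the cutoffs from the text: A is touchdown-heavy, B is yardage-heavy."""
    if yards <= 200 and touchdowns >= 2:
        return "A"
    if yards >= 200 and touchdowns <= 1:
        return "B"
    return None


# A few rows from the table: (name, receiving yards, receiving touchdowns).
rows = [
    ("Allen Lazard", 58, 2),
    ("Curtis Samuel", 181, 2),
    ("A.J. Brown", 309, 1),
    ("Tyler Lockett", 211, 0),
]

for name, yards, touchdowns in rows:
    ratio = yards_per_touchdown(yards, touchdowns)
    label = "Undefined" if ratio is None else f"{ratio:.1f}"
    print(f"{name}: {label} yards per TD, Group {assign_group(yards, touchdowns)}")
```

A receiver with zero touchdowns has no ratio at all, which is why the last three rows of the table read "Undefined" rather than a number.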
River Cracraft has 2 receptions for 13 yards and 2 touchdowns; let's toss him out of the sample. (I say that I don't get to pick my groups, but the goal here is to make the regression impressive, so I have no qualms about putting my thumb on the scale to make Group A better so it stands out more when Group B beats them anyway.)
That leaves us with nine members of Group A: Allen Lazard, Jahan Dotson, Devin Duvernay, Michael Thomas, Davante Adams, Isaiah McKenzie, Mike Williams, Tyler Boyd, and Curtis Samuel. These nine receivers are averaging 48.2 yards and 0.85 touchdowns per game, good for 10.1 fantasy points.
Similarly, we have nine members of Group B: Noah Brown, Terry McLaurin, Mack Hollins, DeVonta Smith, Marquise Brown, A.J. Brown, Tyler Lockett, Chris Olave, and Courtland Sutton. These nine receivers are averaging 84.0 yards but just 0.22 touchdowns per game, good for 9.8 fantasy points.
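Here is a sketch of how those per-game averages fall out of the table. One loud assumption: the table doesn't list games played, so the player-game counts below (26 for Group A, 27 for Group B) are back-solved from the averages quoted above rather than taken from a box score.

```python
# (receiving yards, receiving touchdowns, fantasy points), copied from the table.
group_a = [
    (58, 2, 17.8), (109, 3, 27.9), (121, 3, 29.9), (171, 3, 35.1), (189, 3, 36.5),
    (132, 2, 25.8), (138, 2, 25.8), (155, 2, 27.5), (181, 2, 35.2),
]
group_b = [
    (213, 1, 27.3), (235, 1, 30.2), (240, 1, 30.2), (249, 1, 30.9), (251, 1, 31.1),
    (309, 1, 36.9), (211, 0, 21.1), (268, 0, 26.8), (291, 0, 29.6),
]

# ASSUMPTION: total games played across each group, inferred from the
# per-game averages in the text, not stated in the table.
GROUP_A_GAMES = 26
GROUP_B_GAMES = 27


def per_game(players, games):
    """Return (yards, touchdowns, fantasy points) per player-game for a group."""
    yards = sum(p[0] for p in players)
    touchdowns = sum(p[1] for p in players)
    points = sum(p[2] for p in players)
    return yards / games, touchdowns / games, points / games


a_yards, a_tds, a_points = per_game(group_a, GROUP_A_GAMES)
b_yards, b_tds, b_points = per_game(group_b, GROUP_B_GAMES)

print(f"Group A: {a_yards:.1f} yards, {a_tds:.2f} TDs, {a_points:.1f} points per game")
print(f"Group B: {b_yards:.1f} yards, {b_tds:.2f} TDs, {b_points:.1f} points per game")
print(f"Group A's edge in points per game: {a_points / b_points - 1:.0%}")
```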
This is normally where I say "Group A currently leads Group B, but by the magic of regression, Group B will lead Group A over the next four weeks", but... Group A's edge in fantasy points per game right now is just 3%. If Group B finishes 2% ahead of them, that's not a very impressive reversal. If Group B averaged just 3 or 4 more yards per game, they'd be ahead of Group A already. So instead I'll say Group B must outscore Group A by at least 1 point per game for me to count this as a victory. If Group A continues averaging 10 points per game, that means Group B must outscore them by at least 10%, but since I suspect both groups will score slightly fewer points, Group B's edge will likely need to be even larger than 10%.
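The reason a fixed 1-point margin gets harder as scoring drops is simple division, sketched below. The 9.0 and 8.0 scoring levels are hypothetical values chosen for illustration, not predictions.

```python
def required_edge(group_a_ppg, margin=1.0):
    """Fraction by which Group B must beat Group A to win by `margin` points per game."""
    return margin / group_a_ppg


# 10.0 is the level discussed in the text; 9.0 and 8.0 are hypothetical.
for ppg in (10.0, 9.0, 8.0):
    edge = required_edge(ppg)
    print(f"Group A at {ppg:.1f} points per game: Group B needs a {edge:.1%} edge")
```

The lower Group A's scoring falls, the bigger the percentage gap that one point per game represents.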
Can Group B do it? I feel fairly confident, but follow along over the coming weeks and we'll find out.