Welcome to Regression Alert, your weekly guide to using regression to predict the future with uncanny accuracy.
For those who are new to the feature, here's the deal: every week, I dive into the topic of regression to the mean. Sometimes I'll explain what it really is, why you hear so much about it, and how you can harness its power for yourself. Sometimes I'll give some practical examples of regression at work.
In weeks where I'm giving practical examples, I will select a metric to focus on. I'll rank all players in the league according to that metric, and separate the top players into Group A and the bottom players into Group B. I will verify that the players in Group A have outscored the players in Group B to that point in the season. And then I will predict that, by the magic of regression, Group B will outscore Group A going forward.
Crucially, I don't get to pick my samples (other than choosing which metric to focus on). If the metric I'm focusing on is touchdown rate, and Christian McCaffrey is one of the high outliers in touchdown rate, then Christian McCaffrey goes into Group A and may the fantasy gods show mercy on my predictions.
Most importantly, because predictions mean nothing without accountability, I track the results of my predictions over the course of the season and highlight when they prove correct and also when they prove incorrect. Here's a list of my predictions from 2019 and their final results, here's the list from 2018, and here's the list from 2017.
The Scorecard
In Week 2, I opened with a primer on what regression to the mean was, how it worked, and how we would use it to our advantage. No specific prediction was made.
In Week 3, I dove into the reasons why yards per carry is almost entirely noise, shared some research to that effect, and predicted that the sample of backs with lots of carries but a poor per-carry average would outrush the sample with fewer carries but more yards per carry.
Statistic for regression | Performance before prediction | Performance since prediction | Weeks remaining |
---|---|---|---|
Yards per Carry | Group A had 3% more rushing yards per game | Group B has 23% more rushing yards per game | 3 |
I don't want to make too much of a one-week sample. (Not out of any principled stand or anything; I've just seen plenty of one-week samples flip wildly come weeks two, three, and four, and I'd hate to look foolish for celebrating too early.) But last week, the mean yards per carry for our "high-ypc" Group A backs was 3.75 and the median was 3.21, while the mean yards per carry for our "low-ypc" Group B backs was 4.58 and the median was 4.06. Of course, the low ypc of our high-ypc backs will regress over the next three weeks, too, because that's just what ypc does. It's basically just random noise.
PLAYING THE HITS
If you go see Lynyrd Skynyrd live, you know they're playing Sweet Home Alabama and Freebird. The Stones are going to play (I Can't Get No) Satisfaction. KISS is going to play Rock and Roll All Nite and Detroit Rock City, and of course, Ozzy is eventually going to get around to Crazy Train.
Similarly, Regression Alert loves delving into the back catalog for obscure stats and deep cuts from time to time, but we know where our bread is buttered and we aren't shy about serving up the hits, either. Last week we played our old classic "Yards Per Carry is Pseudoscience". This week we have our seminal work "Touchdowns Follow Yards (But Yards Don't Follow Back)". Next week we're going to really drive the crowd nuts with our smash "Revisiting Preseason Expectations". But that's getting ahead of ourselves.
First, let's talk about touchdowns. Actually, before we talk about touchdowns, let's talk about vocabulary.
sto·chas·tic
adjective
randomly determined; having a random probability distribution or pattern that may be analyzed statistically but may not be predicted precisely.
Touchdowns are stochastic. Over his career, Michael Thomas has scored 32 touchdowns in 64 games, an average of exactly 0.5 touchdowns per game. We could say that's his "true production level", and over a long timeline, we'd probably expect him to conform to that, averaging 0.5 touchdowns per game going forward.
Despite that being his true production level, though, guess how many times Michael Thomas has scored half a touchdown in a game? As far as I can tell (and I have researched this topic extensively), it has never happened. Instead, he either scores zero touchdowns... or he scores one touchdown. (Sometimes he even scores two touchdowns.) Because touchdowns only come in whole numbers, we can analyze Michael Thomas' touchdowns statistically, but we cannot predict them precisely.
Yards don't really behave like that. Over his career, Michael Thomas averages 86.4 yards per game. But it's not like every week he's either getting you 0 yards or else he's getting you 180 yards. Instead, he's usually getting you somewhere between 60 and 120 yards. His yardage total is much more consistent from game to game than his touchdown total.
One way to measure consistency is something called standard deviation, which measures how much something varies around the average. The standard deviation of Thomas' receiving yards is 40.3 yards. The standard deviation of Thomas' receiving touchdowns is 0.64 touchdowns.
Now, these numbers are not directly comparable. But if you divide a player's standard deviation by that player's average, you get something called the coefficient of variation, or CV. CV is a way to compare how volatile different statistics are. The CV of Thomas' yards is 47%, meaning it tends to vary by about 47% of his overall average. The CV of Thomas' touchdowns is 128%. Touchdowns are much more random from week to week than yards are, about 2.7 times as random, according to CV. (For those curious, the CV of Thomas' receptions was 40%, while the CV of his targets was just 34%; "usage" statistics are generally much more stable from week to week even than yards.)
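For anyone who wants to check the arithmetic, here's a minimal sketch that recomputes those CVs from the averages and standard deviations quoted above. The function name is just something I made up for illustration.

```python
def coefficient_of_variation(std_dev: float, mean: float) -> float:
    """Standard deviation expressed as a fraction of the average."""
    return std_dev / mean


# Per-game figures quoted above for Michael Thomas.
yards_cv = coefficient_of_variation(std_dev=40.3, mean=86.4)
touchdowns_cv = coefficient_of_variation(std_dev=0.64, mean=0.5)

print(f"Yards CV: {yards_cv:.0%}")            # 47%
print(f"Touchdowns CV: {touchdowns_cv:.0%}")  # 128%
print(f"Touchdowns are {touchdowns_cv / yards_cv:.1f}x as volatile")  # 2.7x
```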
Not only are they more unstable, but touchdowns are also much more valuable than yards. In most scoring systems, one extra touchdown is worth the equivalent of 60 extra yards. Which means if Thomas catches the high side of variance and scores a few extra touchdowns early in the year, it can dramatically inflate his fantasy production to date. And if he catches the low side of variance and fails to reach the end zone, it can leave him far lower than we'd otherwise expect.
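Here's a quick sketch of that exchange rate. It assumes standard scoring of 1 point per 10 receiving yards and 6 points per receiving touchdown (my assumption about the scoring system, consistent with the 60-yard figure above), and the two stat lines are hypothetical.

```python
YARDS_PER_POINT = 10      # assumption: 1 point per 10 receiving yards
POINTS_PER_TOUCHDOWN = 6  # assumption: 6 points per receiving touchdown

# How many yards it takes to match the points from one touchdown.
TOUCHDOWN_VALUE_IN_YARDS = POINTS_PER_TOUCHDOWN * YARDS_PER_POINT


def fantasy_points(yards: int, touchdowns: int) -> float:
    """Receiving fantasy points under the assumed scoring above."""
    return yards / YARDS_PER_POINT + touchdowns * POINTS_PER_TOUCHDOWN


# Two hypothetical games that score the same despite a 60-yard gap.
print(TOUCHDOWN_VALUE_IN_YARDS)                  # 60
print(fantasy_points(yards=90, touchdowns=0))    # 9.0
print(fantasy_points(yards=30, touchdowns=1))    # 9.0
```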
Which gives rise to my favorite statistic for regression: yard-to-touchdown ratios. Some players are really, really good at getting yards but not quite as good at scoring touchdowns. For years, Julio Jones has been the most famous example of this; he has gained 216 receiving yards in his career for every touchdown he has scored. This is a very high average, but there are other wide receivers in this general range; Andre Johnson averaged 203 yards for every touchdown, Henry Ellard averaged 212, etc.
Other players are really, really good at getting touchdowns but typically aren't commensurately good at getting yards. For his career, Davante Adams scores a touchdown for every 117 yards he gains receiving. Again, this is a very low average, but not historically implausible; Dez Bryant averaged 102 yards for every touchdown, while Randy Moss was all the way down at 98 yards per touchdown.
Importantly: the yard-to-touchdown ratio is not a measure of player quality. Davante Adams has twice scored 10 or more touchdowns with 1,000 or fewer yards. All else being equal, a guy who gains 1,500 yards and 10 touchdowns is better than a guy who gains 1,000 yards and 10 touchdowns, even if the latter guy has a "better" yard-to-touchdown ratio. If you asked who the best receiver over the last few years was, you might plausibly hear Jones (216 yards per touchdown), Thomas (173 yards per touchdown), DeAndre Hopkins (163 yards per touchdown), Antonio Brown (150 yards per touchdown), Odell Beckham (136 yards per touchdown), or Adams (117 yards per touchdown). (Similarly, I could easily find mediocre or even bad receivers who span the whole yard-to-touchdown spectrum.)
With that in mind, over the long term, receivers tend to average between 100 and 200 yards per touchdown, with the majority of the league clustered between 120 and 180. Any rate that falls in that range is plausibly sustainable and perhaps a true representation of a player's relative skill at scoring touchdowns. But because touchdowns are stochastic, in the short run we see yard-to-touchdown ratios that are wildly outside of that "sustainable" zone. And because touchdowns count for so many points in fantasy football, this gives us a ton of targets for regression.
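To put a rough number on how wild short samples can get, here's a minimal sketch. It leans on assumptions that are mine, not measurements: touchdowns follow a Poisson distribution, and a hypothetical receiver gains exactly 80 yards every game with a "true" rate of one touchdown per 150 yards. It then asks how often his observed yard-to-touchdown ratio lands inside the 100-to-200 range.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k events when the expected count is `mean`."""
    return math.exp(-mean) * mean**k / math.factorial(k)


def chance_ratio_looks_sustainable(
    games: int,
    yards_per_game: int = 80,      # hypothetical receiver
    true_yards_per_td: int = 150,  # hypothetical "true" rate
    low: int = 100,
    high: int = 200,
) -> float:
    """Chance the observed yards-per-touchdown falls within [low, high]."""
    total_yards = games * yards_per_game
    expected_tds = total_yards / true_yards_per_td
    max_tds = int(total_yards // low)  # more touchdowns than this drops the ratio below `low`
    chance = 0.0
    for tds in range(1, max_tds + 1):
        if low <= total_yards / tds <= high:
            chance += poisson_pmf(tds, expected_tds)
    return chance


three_games = chance_ratio_looks_sustainable(games=3)
full_season = chance_ratio_looks_sustainable(games=16)
print(f"Inside the range after 3 games: {three_games:.0%}")   # about 26%
print(f"Inside the range after 16 games: {full_season:.0%}")
```

Under those assumptions, a perfectly average touchdown scorer shows an "unsustainable" ratio in roughly three out of four three-game samples, which is exactly the kind of noise we're hunting for.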
So let's pit the receivers with a lot of yards but very few touchdowns against the receivers with a lot of touchdowns but very few yards and see what happens. There are fourteen receivers in the NFL right now who have 200 or fewer yards and 2 or more touchdowns (guaranteeing a yard-to-touchdown ratio of 100 or lower). Similarly, there are fifteen receivers in the NFL right now who have 201 or more yards and 1 or fewer touchdowns (resulting in a yard-to-touchdown ratio above 200, or an undefined one for receivers who have yet to score).
But because the yardage-heavy receivers have been so good, they've actually been outscoring their touchdown-heavy peers despite the dearth of touchdowns. So let's dilute the pool even further; there are five more receivers with 150 or more receiving yards but no touchdowns. So let's add them to our Group B as well. Here are the 34 receivers in question, sorted by their yard-to-touchdown ratio.
Player | Receiving Yards | Receiving TDs | Yard-to-TD Ratio | Fantasy Points |
---|---|---|---|---|
Mike Evans | 108 | 4 | 27.0 | 34.8 |
Tee Higgins | 75 | 2 | 37.5 | 19.5 |
Dontrelle Inman | 76 | 2 | 38.0 | 19.6 |
Emmanuel Sanders | 89 | 2 | 44.5 | 20.9 |
JuJu Smith-Schuster | 160 | 3 | 53.3 | 34.0 |
Cedrick Wilson | 107 | 2 | 53.5 | 22.7 |
Adam Thielen | 170 | 3 | 56.7 | 35.4 |
Andy Isabella | 114 | 2 | 57.0 | 22.8 |
Anthony Miller | 117 | 2 | 58.5 | 24.6 |
Braxton Berrios | 123 | 2 | 61.5 | 24.3 |
Keelan Cole | 148 | 2 | 74.0 | 26.8 |
John Brown | 152 | 2 | 76.0 | 27.2 |
Darius Slayton | 188 | 2 | 94.0 | 30.8 |
Davante Adams | 192 | 2 | 96.0 | 31.2 |
Corey Davis | 206 | 1 | 206.0 | 26.6 |
Cooper Kupp | 228 | 1 | 228.0 | 30.7 |
Tyler Boyd | 230 | 1 | 230.0 | 29.0 |
Allen Robinson | 230 | 1 | 230.0 | 28.9 |
Justin Jefferson | 245 | 1 | 245.0 | 30.5 |
Michael Gallup | 246 | 1 | 246.0 | 30.6 |
Keenan Allen | 265 | 1 | 265.0 | 32.5 |
Terry McLaurin | 269 | 1 | 269.0 | 33.2 |
Robby Anderson | 279 | 1 | 279.0 | 33.9 |
DeAndre Hopkins | 356 | 1 | 356.0 | 41.6 |
Marquise Brown | 156 | 0 | Undefined | 15.6 |
Kendrick Bourne | 164 | 0 | Undefined | 16.4 |
Scott Miller | 167 | 0 | Undefined | 17.4 |
Jerry Jeudy | 173 | 0 | Undefined | 17.3 |
Julio Jones | 181 | 0 | Undefined | 18.1 |
Cole Beasley | 228 | 0 | Undefined | 22.8 |
CeeDee Lamb | 230 | 0 | Undefined | 24.0 |
D.J. Moore | 239 | 0 | Undefined | 23.9 |
Julian Edelman | 259 | 0 | Undefined | 28.1 |
Amari Cooper | 267 | 0 | Undefined | 26.7 |
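The selection rules described above can be sketched as a small function. The thresholds come straight from the text; the function names are mine, the sample stat lines are a handful of rows from the table, and the last two receivers are hypothetical ones added to show who gets left out.

```python
from typing import Optional


def assign_group(yards: int, touchdowns: int) -> Optional[str]:
    """Return "A", "B", or None using the thresholds described above."""
    if touchdowns >= 2 and yards <= 200:
        return "A"  # touchdown-heavy: ratio of 100 or lower
    if touchdowns <= 1 and yards >= 201:
        return "B"  # yardage-heavy: ratio above 200 (or undefined)
    if touchdowns == 0 and yards >= 150:
        return "B"  # the extra scoreless receivers added to dilute Group B
    return None     # not part of this week's prediction


def yards_per_touchdown(yards: int, touchdowns: int) -> Optional[float]:
    """Yard-to-touchdown ratio, or None when there are no touchdowns."""
    return yards / touchdowns if touchdowns else None


sample = [
    ("Cedrick Wilson", 107, 2),
    ("Darius Slayton", 188, 2),
    ("Allen Robinson", 230, 1),
    ("Scott Miller", 167, 0),
    ("D.J. Moore", 239, 0),
    ("Hypothetical WR 1", 140, 0),  # made up: too few yards for Group B
    ("Hypothetical WR 2", 250, 2),  # made up: ratio of 125, a non-outlier
]

for name, yards, touchdowns in sample:
    print(name, assign_group(yards, touchdowns), yards_per_touchdown(yards, touchdowns))
```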
Mike Evans, Tee Higgins, Dontrelle Inman, Emmanuel Sanders, JuJu Smith-Schuster, Cedrick Wilson, Adam Thielen, Andy Isabella, Anthony Miller, Braxton Berrios, Keelan Cole, John Brown, Darius Slayton, and Davante Adams have all scored more than one touchdown for every 100 receiving yards. Collectively, they average 56.8 yards per touchdown and 9.1 fantasy points per game in standard scoring. This is our Group A.
Corey Davis, Cooper Kupp, Tyler Boyd, Allen Robinson, Justin Jefferson, Michael Gallup, Keenan Allen, Terry McLaurin, Robby Anderson, DeAndre Hopkins, Marquise Brown, Kendrick Bourne, Scott Miller, Jerry Jeudy, Julio Jones, Cole Beasley, CeeDee Lamb, D.J. Moore, Julian Edelman, and Amari Cooper all have fewer than one touchdown for every 200 receiving yards. Collectively, they average 461.8 yards per touchdown and 8.9 fantasy points per game in standard scoring. This is our Group B.
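If you're wondering how a group full of "Undefined" ratios ends up with a collective average, it's because the group figures pool everyone's yards and touchdowns before dividing. A minimal sketch, using the yardage and touchdown columns from the table above (the variable and function names are mine):

```python
# (receiving yards, receiving touchdowns) for each receiver, in table order.
group_a = [
    (108, 4), (75, 2), (76, 2), (89, 2), (160, 3), (107, 2), (170, 3),
    (114, 2), (117, 2), (123, 2), (148, 2), (152, 2), (188, 2), (192, 2),
]
group_b = [
    (206, 1), (228, 1), (230, 1), (230, 1), (245, 1),
    (246, 1), (265, 1), (269, 1), (279, 1), (356, 1),
    (156, 0), (164, 0), (167, 0), (173, 0), (181, 0),
    (228, 0), (230, 0), (239, 0), (259, 0), (267, 0),
]


def pooled_yards_per_touchdown(receivers):
    """Total group yards divided by total group touchdowns."""
    total_yards = sum(yards for yards, _ in receivers)
    total_touchdowns = sum(touchdowns for _, touchdowns in receivers)
    return total_yards / total_touchdowns


pooled_a = pooled_yards_per_touchdown(group_a)
pooled_b = pooled_yards_per_touchdown(group_b)
print(f"Group A: {pooled_a:.1f} yards per touchdown")  # 56.8
print(f"Group B: {pooled_b:.1f} yards per touchdown")  # 461.8
```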
Group A has just a 2% edge in points per game to this point, but Group B should dramatically flip that going forward. I predict at least a 10% edge in fantasy points per game by Group B receivers over the next four weeks. Tune in later as we track the results.