Welcome to Regression Alert, your weekly guide to using regression to predict the future with uncanny accuracy.
For those who are new to the feature, here's the deal: every week, I break down a topic related to regression to the mean. Some weeks, I'll explain what it is, how it works, why you hear so much about it, and how you can harness its power for yourself. In other weeks, I'll give practical examples of regression at work.
In weeks where I'm giving practical examples, I will select a metric to focus on. I'll rank all players in the league according to that metric and separate the top players into Group A and the bottom players into Group B. I will verify that the players in Group A have outscored the players in Group B to that point in the season. And then I will predict that, by the magic of regression, Group B will outscore Group A going forward.
Crucially, I don't get to pick my samples (other than choosing which metric to focus on). If I'm looking at receivers and Justin Jefferson is one of the top performers in my sample, then Justin Jefferson goes into Group A, and may the fantasy gods show mercy on my predictions.
And then because predictions are meaningless without accountability, I track and report my results. Here's last year's season-ending recap, which covered the outcome of every prediction made in our seven-year history, giving our top-line record (41-13, a 76% hit rate) and lessons learned along the way.
Our Year to Date
Sometimes, I use this column to explain the concept of regression to the mean. In Week 2, I discussed what it is and what this column's primary goals would be. In Week 3, I explained how we could use regression to predict changes in future performance (who would improve, who would decline) without knowing anything about the players themselves.
Sometimes, I point out broad trends. In Week 5, I shared twelve years' worth of data demonstrating that preseason ADP held as much predictive power as performance to date through the first four weeks of the season.
Other times, I use this column to make specific predictions. In Week 4, I explained that touchdowns tend to follow yards and predicted that the players with the highest yard-to-touchdown ratios would begin outscoring the players with the lowest.
The Scorecard
Statistic Being Tracked | Performance Before Prediction | Performance Since Prediction | Weeks Remaining |
---|---|---|---|
Yard-to-TD Ratio | Group A averaged 17% more PPG | Group B averages 27% more PPG | 2 |
I said last week's 57% edge wasn't going to hold, and it hasn't; our "high-touchdown" receivers reached the end zone a bit more than our "low-touchdown" receivers last week (though between byes and injuries, 25% of our Group A and 40% of our Group B receivers didn't play a snap).
Group B's huge advantage in Week 4 still more than offsets Group A's smaller advantage in Week 5; Group B leads Group A in yards per game and touchdowns per game. It is also averaging fewer yards per touchdown, though both are well within the "sustainable band" at the moment.
What Use is Yards per Carry?
Yards per carry is one of the most beloved statistics for judging running backs. Jamaal Charles has never averaged below 5 yards per carry in a season where he's had at least 20 carries*; therefore, Jamaal Charles is a star. Trent Richardson had 1300 yards from scrimmage and 12 touchdowns as a rookie, ranking as a top-10 fantasy back, but his 3.6 yards per carry was an early warning sign that he would eventually be regarded as a colossal bust.
*(Technically, Charles averaged 4.97 yards per carry in 2013, but what's a few hundredths of a yard among friends?)
I've written more about Trent Richardson before, back in 2014, when another young rookie had just had a high-volume, low-YPC season that had everyone drawing parallels and claiming he was destined to disappoint. I wrote that, based on history, maybe we shouldn't be writing off this Le'Veon Bell fellow quite so quickly.
Indeed, the list of high-volume, low-YPC rookie running backs was basically Trent Richardson and a who's who of Hall of Famers or almost Hall of Famers. In addition to Richardson (3.56 ypc) and Bell (3.52 ypc), there's LaDainian Tomlinson (3.65 ypc), Ricky Williams (3.49 ypc), Walter Payton (3.46 ypc), Emmitt Smith (3.89 ypc), Matt Forte (3.92 ypc), and Marshawn Lynch (3.98 ypc).
Even the guys on the high-volume, low-YPC list who didn't go on to be All-Pros typically had several quality fantasy years in them. Karim Abdul-Jabbar, Travis Henry, Errict Rhett, and Joe Cribbs all followed up their "inefficient" rookie season with a top-12 fantasy campaign as a sophomore, Sammie Smith improved across the board and finished as RB18, and Jahvid Best looked (and produced) like a star before injuries derailed his career.
Since I wrote that article in 2014, Melvin Gordon III also found himself on the "wrong" side of the ledger with an awful rookie ypc of 3.48. Fearing the shade of Trent Richardson, many owners sold low on the "inefficient" Gordon after a "disappointing" rookie season, only to see him rank 3rd in fantasy points (nearly 20% ahead of fourth place) from 2016-2018.
Indeed, other than Richardson himself, the only running back who had a high-volume, low "efficiency" rookie season and followed it up with a disappointing sophomore campaign was James Jackson, who also happens to be the only back in the sample to average below 3 yards per carry as a rookie (2.84), and whose team thought so little of him that they drafted William Green in the first round to replace him.
What is going on here? Why is having a terrible rookie yard-per-carry average such a positive sign for a player's career? The truth is that a poor yard-per-carry average isn't a positive sign. It just isn't a negative one, either. I'm providing a list of high-workload rookies with low yards per carry, and the high-workload part is the real key.
Backs get a high workload because the coaching staff thinks they're good and wants to give them the ball. In the long run, backs who coaching staffs think are good and want to give the ball... tend to be pretty good. The low ypc, in the meantime, is just a meaningless fluke.
What Is Yards per Carry, Anyway?
To understand why yards per carry is a fluke, you have to understand something very important about it: it's not measuring how good a running back is. It's so thoroughly dominated by outlier runs that all it's really measuring is whether a back has had three long runs or merely two. For the majority of players who finish the season above the league average mark in yards per carry, you only have to remove one or two carries to drop them below the league average.
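To make the outlier point concrete, here is a minimal sketch in Python. The carry log and the `LEAGUE_AVERAGE` value are hypothetical numbers invented for illustration, not real player data; the helper names are likewise my own.

```python
def ypc(carries):
    """Yards per carry for a list of per-carry yardage values."""
    return sum(carries) / len(carries)


def ypc_without_longest(carries, k):
    """Yards per carry after dropping the k longest runs."""
    if not 0 <= k < len(carries):
        raise ValueError("k must leave at least one carry in the sample")
    trimmed = sorted(carries)[: len(carries) - k]
    return sum(trimmed) / len(trimmed)


# Hypothetical season: 197 ordinary carries plus three long runs.
carries = [4] * 120 + [3] * 50 + [2] * 27 + [55, 62, 75]

# Assumed league-wide average, chosen only for illustration.
LEAGUE_AVERAGE = 4.3

full = ypc(carries)
minus_two = ypc_without_longest(carries, 2)

print(f"All 200 carries:       {full:.2f} ypc")
print(f"Minus two longest runs: {minus_two:.2f} ypc")
print(f"Above league average before: {full > LEAGUE_AVERAGE}")
print(f"Above league average after:  {minus_two > LEAGUE_AVERAGE}")
```

In this made-up log, two carries out of 200 are the entire difference between an "efficient" back and an "inefficient" one.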
To some extent, long runs are a product of player skill. But they're a product of a very specific skill— straight-line speed. Someone like Le'Veon Bell might excel at every other skill required of the position, but since he lacks high-end straight-line speed, his ypc will always underestimate his value. Indeed, the longest touchdown run of Bell's career is just 38 yards.
To an even larger extent than skill, long runs are a product of luck. First and foremost, you can't run for 50 yards if your team is only 40 yards from the end zone. Additionally, you likely need some combination of good blocking and poor tackling to get into space in the first place so you can put that straight-line speed to good use. And insofar as long runs are dominated by luck, you'd expect them to vary wildly from one sample to the next.
The Smell Test
What does this mean in practice? Statisticians have a concept called "face validity". Most of the rest of us know it better as "the smell test". Let's say I invent a statistic that I claim measures how good running backs are. The first thing I should do is look at a list of running backs ranked by my new statistic and see if it has face validity, that is, whether it passes the smell test.
If I ranked the 127 running backs who have at least 300 carries over the last decade, and I told you I had Nick Chubb, Aaron Jones, Jonathan Taylor, and Jamaal Charles in my Top 10 and Andre Williams, Matt Asiata, Peyton Barber, and Alfred Blue as my Bottom 4, that would pass the smell test.
But if I told you I also had Rashaad Penny at #1 overall (by a lot), Raheem Mostert third, Khalil Herbert and Gus Edwards 7th and 8th, and Elijah Mitchell, Justin Forsett, Damien Harris, and Matt Breida in my Top 20, things wouldn't be smelling so good. By the time you find out that Frank Gore, Matt Forte, Jamaal Williams, and Najee Harris all rank among the Bottom 20 backs by my metric, things are smelling downright rotten.
And that's even before I tell you I had Isiah Pacheco over Derrick Henry, Ryan Mathews and Phillip Lindsay over Alvin Kamara and Austin Ekeler, Chase Edmonds and James Robinson over Marshawn Lynch and Saquon Barkley, Marlon Mack over LeSean McCoy, Thomas Rawls over Josh Jacobs, Wayne Gallman over Le'Veon Bell, Tevin Coleman over Todd Gurley, and on and on and on.
Because that's exactly what you see if you rank running backs by yards per carry. The guys toward the top tend to be better than the guys toward the bottom... but this tendency is fairly tenuous. Perusing the list, it becomes clear the relationship between talent and yards per carry is remarkably weak.
Stabilizing?
Danny Tuccitto has calculated how long it has historically taken various statistics to "stabilize"— to reach a point where they are more representative of player talent than they are of noise, luck, or random chance. For instance, for quarterback yards per attempt (arguably the single best "simple stat" in all of football), it takes about 396 pass attempts before a player's average represents 50% skill, 50% luck. After a little bit less than a full season in an offense, we can be pretty confident which quarterbacks are pretty good and which are not based on yards per attempt alone.
For yards per carry to stabilize, a back would need about 1978 carries (in Danny's words, "a vomit-inducing" 1978 carries). For context, that's more carries than Maurice Jones-Drew or DeAngelo Williams had in their entire careers. Derrick Henry and Ezekiel Elliott are the only active players to surpass that total, and both just barely clear it. No other active back is within 300 carries of the mark.
Essentially, the practical answer to the question of when yard-per-carry stabilizes is "never". A back's yard-per-carry average is always more luck than skill.
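One way to get a feel for those stabilization points is a simple shrinkage sketch. The `n / (n + k)` form below is a common way of modeling this idea and is an assumption on my part, not necessarily how Danny's figures were calculated; it is only chosen because it gives exactly 50% skill when the sample size equals the stabilization point. The function name is hypothetical.

```python
def skill_share(n, stabilization_point):
    """Share of an observed average attributed to skill after n attempts.

    Assumed model: share = n / (n + k), where k is the stabilization
    point. At n == k the split is 50% skill, 50% luck by construction.
    """
    return n / (n + stabilization_point)


YPA_STABILIZATION = 396    # pass attempts, figure quoted in the text
YPC_STABILIZATION = 1978   # carries, figure quoted in the text

# A quarterback at exactly the stabilization point.
print(f"{skill_share(396, YPA_STABILIZATION):.0%}")

# A heavy single-season rushing workload (250 carries is illustrative).
print(f"{skill_share(250, YPC_STABILIZATION):.0%}")
```

Under this assumed model, even a full season as a workhorse leaves a back's yards per carry overwhelmingly a product of luck.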
What does it mean to say that yards per carry is always more luck than skill? Well, for one thing, the correlation between yards per carry in one year and the next is extremely low. Not only that, the correlation in yards per carry between one 8-game sample and another 8-game sample in the same season is also extremely low.
If a running back averages 5.00 yards per carry in one 8-game sample, based on regression, we'd expect him to average 4.37 in the other. If a running back averages 3.50 yards per carry in one 8-game sample, we'd expect him to average 3.93 in the other. Thanks to the magic of regression to the mean, a chasmic 1.5-yard-per-carry difference shrinks to a barely noticeable 0.44-yard-per-carry difference.
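Those two data points are enough to back out the straight line behind them. The sketch below assumes the expectation is linear in the observed average (a simple regression line); the function names are mine, and the slope and mean it recovers are implied by the four numbers above rather than quoted from any source.

```python
def implied_regression(x1, y1, x2, y2):
    """Recover slope and mean from two (observed, expected) pairs.

    Assumes expected = mean + slope * (observed - mean).
    """
    slope = (y1 - y2) / (x1 - x2)
    # The mean is the point that regresses to itself.
    mean = (y1 - slope * x1) / (1 - slope)
    return slope, mean


def expected_other_half(observed, slope, mean):
    """Expected ypc in the other 8-game sample."""
    return mean + slope * (observed - mean)


slope, mean = implied_regression(5.00, 4.37, 3.50, 3.93)
print(f"Implied slope: {slope:.3f}")
print(f"Implied mean:  {mean:.2f} ypc")

for observed in (5.00, 4.50, 3.50):
    projected = expected_other_half(observed, slope, mean)
    print(f"{observed:.2f} observed -> {projected:.2f} expected")
```

The implied slope is a bit under 0.3, meaning roughly 70% of any gap from average is expected to vanish in the next sample.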
Yards per carry is the quintessential regression stat, the easiest win to add to the column. It doesn't really measure how good a player is, it's always more a product of luck than skill, and it fluctuates wildly and randomly between samples.
Volume, on the other hand, is incredibly sticky. Backs who get a lot of touches with a low yard-per-carry average are likely, going forward, to get a lot of touches with a higher yard-per-carry average. On the other hand, backs who get few touches with a high average are likely, going forward, to get few touches with a lower average.
A Guaranteed Win?
For a while, I called this prediction our one "guaranteed win" of the year. We opened 9-0 betting against yards per carry over the years, with a median swing from Group A to Group B of 39%. But nothing in football is guaranteed, and last year handed us our first two losses.
The first came from hubris; I selected a Group A with a 41% lead over Group B in rushing yards per game, the second largest we'd yet tried. (Recall that the median swing on this prediction at the time was just 39%.) Group B managed a 32% swing, right on par with the typical improvement (in fact, this is now the new median), but it wasn't enough to overcome the huge deficit they were saddled with at the start.
The second loss was just bad luck. Group A improved after the prediction for the first time. It was bound to happen sooner or later by chance alone.
Even counting the losses, if you pool together all predictions since 2018, the "high-YPC" backs had 3366 carries for 18,802 yards when we made our predictions, a 5.59-yard-per-carry average. Our "low-YPC" backs, by contrast, had 4298 carries for 16,490 yards, 3.84 yards per carry. These are massive samples; the Group B backs collectively had nearly as many carries as Emmitt Smith over his entire career (4409), while the Group A backs out-rushed Adrian Peterson (3230 carries).
In the four weeks following our predictions, Group A backs rushed 3185 times for 14,292 yards, while Group B rushed 4074 times for 18,223 yards. Again, the samples are massive. Group B had 27% more carries before the prediction and 28% more carries after the prediction; their workload advantage was very stable. Group A, by contrast, had a 46% yard-per-carry advantage before the prediction, but that fell to a 0.4% advantage after the prediction (4.49 to 4.47). Overall, we've gone from Group A rushing for 14% more yards to Group B rushing for 28% more.
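For anyone who wants to check the pooled arithmetic, here is a short sketch using only the carry and yardage totals quoted above. The `Sample` class and `pct_more` helper are names I made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    carries: int
    yards: int

    @property
    def ypc(self):
        """Yards per carry for the pooled sample."""
        return self.yards / self.carries


def pct_more(a, b):
    """How much larger a is than b, as a percentage."""
    return (a / b - 1) * 100


# Pooled totals from the text, before and after the predictions.
before_a = Sample(carries=3366, yards=18802)
before_b = Sample(carries=4298, yards=16490)
after_a = Sample(carries=3185, yards=14292)
after_b = Sample(carries=4074, yards=18223)

print(f"Before: A {before_a.ypc:.2f} ypc, B {before_b.ypc:.2f} ypc")
print(f"After:  A {after_a.ypc:.2f} ypc, B {after_b.ypc:.2f} ypc")
print(f"B carry edge before: {pct_more(before_b.carries, before_a.carries):.1f}%")
print(f"B carry edge after:  {pct_more(after_b.carries, after_a.carries):.1f}%")
print(f"A yardage edge before: {pct_more(before_a.yards, before_b.yards):.1f}%")
print(f"B yardage edge after:  {pct_more(after_b.yards, after_a.yards):.1f}%")
```

The carry edge barely moves between the two periods, while the yards-per-carry gap all but disappears, which is the whole prediction in two lines of output.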
This year has even given us the perfect avatar for the randomness of yards per carry. Over the last four years, D'Andre Swift was 15th in yards per carry with 4.6. Through three weeks this year, he averaged 1.83 yards per carry, the 9th-worst mark by a running back with at least 30 carries since the 1970 merger. Then, in Week 4, he averaged 5.8 yards per carry.
Last year, De'Von Achane had the highest ypc average of any player with 100 carries since Beattie Feathers in 1934-- a whopping 7.8! So far this season he's averaging 3.3. Breece Hall averaged 4.8 ypc over the first two years of his career. Of the 50 backs with at least 30 carries, his current 3.0 ypc mark is the second worst.
The only player with a lower ypc? That would be Gus Edwards, who, through 2022, had the 9th-best yard per carry average in NFL history among players with at least 500 career carries.
It's almost like a player's yard per carry average in small samples is functionally random.
Our Groups
Right now, there are fourteen running backs averaging 5 or more yards per carry on 30 or more carries: Tank Bigsby, J.K. Dobbins, Derrick Henry, Saquon Barkley, Antonio Gibson, Chuba Hubbard, Bucky Irving, Chase Brown, Ken Walker III, Tyler Allgeier, Jahmyr Gibbs, Tyrone Tracy Jr., Jerome Ford, and Jordan Mason. This is our Group A.
On the other end, there are five running backs averaging 4 or fewer yards per carry on 60 or more carries. Those backs are Tony Pollard, Kyren Williams, Najee Harris, D'Andre Swift, and Breece Hall. I wish we had more high-volume, low-ypc qualifiers (predictions work better with larger samples), but we have to work with what we have, so this is our Group B.
Group A is averaging 66.3 yards per game at 5.7 yards per carry. Group B is averaging 54.5 yards per game at 3.5 yards per carry. Group A has 21.7% more yards, but Group B has 35.6% more carries. Because a player's workload is historically much more stable across samples than what he does with that workload, I predict that Group B will outrush Group A over the next four weeks.
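Since rushing yards are just carries times yards per carry, the prediction can be framed as a break-even question: how big a yards-per-carry gap does Group A need to keep? The sketch below assumes Group B's 35.6% carry edge holds exactly over the next four weeks, which is my simplifying assumption; the Group B averages in the loop are illustrative scenarios, not forecasts.

```python
def break_even_ypc(group_b_ypc, carry_edge_pct):
    """YPC Group A needs just to match Group B's rushing yardage.

    Assumes Group B keeps carry_edge_pct percent more carries than
    Group A, so A must out-average B by that same percentage.
    """
    return group_b_ypc * (1 + carry_edge_pct / 100)


CARRY_EDGE_PCT = 35.6  # Group B's current carry advantage, from the text

# Illustrative scenarios for Group B's average going forward.
for b_ypc in (3.5, 4.0, 4.5):
    needed = break_even_ypc(b_ypc, CARRY_EDGE_PCT)
    print(f"If Group B averages {b_ypc:.1f}, Group A needs {needed:.2f}+")
```

If Group B improves to even 4.0 yards per carry, Group A would have to sustain more than 5.4 to stay ahead, and in our pooled history the groups have finished nearly even in yards per carry after the prediction.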