Regression Alert: Week 3

By Adam Harstad. Published 09/22/2022.

Welcome to Regression Alert, your weekly guide to using regression to predict the future with uncanny accuracy.

For those who are new to the feature, here's the deal: every week, I dive into the topic of regression to the mean. Sometimes I'll explain what it really is, why you hear so much about it, and how you can harness its power for yourself. Sometimes I'll give some practical examples of regression at work.

In weeks where I'm giving practical examples, I will select a metric to focus on. I'll rank all players in the league according to that metric and separate the top players into Group A and the bottom players into Group B. I will verify that the players in Group A have outscored the players in Group B to that point in the season. And then I will predict that, by the magic of regression, Group B will outscore Group A going forward.
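For readers who like to see a procedure spelled out, the grouping step above can be sketched roughly as follows. This is only an illustration: the `PlayerLine` type, the function names, and the five placeholder rows are all made up for the example and are not real player data or the column's actual tooling.

```python
from dataclasses import dataclass


@dataclass
class PlayerLine:
    name: str
    metric: float           # the stat being tested, e.g. yards per carry
    points_per_game: float  # production to date


def split_groups(players, top_n, bottom_n):
    """Rank by the metric; the top_n go to Group A, the bottom_n to Group B."""
    ranked = sorted(players, key=lambda p: p.metric, reverse=True)
    return ranked[:top_n], ranked[-bottom_n:]


def average_production(group):
    """Mean production per game across everyone in the group."""
    return sum(p.points_per_game for p in group) / len(group)


# Hypothetical placeholder rows, NOT real player data.
players = [
    PlayerLine("Back 1", 6.1, 80.0),
    PlayerLine("Back 2", 5.4, 72.0),
    PlayerLine("Back 3", 4.4, 65.0),
    PlayerLine("Back 4", 3.9, 63.0),
    PlayerLine("Back 5", 3.5, 58.0),
]

group_a, group_b = split_groups(players, top_n=2, bottom_n=2)

# Step 1: verify Group A really has out-produced Group B so far.
a_leads_so_far = average_production(group_a) > average_production(group_b)
# Step 2 (the prediction) is simply that this inequality flips going forward.
```

The point of the design is that nothing after choosing the metric is discretionary: membership in each group falls straight out of the sort.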

Crucially, I don't get to pick my samples (other than choosing which metric to focus on). If I'm looking at receivers and Cooper Kupp is one of the top performers in my sample, then Cooper Kupp goes into Group A and may the fantasy gods show mercy on my predictions.

Most importantly, because predictions mean nothing without accountability, I track the results of my predictions over the course of the season and highlight when they prove correct and also when they prove incorrect. At the end of last season, I provided a recap of the first half-decade of Regression Alert's predictions. The executive summary is we have a 32-7 lifetime record, which is an 82% success rate.

If you want even more details, here's a list of my predictions from 2020 and their final results. Here's the same list from 2019 and their final results, here's the list from 2018, and here's the list from 2017.


An Easy Win to Start Us Off

If you read last week's column, you know that one of the keys to profiting off of regression to the mean is recognizing that everything regresses, but not everything regresses at the same rate. The more a statistic is dominated by luck, the more that statistic is going to swing wildly from one sample to the next.

Because I'm going to be making predictions and tracking their accuracy, I want to start the season off with my best prediction, the one I'm most confident in. And to make that prediction, I want to focus on the statistic that is more dominated by luck and random chance than any other statistic I know. I want to focus on yards per carry (ypc).

Yards per carry is one of the most beloved statistics for judging running backs. Jamaal Charles has never averaged below 5 yards per carry in a season where he's had at least 20 carries*, therefore Jamaal Charles is a star. Trent Richardson had 1300 yards from scrimmage and 12 touchdowns as a rookie, ranking as a top-10 fantasy back, but his 3.6 yards per carry was an early warning sign that he would eventually be regarded as a colossal bust.

*(Technically, Charles averaged 4.97 yards per carry in 2013, but what's a few hundredths of a yard among friends?)

I've written more about Trent Richardson before, back in 2014 when another young rookie had just had a high-volume, low-ypc season that had everyone drawing parallels and claiming he was destined to disappoint. I wrote that, based on history, maybe we shouldn't be writing off this Le'Veon Bell fellow quite so quickly.

Indeed, the list of high-volume, low-ypc rookie running backs was basically Trent Richardson and a who's who of Hall of Famers or almost Hall of Famers. In addition to Richardson (3.56 ypc) and Bell (3.52 ypc), there's LaDainian Tomlinson (3.65 ypc), Ricky Williams (3.49 ypc), Walter Payton (3.46 ypc), Emmitt Smith (3.89 ypc), Matt Forte (3.92 ypc), and Marshawn Lynch (3.98 ypc).

Even the guys on the high-volume, low-ypc list who didn't go on to be All-Pros typically had several quality fantasy years in them. Karim Abdul-Jabbar, Travis Henry, Errict Rhett, and Joe Cribbs all followed up their “inefficient” rookie season with a top-12 fantasy campaign as a sophomore, Sammie Smith improved across the board and finished as RB18, and Jahvid Best looked (and produced) like a star before injuries derailed his career.

Since I wrote that article in 2014, Melvin Gordon III has also found himself on the “wrong” side of the ledger with an awful rookie ypc of 3.48. Fearing the shade of Trent Richardson, many owners sold low on the “inefficient” Gordon after a “disappointing” rookie season, only to see him rank 3rd in fantasy points (nearly 20% ahead of fourth place) from 2016-2018.

Indeed, other than Richardson himself, the only running back who had a high-volume, low “efficiency” rookie season and followed it up with a disappointing sophomore campaign was James Jackson, who also happens to be the only back in the sample to average below 3 yards per carry as a rookie (2.84), and whose team thought so little of him that they drafted William Green in the first round to replace him.

What is going on here? Why is having a terrible rookie yards per carry average such a positive sign for a player's career? The truth is that a poor yards per carry average isn't a positive sign. It just isn't a negative one, either. I'm providing a list of high-workload rookies with low yards per carry, and the high-workload part is the real key.

Backs get a high workload because the coaching staff thinks they're good and wants to give them the ball. In the long run, backs who coaching staffs think are good and want to give the ball... tend to be pretty good. The low ypc, in the meantime, is just a meaningless fluke.

What Is Yards per Carry, Anyway?

To understand why yards per carry is a fluke, you have to understand something very important about it: it's not measuring how good a running back is. It's so thoroughly dominated by outlier runs that all it's really measuring is whether a back has had three long runs or merely two. For the majority of players who finish the season above the league average mark in yards per carry, you only have to remove one or two carries to drop them below the league average.
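To make the "remove one or two carries" claim concrete, here is a small sketch. The carry log and the 4.3 league-average figure are both invented for illustration (they are assumptions, not data from the column); the only thing the example is meant to show is how much two long runs can move an average.

```python
def ypc_without_longest(carries, drop):
    """Yards per carry after removing the `drop` longest runs."""
    kept = sorted(carries)[: len(carries) - drop]
    return sum(kept) / len(kept)


# Hypothetical 20-carry log: 18 ordinary runs plus two long ones.
carries = [3, 2, 4, 1, 5, 3, 2, 6, 4, 3, 0, 2, 4, 3, 5, 2, 1, 3, 45, 38]

LEAGUE_AVERAGE = 4.3  # assumed league-wide ypc, for illustration only

full = ypc_without_longest(carries, 0)       # 136 / 20 = 6.80
minus_one = ypc_without_longest(carries, 1)  # 91 / 19, about 4.79
minus_two = ypc_without_longest(carries, 2)  # 53 / 18, about 2.94

# How many long runs must go before this back falls below league average?
runs_to_drop = next(
    k for k in range(len(carries))
    if ypc_without_longest(carries, k) < LEAGUE_AVERAGE
)
```

In this made-up log, a 6.8-ypc "star" becomes a 2.9-ypc "plodder" by deleting two of his twenty carries.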

To some extent, long runs are a product of player skill. But they're a product of a very specific skill: straight-line speed. Someone like Le'Veon Bell might excel at every other skill required of the position, but since he lacks high-end straight-line speed, his ypc will always underestimate his value. Indeed, the longest touchdown run of Bell's career is just 38 yards.

To an even larger extent than skill, long runs are a product of luck. First and foremost, you can't run for 50 yards if your team is only 40 yards from the end zone. Additionally, you likely need some combination of good blocking and poor tackling to get into space in the first place so you can put that straight-line speed to good use. And insofar as long runs are dominated by luck, you'd expect them to vary wildly from one sample to the next.

What does this mean in practice? Statisticians have a concept called "face validity". Most of the rest of us know it better as "the smell test". Let's say I invent a statistic that I claim measures how good running backs are. The first thing I should do is look at a list of running backs under my new statistic and see if my statistic has face validity, that is, whether it passes the smell test.

If I ranked the 128 running backs who have at least 300 carries over the last decade, and I told you I had Nick Chubb, Jonathan Taylor, and Aaron Jones in my Top 5 and Matt Asiata, Andre Williams, and Trent Richardson in my Bottom 5, that would pass the smell test. But suppose I told you I also had Raheem Mostert at #1 overall and Gus Edwards, Miles Sanders, Tony Pollard, and Matt Breida in my Top 10, while Melvin Gordon, Todd Gurley, Le'Veon Bell, Joe Mixon, Leonard Fournette, and Frank Gore were all in the bottom 50%. Suppose I added that Phillip Lindsay ranks above Marshawn Lynch, Jay Ajayi ranks above LeSean McCoy, Damien Williams ranks above Matt Forte, and Tevin Coleman ranks above Arian Foster. Suddenly that doesn't pass the smell test anymore. Yet that's exactly what you see if you rank running backs by yards per carry. The correlation between talent and yards per carry is remarkably weak.

In fact, Danny Tuccitto has calculated how long it has historically taken various statistics to “stabilize”: to reach a point where they are more representative of player talent than they are of noise, luck, or random chance. For instance, for Yards per Attempt (arguably the single best "simple stat" in all of football), it takes about 396 pass attempts before a player's average represents 50% skill, 50% luck. After a little bit less than a full season in an offense, we can be pretty confident which quarterbacks are pretty good and which are not based on yards per attempt alone.

For yards per carry to stabilize, a back would need about 1978 carries (in Danny's words, "a vomit-inducing" 1978 carries). For context, that's more carries than Maurice Jones-Drew or DeAngelo Williams had in their entire careers. No running back in the league today has that many carries in his entire career. Essentially, the practical answer to the question of when yards per carry stabilizes is “never”. A back's yards per carry is always more luck than skill.
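One common way to read a stabilization point is as the constant in a simple shrinkage formula, where the share of an observed average attributable to skill is n / (n + k), with n the sample size and k the stabilization point. That reading is an assumption here (the column only reports the 50/50 points of 396 attempts and 1978 carries, not the underlying method), and the function name is made up for the sketch.

```python
def skill_share(sample_size, stabilization_point):
    """Assumed shrinkage model: fraction of the observed average that is skill.

    Equals exactly 0.5 when sample_size == stabilization_point, which matches
    the "50% skill, 50% luck" definition used in the text.
    """
    return sample_size / (sample_size + stabilization_point)


YPA_STABILIZATION = 396    # pass attempts, from the text
YPC_STABILIZATION = 1978   # carries, from the text

# At the stabilization point the split is 50/50 by construction.
qb_at_396_attempts = skill_share(396, YPA_STABILIZATION)   # 0.5

# A back with 300 carries (a heavy single-season workload) is still mostly noise.
rb_at_300_carries = skill_share(300, YPC_STABILIZATION)    # 300 / 2278, about 0.13
```

Under this model, even a full season of heavy usage leaves a back's yards per carry at roughly 13% skill and 87% everything else.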

What does it mean to say that yards per carry is always more luck than skill? Well, for one thing, the correlation between yards per carry in one year and the next is extremely low. Not only that, the correlation between yards per carry in one 8-game sample and another 8-game sample in the same season is extremely low.

If a running back averages 5.00 yards per carry in one 8-game sample, based on regression we'd expect him to average 4.37 in the other. If a running back averages 3.50 yards per carry in one 8-game sample, we'd expect him to average 3.93 in the other. Thanks to the magic of regression to the mean, a chasmic 1.5 yards per carry difference shrinks to a barely noticeable 0.44 yards per carry difference.
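Those two data points are enough to back out the straight line they imply, assuming the expectation is linear in the observed average (an assumption for this sketch; the column does not publish the fitted model). The slope is how much of a gap survives into the other sample, and the fixed point is the average a back would be expected to repeat exactly.

```python
def implied_line(x1, y1, x2, y2):
    """Slope and fixed point of the line through (x1, y1) and (x2, y2).

    The fixed point m satisfies expected(m) == m, i.e. the implied mean
    that everything is regressing toward.
    """
    slope = (y1 - y2) / (x1 - x2)
    mean = (y1 - slope * x1) / (1 - slope)
    return slope, mean


# (observed ypc in one 8-game sample, expected ypc in the other), from the text
slope, mean = implied_line(5.00, 4.37, 3.50, 3.93)
# slope is about 0.29: only ~29% of any gap carries over.
# mean is about 4.11 yards per carry.


def expected_other_half(observed_ypc):
    """Expected ypc in the other sample under the implied linear model."""
    return mean + slope * (observed_ypc - mean)
```

Read this way, roughly 71% of whatever separates a back from the pack in one half-season is expected to vanish in the other.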

This is why I always lead this column off with yards per carry. It is the quintessential regression stat, the easiest win to add to the column. It doesn't really measure how good a player is, it's always more a product of luck than skill, and it fluctuates wildly and randomly between samples.

Volume, on the other hand, is incredibly sticky. Backs who get a lot of touches with a low yards per carry average are likely, going forward, to get a lot of touches with a higher yards per carry average. On the other hand, backs who get a few touches with a high average are likely, going forward, to get a few touches with a lower average.

Regression is not about being 100% right 100% of the time; if you hit on 70% of your bets in fantasy football, you'll dominate your league. A day will come when I make the yards per carry prediction and it fails. But I'm happy to bet that today is not that day.

Right now there are eight running backs who have totaled 20-35 carries with an average of 5 or more yards per carry. These "high-ypc" backs are D'Andre Swift, Aaron Jones, Clyde Edwards-Helaire, Miles Sanders, Christian McCaffrey, Javonte Williams, Cordarrelle Patterson, and Damien Harris. Collectively, they have combined for 188 carries for 1206 yards, an average of 6.4 yards per carry and 75.4 yards per game. This is our Group A.

On the other end, there are ten running backs who have totaled 25 or more carries with an average of 4.3 or fewer yards per carry. Those backs are Josh Jacobs, Leonard Fournette, Ezekiel Elliott, Dalvin Cook, Jeff Wilson, Dameon Pierce, James Robinson, A.J. Dillon, Derrick Henry, and Joe Mixon. Collectively, they have 320 carries for 1220 yards, an average of just 3.8 yards per carry and 61 yards per game. This is our Group B.

Through two weeks, Group A is outrushing Group B by 23.6% despite Group B getting 36% more carries per game. I'm betting that both groups are going to see their yards per carry regress much closer to league average, at which point Group B's volume advantage will prove decisive. As a result, I predict that backs from Group B will average more rushing yards per game over the next four weeks than backs from Group A. As always, I will track this prediction every week and report back in a month with the final results.
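The group figures above can be reproduced from the raw totals, assuming every back in each group has played both games so far (that assumption is what recovers the quoted per-game numbers). The sketch also computes a break-even ratio: how close Group B's yards per carry has to get to Group A's for the volume edge to win out, assuming carries per game stay where they are. The function and variable names are made up for the example.

```python
def summarize(carries, yards, backs, games_each):
    """Per-carry and per-player-game rates for a group of backs."""
    player_games = backs * games_each
    return {
        "ypc": yards / carries,
        "yards_per_game": yards / player_games,
        "carries_per_game": carries / player_games,
    }


# Totals from the text; two games per back is an assumption.
group_a = summarize(carries=188, yards=1206, backs=8, games_each=2)
group_b = summarize(carries=320, yards=1220, backs=10, games_each=2)

# Group A's yardage lead: 75.375 / 61 - 1, about 23.6%
yardage_edge_a = group_a["yards_per_game"] / group_b["yards_per_game"] - 1

# Group B's volume lead: 16 / 11.75 - 1, about 36%
carry_edge_b = group_b["carries_per_game"] / group_a["carries_per_game"] - 1

# If volume holds, Group B out-gains Group A once its ypc exceeds this
# fraction of Group A's ypc: 11.75 / 16 = 0.734375
break_even_ypc_ratio = group_a["carries_per_game"] / group_b["carries_per_game"]

# Ratio so far: about 3.81 / 6.41 = 0.59, short of break-even
current_ypc_ratio = group_b["ypc"] / group_a["ypc"]
```

So Group B does not need to match Group A's efficiency to win the bet; under these assumptions it only needs to reach about 73% of it, up from about 59% through two weeks.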

One final note: in my five years writing this column, I have made a variation on this prediction eight times, and it has hit eight times. But that's not the really interesting bit here: in seven of those eight four-week windows, the "low-ypc" backs from Group B have averaged more yards per carry than the "high-ypc" backs from Group A! In large part, this is a statistical fluke; even if yards per carry were literally random, we'd expect Group A to average more than Group B 50% of the time, and even I, the biggest ypc-hater on the block, don't think it's literally random. (At my absolute most pugnacious, I'll call it a pseudorandom number generator.)
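For a sense of just how flukish seven of eight is, here is a quick check under the deliberately extreme assumption that each four-week window is an independent coin flip. That assumption is only for illustration (the column explicitly does not claim ypc is literally random), and the function name is made up.

```python
from math import comb


def prob_at_least(successes, trials, p=0.5):
    """Binomial probability of at least `successes` hits in `trials` tries."""
    return sum(
        comb(trials, i) * p**i * (1 - p) ** (trials - i)
        for i in range(successes, trials + 1)
    )


# Group B beating Group A in ypc in 7 or more of 8 windows, if each were a coin flip:
seven_of_eight = prob_at_least(7, 8)   # 9 / 256, about 3.5%
```

Even with pure coin flips, a streak like this turns up only about 3.5% of the time, and the true odds for Group B in each window are presumably worse than 50%, which is why "statistical fluke" is the right description.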

Despite our crazy track record here, I don't think Group B is going to average more yards per carry than Group A over the next month (although it won't be very surprising if they do). But I do think the two averages are going to be dramatically closer than anyone expects, and Group B's volume advantage will prove decisive. Tune in during the coming weeks to see if I'm right.
