Regression Alert: Week 3

How exactly will regression to the mean help us predict the future?

By Adam Harstad. Published 09/19/2024.

Welcome to Regression Alert, your weekly guide to using regression to predict the future with uncanny accuracy.

For those who are new to the feature, here's the deal: every week, I break down a topic related to regression to the mean. Some weeks, I'll explain what it is, how it works, why you hear so much about it, and how you can harness its power for yourself. In other weeks, I'll give practical examples of regression at work.

In weeks where I'm giving practical examples, I will select a metric to focus on. I'll rank all players in the league according to that metric and separate the top players into Group A and the bottom players into Group B. I will verify that the players in Group A have outscored the players in Group B to that point in the season. And then I will predict that, by the magic of regression, Group B will outscore Group A going forward.

Crucially, I don't get to pick my samples (other than choosing which metric to focus on). If I'm looking at receivers and Justin Jefferson is one of the top performers in my sample, then Justin Jefferson goes into Group A, and may the fantasy gods show mercy on my predictions.

And then because predictions are meaningless without accountability, I track and report my results. Here's last year's season-ending recap, which covered the outcome of every prediction made in our seven-year history, giving our top-line record (41-13, a 76% hit rate) and lessons learned along the way.


How to Predict Regression for Fun and Profit

Last week, we discussed what exactly regression to the mean is: the mathematical tendency, when sampling randomly from a dataset, for extreme observations to be followed by less extreme ones. We also laid out the four goals of this column:

  1. to persuade you that regression is real and reliable,
  2. to provide actionable examples to leverage in your fantasy league,
  3. to educate you on how and why regression is working, and 
  4. to equip you with the tools to find usable examples on your own.

This week, we're going to get started on goal #3. I'm a magician with one trick, but unlike any other magician you've met, I'm going to tell you how it works in advance.

We'll begin with a bad example of using regression to make decisions. Let's say that the average fantasy receiver scores 8 points per game. Let's also say that we have ten receivers who have averaged 18, 16, 14, 12, 10, 8, 6, 4, 2, and 0 points to this point.

| Player | Score |
| --- | --- |
| Receiver #1 | 18 ppg |
| Receiver #2 | 16 ppg |
| Receiver #3 | 14 ppg |
| Receiver #4 | 12 ppg |
| Receiver #5 | 10 ppg |
| Receiver #6 | 8 ppg |
| Receiver #7 | 6 ppg |
| Receiver #8 | 4 ppg |
| Receiver #9 | 2 ppg |
| Receiver #10 | 0 ppg |

Assume we know nothing about these receivers, not even their names. (This is the format our typical prediction will take-- when we choose a statistic that's ripe for regression, we'll simply be putting the top performers in one group and the bottom performers in another with no regard to any thoughts about their talent, their situation, their past history of production, or any other relevant factors.)

We would expect these scores to regress in the direction of the mean-- remember, regression is the statistical tendency for more extreme values to be followed by less extreme values. But we shouldn't expect all of these receivers to just average 8 points per game going forward; good receivers are more likely to outperform league average than bad receivers, so the top five receivers on that list are probably better on average than the bottom five.

Let's say we split the difference and expect receivers to score halfway between their production to date and league average. That would produce the following list:

| Player | Score |
| --- | --- |
| Receiver #1 | 13 ppg |
| Receiver #2 | 12 ppg |
| Receiver #3 | 11 ppg |
| Receiver #4 | 10 ppg |
| Receiver #5 | 9 ppg |
| Receiver #6 | 8 ppg |
| Receiver #7 | 7 ppg |
| Receiver #8 | 6 ppg |
| Receiver #9 | 5 ppg |
| Receiver #10 | 4 ppg |
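
For anyone who wants to check the arithmetic, the halfway adjustment can be sketched in a few lines of Python. This is only an illustration: the function name `regress` and the constant names are made up for this example, and the 50% rate is the "split the difference" assumption from the paragraph above, not a measured value.

```python
LEAGUE_AVERAGE = 8.0   # points per game, the assumed league average
REGRESSION_RATE = 0.5  # "split the difference": move halfway to the average

# Points per game to date for Receivers #1 through #10.
to_date = [18, 16, 14, 12, 10, 8, 6, 4, 2, 0]


def regress(value, mean, rate):
    """Move `value` a fraction `rate` of the way toward `mean`."""
    return value + rate * (mean - value)


expected = [regress(score, LEAGUE_AVERAGE, REGRESSION_RATE) for score in to_date]

for number, (before, after) in enumerate(zip(to_date, expected), start=1):
    print(f"Receiver #{number}: {before} ppg -> {after:g} ppg")
```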

Now, one could say something like, "Receiver #1 is destined to regress! He's averaging 18 points per game to this point, but we only expect him to average 13 points per game going forward! You should sell him now!"

The first two statements are undoubtedly true, but the third doesn't follow. We are merely invoking regression as a talisman rather than a lens for further analysis. Who would we sell him for? We could trade for Receiver #2, but Receiver #2 is likewise destined to regress. As is Receiver #3. As is Receiver #4. As is Receiver #5.

Every player on that list regressed, but the order and the relative size of the gaps remained the same. The best receiver remained the best receiver, and he continued to outscore #3 by exactly as much as #3 outscored #5. Regression will happen, but this exercise doesn't suggest any particular course of action we should take as a result.

For regression to be useful, we need to find examples we can act on. What this means is instead of focusing on the statistic we care about (such as fantasy points per game), we need to focus on statistics that contribute to the one we care about.

A receiver's fantasy points-per-game total is a function of his receptions, receiving yards, and receiving touchdowns. Last year in PPR leagues, 37.0% of all receiving fantasy points for WRs came from receptions, 46.7% from receiving yards, and 16.4% from receiving touchdowns. (The shares sum to 100.1% because of rounding.)

Applying those ratios to our "8-point-per-game" average above, the "perfectly average receiver" should score 3 points from receptions, 3.7 points from yardage, and 1.3 points from touchdowns. But over a small sample, the contributions of each individual factor will vary wildly. Let's generate some profiles for our ten mystery receivers:

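| Player | Reception Points | Yardage Points | TD Points | Points per Game |
| --- | --- | --- | --- | --- |
| Receiver #1 | 4 | 4 | 10 | 18 |
| Receiver #2 | 6 | 4 | 6 | 16 |
| Receiver #3 | 6 | 8 | 0 | 14 |
| Receiver #4 | 4.5 | 5.5 | 2 | 12 |
| Receiver #5 | 4 | 5 | 1 | 10 |
| Receiver #6 | 2 | 3 | 3 | 8 |
| Receiver #7 | 1 | 1 | 4 | 6 |
| Receiver #8 | 1.5 | 1.5 | 1 | 4 |
| Receiver #9 | 1 | 1 | 0 | 2 |
| Receiver #10 | 0 | 0 | 0 | 0 |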
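
The league-average split and the totals in the table above can be double-checked with a short Python sketch. The shares are the rounded percentages quoted earlier, and every variable name here is illustrative rather than part of any real tool.

```python
LEAGUE_AVERAGE = 8.0  # assumed points per game for the average receiver

# Share of WR receiving fantasy points from each component (rounded figures).
SHARES = {"receptions": 0.370, "yards": 0.467, "touchdowns": 0.164}

# Split the 8-point average across the three components, to one decimal place.
average_profile = {
    name: round(LEAGUE_AVERAGE * share, 1) for name, share in SHARES.items()
}
print(average_profile)

# Made-up profiles from the table: (reception, yardage, touchdown) points.
profiles = {
    1: (4, 4, 10), 2: (6, 4, 6), 3: (6, 8, 0), 4: (4.5, 5.5, 2), 5: (4, 5, 1),
    6: (2, 3, 3), 7: (1, 1, 4), 8: (1.5, 1.5, 1), 9: (1, 1, 0), 10: (0, 0, 0),
}

# Each profile should add back up to the receiver's points per game to date.
totals = {number: sum(parts) for number, parts in profiles.items()}
print(totals)
```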

Now, a crucial thing to understand is that all of these individual factors will regress, too. Remember, regression is a basic mathematical observation; everything regresses. But not everything regresses at the same rate.

Let's say that receptions are the "stickiest" stat, meaning they remain the most consistent from one sample to the next. As a result, maybe we only expect each receiver's reception total to regress 20% of the way to league average.

Yardage would be the next-stickiest stat-- it's not as stable as receptions, but it's still fairly stable. Let's say it regresses 50% of the way to league average. Touchdowns are rather famously the least-sticky statistic; let's say they regress 80% of the way toward league average.

Regressing each component individually doesn't change the order of receivers within that component. We expect the receiver with the most receptions to this point to have the most receptions going forward, and the receiver with the fewest receiving yards right now will probably have the fewest receiving yards over the next month.

But while the order of each component doesn't change, the combination of all three subcomponents tells a very different story. Calculating our expected regression for each component individually, this is the list of expected points per game going forward.

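| Player | Reception Points | Yardage Points | TD Points | Total Points |
| --- | --- | --- | --- | --- |
| Receiver #3 | 5.4 | 5.85 | 1.04 | 12.29 |
| Receiver #2 | 5.4 | 3.85 | 2.24 | 11.49 |
| Receiver #1 | 3.8 | 3.85 | 3.04 | 10.69 |
| Receiver #4 | 4.2 | 4.6 | 1.44 | 10.24 |
| Receiver #5 | 3.8 | 4.35 | 1.24 | 9.39 |
| Receiver #6 | 2.2 | 3.35 | 1.64 | 7.19 |
| Receiver #8 | 1.8 | 2.6 | 1.24 | 5.64 |
| Receiver #7 | 1.4 | 2.35 | 1.84 | 5.59 |
| Receiver #9 | 1.4 | 2.35 | 1.04 | 4.79 |
| Receiver #10 | 0.6 | 1.85 | 1.04 | 3.49 |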
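
The table above can be reproduced with a small Python sketch. It assumes the league-average components of 3, 3.7, and 1.3 points and the illustrative regression rates of 20%, 50%, and 80% used in this example; the names `project`, `AVERAGE`, and `RATE` are invented for the sketch, and the rates are round numbers chosen for the demonstration, not estimates from real data.

```python
COMPONENTS = ("receptions", "yards", "touchdowns")
# League-average points per game from each component (8 ppg split 3 / 3.7 / 1.3).
AVERAGE = {"receptions": 3.0, "yards": 3.7, "touchdowns": 1.3}
# Assumed fraction of the gap to league average that each component closes.
RATE = {"receptions": 0.2, "yards": 0.5, "touchdowns": 0.8}

# Points per game to date: (reception, yardage, touchdown) for each receiver.
to_date = {
    1: (4, 4, 10), 2: (6, 4, 6), 3: (6, 8, 0), 4: (4.5, 5.5, 2), 5: (4, 5, 1),
    6: (2, 3, 3), 7: (1, 1, 4), 8: (1.5, 1.5, 1), 9: (1, 1, 0), 10: (0, 0, 0),
}


def project(points):
    """Regress each component toward its own league average at its own rate."""
    return tuple(
        value + RATE[name] * (AVERAGE[name] - value)
        for name, value in zip(COMPONENTS, points)
    )


projected = {number: project(points) for number, points in to_date.items()}
totals = {number: round(sum(parts), 2) for number, parts in projected.items()}

# Rank receivers by expected points per game going forward.
ranking = sorted(totals, key=totals.get, reverse=True)

for number in ranking:
    change = totals[number] - sum(to_date[number])
    print(f"Receiver #{number}: {totals[number]:.2f} ppg ({change:+.2f})")
```

Printing the change next to each total makes it easy to see which receivers give back the most and which ones gain.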

Now we're getting somewhere! Receiver #1 was living almost entirely on touchdowns. He was only tied for fourth in receptions and receiving yards. Because touchdowns are the most unstable predictors, his expectation falls the most (7.31 points per game), leaving him 3rd in expected points per game going forward.

Meanwhile, Receiver #3 was a reception and yardage monster who hadn't reached the end zone yet. Positive regression on touchdowns partly offsets the negative regression on receptions and yards, and we only expect his per-game scoring to fall by 1.71 points. This lets him leapfrog Receivers #1 and #2 into first place.

Because the order of the list has been shuffled, we now have specific actions we can take to leverage this understanding. We can trade Receiver #1 for Receiver #3. Or maybe we trade him for Receiver #4 (who is very nearly as good as Receiver #1 in expectation, at 10.24 points per game against 10.69) and try to pick up an extra piece in the process. If we have both Receiver #7 and Receiver #8, maybe we bench #7 and start #8 this week, since we expect #8 to narrowly outscore #7 going forward.

To summarize, here are our three guiding principles, the North Star of all future analysis:

Principle #1: Everything regresses to the mean.

Principle #2: Not everything regresses at the same rate.

Principle #3: Not everything has the same mean.

Rather than looking at fantasy points directly, we're going to break those fantasy points into their component parts-- pass/run ratio, yards vs. touchdowns, per-touch efficiency, and so forth. 

And then we're going to pick the most unstable of those components, put all of the leaders in one bucket ("Group A"), put all of the worst performers in another bucket ("Group B"), verify that Group A is outperforming Group B in the thing we care about (typically fantasy points)... and then predict that entirely through the magic of regression, Group B will outperform Group A going forward.
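
The setup step of that procedure (rank, split, verify) can be sketched in Python. This is a rough illustration only: it reuses the ten made-up receivers from earlier with touchdown points as the unstable metric, and the group size of three and the helper name `split_groups` are assumptions for the example, not the column's actual method. It does not test the prediction itself, which can only be scored against future results.

```python
from statistics import mean

# (receiver number, touchdown points per game, fantasy points per game)
receivers = [
    (1, 10, 18), (2, 6, 16), (3, 0, 14), (4, 2, 12), (5, 1, 10),
    (6, 3, 8), (7, 4, 6), (8, 1, 4), (9, 0, 2), (10, 0, 0),
]
GROUP_SIZE = 3  # hypothetical; chosen only to keep the example small


def split_groups(players, size):
    """Rank by the chosen metric; return the top `size` and bottom `size`."""
    ranked = sorted(players, key=lambda player: player[1], reverse=True)
    return ranked[:size], ranked[-size:]


group_a, group_b = split_groups(receivers, GROUP_SIZE)

# Verify that Group A is outscoring Group B in fantasy points so far.
a_ppg = mean(player[2] for player in group_a)
b_ppg = mean(player[2] for player in group_b)

print("Group A:", [player[0] for player in group_a], f"{a_ppg:.2f} ppg")
print("Group B:", [player[0] for player in group_b], f"{b_ppg:.2f} ppg")
print("Group A leads to date:", a_ppg > b_ppg)
```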

Maybe the idea that we can predict future performance without so much as knowing the names of the players involved sounds like magical thinking, but we've been doing just that for seven years now and these predictions have a 76% lifetime success rate. If history is anything to go by, we'll make around 8 predictions this year, and around 6 of them will be correct.

We'll start with our first prediction next week using one of the most popular-- and most predictively useless-- statistics in football.

 

Photos provided by Imagn Images
