So let's say we've figured out that a player has been lucky or unlucky - what do we do with that information? The answer lies in the concept of Regression.
So WHAT IS REGRESSION?
To explain that, let me give you an example:
Imagine a fair coin - a coin that 50% of the time comes up heads and 50% of the time comes up tails. Now let's say you flipped that fair coin 20 times and 19 times it came up heads. In other words, the coin flipped heads 95% of the time.
Now what if you flipped that coin 80 more times? Well we'd expect that fair coin to come up heads 40 times and tails 40 times. Now in our 100 flips of that coin, we have a total of 59 Heads and 41 tails. In other words, that coin flipped heads 59% of the time.
Now what if you flipped the coin 900 more times? (For a total of 1000 flips). Well we'd expect 450 more heads and 450 more tails, to give us 509 Heads and 491 Tails total. Now we have a coin that flipped heads 50.9% of the time.
Notice what's happened here: we started with a coin that had an extremely unlikely heads % - 95%. As we increased our number of flips, that number declined and declined and became closer to the true heads % of that coin (50%), going from 95% to 50.9%.
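For anyone who wants to check the arithmetic, here is a minimal Python sketch of the running heads percentage across the three batches of flips described above. The function name `cumulative_heads_pct` is just an illustrative choice, not something from an existing library.

```python
def cumulative_heads_pct(batches):
    """Return the running heads percentage after each (flips, heads) batch."""
    total_flips = 0
    total_heads = 0
    running = []
    for flips, heads in batches:
        total_flips += flips
        total_heads += heads
        running.append(100.0 * total_heads / total_flips)
    return running


# The lucky first 20 flips, then two batches that come up exactly 50% heads.
batches = [(20, 19), (80, 40), (900, 450)]
for pct in cumulative_heads_pct(batches):
    print(f"{pct:.1f}%")  # prints 95.0%, then 59.0%, then 50.9%
```

Notice that the later batches never dip below 50% heads. The early luck is simply diluted by a growing pile of ordinary results.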
That decline in this case is what we mean by "regression" - the change in a result over time to get closer and closer to the true result that we'd expect. Not that regression always has to be a decline - if our coin started with a 5% heads percentage, we'd expect it to regress UPWARDS toward 50% just as well. In other words, while the word "regression" sounds negative, it really isn't - unlucky players will regress positively just the same as lucky players will regress negatively. Regression essentially says just this:
Luck, good or bad, cannot continue forever, and a player's results will over time come back to where we'd expect if his luck was neutral. Note: this does not mean that if a player is lucky (or unlucky) we should expect an unlucky (or lucky) streak in the future.* Remember the coin flip example above - the coin's heads percentage didn't regress because its luck reversed; rather it regressed because the lucky results were - over the long run - overwhelmed by the coin's own true heads %.
*Expecting luck to reverse like this is in fact a well-known fallacy, the gambler's fallacy - the same fallacy that causes gamblers to keep betting on a number coming up in roulette because it's "due."
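A quick simulation can make the point concrete. The sketch below uses Python's standard `random` module as a stand-in for a fair coin and looks only at the flip that comes right after a run of five straight heads. The function name, the streak length of 5, the number of flips, and the seed are all arbitrary choices for illustration.

```python
import random


def heads_rate_after_streak(n_flips, streak_len, seed=0):
    """Share of heads on flips that immediately follow `streak_len` straight heads."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    flips = [rng.random() < 0.5 for _ in range(n_flips)]  # True means heads
    follow_ups = [
        flips[i]
        for i in range(streak_len, n_flips)
        if all(flips[i - streak_len:i])  # the previous `streak_len` flips were all heads
    ]
    return sum(follow_ups) / len(follow_ups)


rate = heads_rate_after_streak(n_flips=200_000, streak_len=5)
print(f"Heads rate right after 5 straight heads: {rate:.1%}")
```

If the coin were "due" for tails, that rate would sit well below 50%. In a simulation like this one it should instead hover close to 50%, because the coin has no memory of the streak.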
To put this in Hockey terms, if a player is getting lucky and every shot is going in the net for the first two weeks of the season, we shouldn't necessarily expect a slump - rather we should expect him to start scoring at his normal rate for the rest of the season (which might look like a slump next to his lucky scoring rate, mind you).
Okay so we should expect regression when we find evidence that a player's results have been aided by luck. But that raises another question:
WHAT DO WE EXPECT A PLAYER'S RESULTS TO REGRESS TO?
You may have heard before that we expect "Regression to the Mean" - "Mean" being the statistical term for "average". But what average? The average of all players?
No, rather we expect a player's statistics to regress toward his "true talent." What do we mean by true talent? Well essentially we mean that each player, if he played in a luck neutral environment, would obtain certain results because of his own skill.
To go back to our coin flipping example above, the true talent of a fair coin is 50% - if the flipper was neither lucky nor unlucky, he'd get a heads 50% of the time. And indeed, as we flipped the coin more and more times, the heads% did in fact regress toward the true talent of the coin - getting closer and closer to 50%.
But our hypothetical coin is unlike hockey players in one especially important way (other than, you know, being a coin instead of a human being playing hockey) - we'll never be able to say with 100% certainty what the true talent of a hockey player (or team) is at a given time.
This poses a problem - if we can't be certain of a player's true talent, then how do we know what to regress a player's stats to (or that he's been lucky at all)? The answer is simple - we can't be certain of a player's true talent at any given point, but we can make a good estimate of it from two sources:
1. That player's past results
2. The results of the average hockey player at his position.
Let's go over each of these sources individually:
A. USING A PLAYER'S PAST RESULTS TO ESTIMATE HIS TRUE TALENT:
As you might expect, a player's past results are often the best way to determine a player's true talent and what to expect his current #s to regress to.
Let's take an example:
Matt Moulson is a 14.6% career shooter in the NHL - a particularly high percentage for an NHL player. We know this because he's done this for 3 seasons already with the Islanders and because he showed off a similarly high shooting % in the AHL when he was property of the Kings (When we say "Past Results" we don't just mean NHL results - if we have results from other leagues, we should use them. More Data is always better than less Data.)
If Matt Moulson begins next season in a slump and shoots 6% for the first 10 games of the season, it's pretty likely that he has been getting unlucky (and there are a few ways to determine if this is the case, which we'll try getting into in the next post). So we'd expect his shooting percentage to regress toward his roughly 14% career rate - which it will if, as expected, he manages to put up a 14% shooting percentage over the remaining 72 games of the season (in fact, assuming Moulson is taking shots at a similar rate to how he usually does, we'd expect his shooting % at the end of the season to be around 13%).
The same would apply if Moulson began the season on a tear and was shooting over 20% over 10 games. No one manages to keep a shooting % that high over the course of an 82 game season, so we'd be pretty sure Moulson was getting a little lucky. And we'd expect Moulson's shooting % to decline back down to 14% as the season went on.
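Here is a rough Python sketch of how the end-of-season number falls out of the two stretches in the Moulson example. It assumes he takes the same number of shots per game all season; the value of 3 shots per game is a made-up placeholder (it cancels out when the rate is constant), and `blended_shooting_pct` is a hypothetical helper name.

```python
def blended_shooting_pct(segments):
    """Overall shooting % across (games, shots_per_game, shooting_pct) segments."""
    goals = sum(games * spg * pct for games, spg, pct in segments)
    shots = sum(games * spg for games, spg, _ in segments)
    return goals / shots


SHOTS_PER_GAME = 3.0  # hypothetical; any constant rate gives the same answer

# 10 unlucky games at 6%, then 72 games at a flat 14%.
cold_then_14 = blended_shooting_pct(
    [(10, SHOTS_PER_GAME, 0.06), (72, SHOTS_PER_GAME, 0.14)]
)
# Same slump, but the rest of the season at the 14.6% career figure.
cold_then_career = blended_shooting_pct(
    [(10, SHOTS_PER_GAME, 0.06), (72, SHOTS_PER_GAME, 0.146)]
)
# 10 lucky games at 20%, then 72 games at 14%.
hot_then_14 = blended_shooting_pct(
    [(10, SHOTS_PER_GAME, 0.20), (72, SHOTS_PER_GAME, 0.14)]
)

print(f"Cold start, then 14.0%: {cold_then_14:.1%}")      # 13.0%
print(f"Cold start, then 14.6%: {cold_then_career:.1%}")  # 13.6%
print(f"Hot start, then 14.0%:  {hot_then_14:.1%}")       # 14.7%
```

Either way the season total ends up within about a point of his usual rate, which is the whole idea: 10 odd games get swamped by 72 normal ones. The exact figure depends on which "normal" rate and shot volume you plug in.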
B. USING THE AVERAGE RESULTS OF PLAYERS OF THE SAME POSITION TO ESTIMATE TRUE TALENT:
Of course, there are a few real problems with using a player's past results to estimate what his true talent is and what we should regress his current numbers to. The two most important issues are:
1. Sometimes you don't really have a large sample size of past results to use to estimate true talent; and
2. Players don't stay the same over the course of their careers.
The first issue is a pain, especially with rookies (though it applies to non-rookies too). Even if you can tell that a player is getting lucky/unlucky, how do you tell what his numbers should regress to when you have nearly no useful past results at all (CHL results notwithstanding)?
Again, let's take an example:
Nino Niederreiter had a 1.4% shooting percentage last year. This is undoubtedly in large part the result of bad luck - no one shoots that poorly over an NHL season as a forward, especially not someone with the pedigree and CHL results of Niederreiter. So we expect his shooting percentage to increase next year, even if he remains the same player as last year. But what to? How can you tell?
See the issue? We can't really assume Nino will regress to his CHL shooting percentage (the gap between juniors and the NHL is a lot bigger than the gap between the AHL and the NHL).
There is a solution to this, and it involves using the results of the average player. Since we can't know whether the player in question is unique in certain ways, we assume that the player isn't unique until we get more data. So returning to our example, we'd assume that Nino is more likely closer to a 10% shooter (the average rate) than a 3% shooter. This isn't to say we throw out the player's luck-affected past results - we start from the average player and adjust our expectations using those past results. So in Nino's case, since at least some of his 1.4% season probably reflects his true talent, perhaps we assume that (if his true talent doesn't improve) he is more likely to be a slightly below-average shooter - let's say 9% - than an exactly average one (10%).
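One common way to put numbers on this idea is to blend the player's own results with a pile of "phantom" league-average shots, so that small samples get pulled strongly toward the average and large samples barely move. The Python sketch below is one such approach, not necessarily how any particular analyst does it. The 500 phantom shots are an arbitrary weight picked purely for illustration, `regress_to_mean` is a hypothetical helper name, and the 1 goal on 74 shots is what a 1.4% season on the shot total quoted later in this post works out to.

```python
def regress_to_mean(goals, shots, league_pct, prior_shots):
    """Blend observed results with `prior_shots` phantom shots at the league rate."""
    return (goals + league_pct * prior_shots) / (shots + prior_shots)


LEAGUE_PCT = 0.10   # the average shooting rate used in this post
PRIOR_SHOTS = 500   # assumption: illustrative weight, not an estimated value

# Small sample: 1 goal on 74 shots gets pulled most of the way to average.
rookie_estimate = regress_to_mean(1, 74, LEAGUE_PCT, PRIOR_SHOTS)
# Ten times the sample at the same raw rate is trusted far more.
veteran_estimate = regress_to_mean(10, 740, LEAGUE_PCT, PRIOR_SHOTS)

print(f"Estimate from 74 shots:  {rookie_estimate:.1%}")   # 8.9%
print(f"Estimate from 740 shots: {veteran_estimate:.1%}")  # 4.8%
```

With this made-up weight, the 74-shot season lands a bit below average, in the same neighborhood as the 9% guess above, while the same raw rate over 740 shots would leave us believing the player really is a poor shooter.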
The second issue, of course, is that players change as their careers progress in the NHL. Rookies like Nino Niederreiter usually get better as they gain a few years of experience, till they hit their peaks in their mid-to-late 20s. Then these players begin to decline. If we regress a player's lucky or unlucky numbers to their career averages, we'll be going too far in one direction, as some of the change isn't due to luck but due to the player's true talent changing.
We have two ways of dealing with this. The first is that we have some statistics which can be used to show when a change in results is not necessarily caused by the influence of luck - but may in fact be a change in how the player is actually performing (skill).
The second way may be more controversial for people - we take the average aging pattern of similar players and use it to adjust the player's career #s in our estimation of that player's true talent. So if we see that the average, say, 32-year-old forward starts to take about .3 fewer shots per game (which means about 25 or so fewer shots over a season), we might expect that player's true talent at generating shots on goal to decline by a similar amount, and thus we would regress a fluky shots on goal total toward that adjusted true talent.*
*I'm totally making up this number out of thin air, because I'm writing this in an airplane and can't look up any factual aging curve information. But the concept remains true.
This isn't to say that all players age the same - every player is at least slightly different. But aging patterns do tend to be similar for players and thus they're useful even if not definitive. And they're useful when trying to regress a player's results to adjust for growth/decline in that player's true talent.
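As a small illustration of the aging adjustment, here is a Python sketch that shifts a career shot rate by an aging delta before using it as the regression target. The 0.3 shots per game is the made-up figure from the example above, the 2.5 shots per game career rate is equally hypothetical, and `age_adjusted_shots` is just an illustrative name.

```python
def age_adjusted_shots(career_shots_per_game, aging_delta_per_game, games=82):
    """Expected season shot total after applying an aging adjustment to the rate."""
    adjusted_rate = career_shots_per_game + aging_delta_per_game
    return adjusted_rate * games


CAREER_RATE = 2.5   # hypothetical career shots per game
AGING_DELTA = -0.3  # the post's invented decline for a 32 year old forward

unadjusted_target = age_adjusted_shots(CAREER_RATE, 0.0)
adjusted_target = age_adjusted_shots(CAREER_RATE, AGING_DELTA)

print(f"Career-rate target:  {unadjusted_target:.1f} shots")  # 205.0
print(f"Age-adjusted target: {adjusted_target:.1f} shots")    # 180.4
print(f"Difference:          {unadjusted_target - adjusted_target:.1f} shots")  # 24.6
```

The difference of roughly 25 shots is the "25 or so" mentioned above: a fluky shot total for this player would be regressed toward about 180 shots rather than 205.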
Finally, this brings us to our last question:
WHEN DOES REGRESSION TAKE PLACE?
You may have noticed in the above post that I referred to some statistics regressing over the course of a season (in the Matt Moulson example) and I referred to some statistics regressing in a future season (in the Nino example). When should we expect regression?
The answer is that it varies: luck can sometimes last a while - even the course of a whole season (heck, for some statistics, it can last even for multiple seasons). This is because an 82 game season doesn't really give us a large sample size for some statistics. For example, Niederreiter put up only 74 shots on goal in a near-full season last year. That's not a lot - if 4 of those shots deflect differently and go in, well, all of a sudden he's got a far more ordinary-looking shooting percentage (close to 7% instead of 1.4%). Clearly luck can still be influential even though it is a "season" worth of data.
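To see how touchy a shooting percentage is on only 74 shots, here is a short Python sketch. It treats every shot as an independent event with the same chance of going in (a simplifying binomial assumption, not something the data guarantees), uses the 10% average rate from earlier in this post, and the function names are illustrative.

```python
import math


def shooting_pct(goals, shots):
    """Shooting percentage as a number from 0 to 100."""
    return 100.0 * goals / shots


def binomial_sd_pct(true_pct, shots):
    """One standard deviation of shooting % (in points) under a binomial model."""
    p = true_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / shots)


SHOTS = 74

# How much each extra goal moves the season number.
for goals in (1, 3, 5, 8):
    print(f"{goals} goals on {SHOTS} shots: {shooting_pct(goals, SHOTS):.1f}%")
# prints 1.4%, 4.1%, 6.8%, 10.8%

# Typical luck-driven swing for a true 10% shooter over 74 shots.
print(f"SD for a 10% shooter: {binomial_sd_pct(10.0, SHOTS):.1f} points")  # 3.5
```

Under those assumptions, a swing of three or four percentage points in either direction is completely routine over a sample this small, which is why a single season of shooting percentage tells us so little.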
Really, while luck usually doesn't last over the entire season, it can for certain statistics (Shooting % is one). That said, if you can tell that a player is being lucky through part of a season, you should always expect that player's statistics to regress over the rest of the season. It might not always happen, but it is the most likely thing to happen. Luck does NOT last forever, even if it can last for a little while longer at any given point.
And that's it for our basics on Regression and True Talent. Next time we finally step outside the basics of luck and regression and move back to how this applies to hockey.
The Intro to Hockey Analytics/Advanced-Hockey-Statistics Primer so far:
Part 1: - What is the field of Hockey Analytics and Why Might You be Interested?
Part 2.1: - The Importance of Context Part 1 - Time on Ice
Part 2.2: - The Importance of Context Part 2 - Evaluating the Difficulty of Certain TOI through QUALCOMP and Zone-Starts
Part 2.3: - The Importance of Context Part 3 - Evaluating (and Compensating for) the Effect of Teammates via QUALTEAM and Relative Measures
Part 2.4: - The Importance of Context Part 4: The Concept of the Replacement Level Player
Part 3 - The Perils of Sample Size
Part 4.1 - Introduction to Hockey Analytics Part 4.1: Possession Metrics (Corsi/Fenwick)
Part 4.2 - Introduction to Hockey Analytics Part 4.2 - Possession Metrics: The Various Forms of Corsi Available on Hockey Sites
Part 4.3 - Introduction to Hockey Analytics Part 4.3: Possession Metrics: Fenwick a Measure of Effective Possession
Part 4.4 - Introduction to Hockey Analytics Part 4.4: Possession Metrics: Scoring Chances
Part 5.1 - Introduction to Hockey Analytics Part 5.1: Evaluating Neutral Zone Play: Zone Entries
Part 6.1 - Introduction to Hockey Analytics Part 6.1: Luck and Random Variance: An Introduction.