# Exploring Data: Blending Expected Goals and production among Islanders scorers

A process-based way to use exploratory analysis to answer questions about the New York Islanders

In an effort to get some good ideas on what to write about this month, I again turned to Twitter. And once again, there are a bunch! In a series of pieces that will go live this month, I’ll attempt to answer your questions in four steps: examining the question, gathering the data, exploring the data, and then a final summary, which aims to create insights & ask further questions.

This process is generally how I approach all types of analyses, so hopefully, this is of interest! The first question is below:

### The Question

So, admittedly the intent of this question was literally about the weight of the players, but I actually took a lot of interest in the word “efficient” when tied to expected goals. This can be approached in a few different ways: who scores the most with the on-ice chances they get, who is on the ice for the most expected goals outright with the minutes they get, or on-ice expected goal share (the percentage of expected goals a player is on the ice for).

Expected goals, generally, are the probability that a shot attempt becomes a goal. This is based on a few different components: the type of shot, the coordinates from which a shot was taken, the shooter, the goalie, and other contextual situations. Each of the major expected goal models has differences in its components, so there are some slight discrepancies between them. For the purposes of this piece, we’ll be looking at Natural Stat Trick’s.

The other thing to note is that expected goals (or xG) are a descriptive metric. That means they will tell you the probability of *what* already happened, but won’t tell you the likelihood of that same event happening again. It’s an important distinction that needs to be considered.

Between the three options above, I think when we talk about efficiency, it’s important to look at who has success with expected goals, layering production on top of that. So we’ll be exploring who has produced the most relative to the highest share of on-ice expected goals. In other words, the question boils down to “which players have a high share of expected goals and produce on the scoresheet?”

### Gathering The Data

The first thing we need to look at is which players are on the ice for the most expected goals. Because of differences in ice time, it is best to use “rate metrics” for this, which normalize a player’s on-ice expected goals to every hour he plays. If we do that, we’ll be able to get a list of expected goals for rates by player.
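To make the “rate metric” idea concrete, here is a minimal sketch of the per-hour conversion. The function name and the numbers are my own placeholders for illustration, not Natural Stat Trick’s code or real Islanders data.

```python
def per_60(total: float, toi_minutes: float) -> float:
    """Scale an on-ice total to a rate per 60 minutes of ice time."""
    if toi_minutes <= 0:
        raise ValueError("time on ice must be positive")
    return total * 60.0 / toi_minutes


# Hypothetical players, illustration only (not real data).
heavy_minutes = per_60(total=30.0, toi_minutes=720.0)  # more xGF, far more ice time
light_minutes = per_60(total=10.0, toi_minutes=200.0)  # less xGF, far less ice time

print(heavy_minutes, light_minutes)
```

The player with fewer raw expected goals comes out ahead once ice time is accounted for, which is the whole reason to use rates over totals here.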

It also makes sense to look at which players are on the ice for the most expected goals against. When we talk about efficiency, it’s not just about tying production to offense; the defensive side of the puck matters as well. In other words, the share of expected goals should matter here. This will ultimately give us our first view, which will chart out the players that fare well in only expected goals for, only expected goals against, both, or neither.
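As a rough sketch, expected goal share is just expected goals for divided by all on-ice expected goals, for and against. The function name and inputs below are assumptions for illustration.

```python
def xg_share(xgf: float, xga: float) -> float:
    """Percentage of on-ice expected goals that belonged to the player's team."""
    total = xgf + xga
    if total <= 0:
        raise ValueError("need a positive amount of on-ice expected goals")
    return 100.0 * xgf / total


# Hypothetical per-hour rates, illustration only (not real data).
share = xg_share(xgf=2.5, xga=2.0)
print(round(share, 1))
```

Because the same ice time sits under both numbers, the share comes out the same whether you feed it raw totals or per-hour rates.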

Finally, as a way to measure production, we can go about this in two ways. The obvious thing to look at is points, but let’s get a little more granular. While all points are important, primary points (goals + primary assists) have been shown to be more repeatable. Specifically, it’s the idea that a player either scored a goal or (almost assuredly) directly contributed to a goal. Like with expected goals, we will want to normalize this to a per hour basis.
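Here is a small sketch of the difference between the two production measures. The `Scoring` class and its values are hypothetical and only there to show that secondary assists drop out of primary points.

```python
from dataclasses import dataclass


@dataclass
class Scoring:
    goals: int
    primary_assists: int
    secondary_assists: int
    toi_minutes: float

    def points_per_60(self) -> float:
        """All points (goals plus both kinds of assists) per hour."""
        total = self.goals + self.primary_assists + self.secondary_assists
        return total * 60.0 / self.toi_minutes

    def primary_points_per_60(self) -> float:
        """Goals plus primary assists per hour; secondary assists are excluded."""
        return (self.goals + self.primary_assists) * 60.0 / self.toi_minutes


# Hypothetical stat line, illustration only (not real data).
line = Scoring(goals=8, primary_assists=7, secondary_assists=5, toi_minutes=600.0)
print(line.points_per_60(), line.primary_points_per_60())
```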

For both of these, it will be important to set a minimum parameter so we avoid outliers. Since there have been 44 games this season, let’s choose a 10 game minimum for these purposes. One other consideration: the expected goal metrics will be pulled for 5v5 score-adjusted play only, while the primary points will be pulled at normal 5v5 play.
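The games played cutoff is a simple filter. A minimal sketch, assuming the data sits in a list of rows with a hypothetical `gp` field (the player names and values are placeholders, not real data):

```python
MIN_GAMES = 10  # the 10 game minimum chosen above

# Hypothetical rows, illustration only.
players = [
    {"player": "Player A", "gp": 44},
    {"player": "Player B", "gp": 10},
    {"player": "Player C", "gp": 3},
]

# Keep only players who have appeared in at least MIN_GAMES games.
qualified = [row for row in players if row["gp"] >= MIN_GAMES]
print([row["player"] for row in qualified])
```

A player with a handful of games can post an extreme rate in either direction, so dropping him keeps the charts from being stretched by a tiny sample.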

Luckily, Natural Stat Trick has these metrics easily accessible, so let’s pull from there and see what we find.

### Exploring The Data

By now, we have a table with six columns: the player, games played, expected goals for per hour, expected goals against per hour, expected goal share, and primary points per hour.

The first step is to explore the data through scatter plots (using the games played minimum we set above) so we can quadrant out and group players together relative to the team. Let’s try this for expected goals for versus expected goals against.

I reversed the y-axis so that lower expected goals against (good) would appear higher, so the players with strong rates both for and against land in the top right quadrant. As drawn, the chart appears to show a negative correlation, but if we undo the reversal, the underlying correlation is positive. In other words, at least for the Islanders, a player who is on the ice for few expected goals for will generally also be on the ice for few expected goals against, and the reverse is true as well. Only Tom Kuhnhackl and Noah Dobson look to be something of outliers here.
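The sign flip is easy to check numerically. In this sketch I treat reversing the axis as negating the expected goals against values, which is my own stand-in for the visual effect, and the numbers are made up rather than taken from the chart.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical per-hour rates, illustration only (not real data).
xgf_per_60 = [1.8, 2.0, 2.2, 2.4, 2.6]
xga_per_60 = [1.9, 2.0, 2.3, 2.3, 2.6]

r_raw = pearson(xgf_per_60, xga_per_60)                    # values as recorded
r_flipped = pearson(xgf_per_60, [-v for v in xga_per_60])  # y-axis reversed

print(r_raw > 0, r_flipped < 0)
```

The strength of the relationship is identical in both cases; only the sign changes with the direction of the axis.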

What does that mean in terms of the share of expected goals?

This is easier to read but specifically shows, in “heat” form, the Islanders in order from “best” to “worst” in terms of expected goal share. Unsurprisingly, we can see Noah Dobson at the top and Tom Kuhnhackl at the bottom, which makes sense given what we had already uncovered with the above scatter plot.
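For anyone curious how a “heat” ordering like this could be put together, one option is to sort by share and scale each value between the team’s lowest and highest. This is a sketch under my own assumptions (min-max scaling, placeholder players and values), not necessarily how the chart above was built.

```python
def heat_scale(shares):
    """Rank players by xG share (best first) and scale each share to 0-1."""
    low = min(shares.values())
    high = max(shares.values())
    span = high - low
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    # 1.0 is the team's best share, 0.0 its worst; 0.5 if everyone is tied.
    return [
        (name, share, (share - low) / span if span else 0.5)
        for name, share in ranked
    ]


# Hypothetical shares, illustration only (not real data).
table = heat_scale({"Player A": 48.0, "Player B": 56.0, "Player C": 52.0})
for name, share, heat in table:
    print(name, share, heat)
```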

Given the color scheme, Leo Komarov could be grouped towards the bottom (and if we look above, we can see his name is a bit off the trend line). The same is true of Jordan Eberle, who is also further to the right than any other player (this is because he leads the team in on-ice expected goals for per hour).

Now that we have a basis of who has strong expected goal metrics, let’s tie this together by looking at another scatter plot, charting expected goal share against primary points. This should give us our best view of efficiency.
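Grouping players into quadrants comes down to comparing each player to a dividing line on both axes. In this sketch I assume the dividing lines are the team averages, with expected goal share on the x-axis and primary points per hour on the y-axis. The players and values are placeholders.

```python
from statistics import mean


def quadrant(x, y, x_mid, y_mid):
    """Label a point by where it sits relative to the two dividing lines."""
    vertical = "top" if y >= y_mid else "bottom"
    horizontal = "right" if x >= x_mid else "left"
    return f"{vertical} {horizontal}"


# (player, xG share %, primary points per hour); hypothetical, not real data.
rows = [
    ("Player A", 55.0, 2.0),
    ("Player B", 45.0, 1.8),
    ("Player C", 54.0, 0.6),
    ("Player D", 46.0, 0.4),
]

# Assumption: the quadrant lines sit at the team averages.
share_mid = mean(share for _, share, _ in rows)
points_mid = mean(points for _, _, points in rows)

labels = {
    name: quadrant(share, points, share_mid, points_mid)
    for name, share, points in rows
}
print(labels)
```

Using the median in place of the mean would be a reasonable alternative if one or two players are far from the pack.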

### Insights & Next Steps

We got a bit lucky, because the best answer to this question, Mathew Barzal, seems pretty obvious based on the chart. That said, let’s home in a little bit on what we see.

If we look at the top right quadrant, we can see Barzal, Anders Lee, Jordan Eberle, Matt Martin, and Ross Johnston all safely in the quadrant. These are the players that, relative to the team, have the best results in primary points per hour and expected goal share. Barzal is over half a primary point per hour better than his teammates, which highlights his efficiency of production despite not leading the team in expected goal share.

In the top left quadrant, the player who sticks out is Brock Nelson. Nelson has been a good producer for the Islanders, but has a lower expected goal share than quite a few of his teammates. If we were to define efficiency as simply “making the most of his chances,” Nelson would have a bit of a case.

As expected, we can see some players we called out earlier — Kuhnhackl, Komarov, Dobson, and Eberle each sticking out at the four corners of the graph. This makes sense, given their expected goal share results, and is a cool finding from the above analysis.

Finally, as stated above, these views simply show the “what” of what has happened for the Islanders this season. They do not explain the “how” or the “why,” which are essential to understanding the process that leads to strong results in xG and production. There are other factors, such as raw skill, system fit, and quality of teammates, that play a role as well.

But this type of analysis can be a big help in identifying the next set of questions, such as:

• Why is Brock Nelson’s expected goal rate so low, and if it were higher would he produce more?
• Is Jordan Eberle’s goal slump simply a matter of bad luck, seeing as he is on the ice for more expected goals per hour than any other Islander?
• Is Noah Dobson’s early success a precursor to his development as a potential top-pairing defenseman?
• What’s made Matt Martin and Ross Johnston so successful this season?

Many of these can be looked at by the coaching staff, but these exploratory insights are the first step in understanding data and moving to solving relevant questions.

All metrics are from NaturalStatTrick.com, and all charts were last updated after Monday’s game against the New York Rangers.