Colby Cosh Archives

Leftovers
An unscientific exercise in parsing hockey statistics.

As you know I sometimes do, I've been fiddling with stats tonight. All of the work of a hockey team--understood as a team rather than a business enterprise--goes into winning games, which is to say it goes into scoring and preventing goals. Broadly speaking, teams have better records as they do better at scoring and preventing goals. In fact, there is a pretty solid linear relationship between goal differential (goals scored minus goals allowed) and points in the standings. Real solid. If you make a scatter diagram of the two things, the points line up nicely and you can see that goal differential can "explain" (or perhaps more properly, "account for") most of the variance in standings-points. More than 95% of it, in fact, for the whole 2002-03 season, if you go by the traditional R² measure.

This suggests that adding a certain number of goals to a team will yield a highly predictable number of points in the standings--about 2.8 goals per point, as it happens (and the figure is almost exactly the same so far this year). If you ignore the extra points from overtime losses, which I'm not recommending except for the purpose of grasping the idea here, your favourite team should be X points above .500, where X is goal differential divided by 2.8. And most are. Detroit right now is +49; +49 divided by 2.8 is 17.5; so they should have 55 + 17.5 = 72.5 points after the 55 games they've played so far. They actually have 70. My Oilers are -5, so they should have about 52 points through their 54 games. True total: 51.
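For anyone who wants to check the arithmetic, here is a minimal Python sketch of that rule of thumb. The 2.8 goals-per-point figure and the team numbers are the ones quoted above; the function name `expected_points` is only an illustrative label, and overtime-loss points are ignored, just as they are in the paragraph.

```python
GOALS_PER_POINT = 2.8  # rough figure quoted in the text, not a fitted constant


def expected_points(games_played, goal_differential):
    """Points predicted from goal differential alone (no overtime-loss points).

    A .500 team earns one point per game, so games_played is the baseline.
    """
    return games_played + goal_differential / GOALS_PER_POINT


# (games played, goal differential, actual points), as quoted in the text
teams = {
    "Detroit": (55, 49, 70),
    "Edmonton": (54, -5, 51),
}

for name, (gp, gd, actual) in teams.items():
    print(f"{name}: expected {expected_points(gp, gd):.1f}, actual {actual}")
```

The gap between the expected and actual figures is the raw material for everything that follows.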

But some teams diverge pretty wildly from the levels of real success predicted by their ability to score and prevent goals. The most dramatic case in this season is Ottawa. As I write these words, Ottawa has very nearly the best offence in hockey and very nearly the best defence. They have scored 61 more goals than they've allowed. 61! Nobody else is close to that! Detroit is in second, with 49. So Ottawa should be at the top of the league standings.

But of course they're not. Having played 53 games they should have about 53 + (61/2.8) points, or 75. If you account for overtime losses, it should be more like 78. In fact they only have 66 points, which leaves them behind division-mates Toronto (69 points, goal differential +20) and Boston (67, +10).

No team in the NHL has remotely as much negative "residue" as Ottawa does in this sense. They are about 12 points lower in the standings than their GF/GA totals would predict. Why has this happened to Ottawa? Easy answer for anyone who's studied baseball sabermetrics: it's because, in the real world, not every goal is worth 1/2.8 of a point in the standings. Goals in blowouts are worth less, and goals in tight one-goal games are worth more. Put simply, Ottawa has been very very bad at scoring the high-leverage goals in tight games. How do we know? They have a shockingly bad record in one-goal games. Right now that record is 4-13, counting overtime losses as "losses" and forgetting ties.

I haven't checked to make sure that residue is connected to one-goal performance up and down the league. I know Pittsburgh and Nashville have the most positive "residue" in the league (7.0 and 7.4 points respectively), and they do much better in close games than in others, Pittsburgh being 9-11--which is pretty good for a team that only has 11 wins of any kind!--and Nashville being 15-8. There is really nothing else the residue can be but a reflection of performance in close games. For your interest, here's a list of the top and bottom five in residue, as of this evening.

POSITIVE
Nashville  +7.4
Pittsburgh +7.0
Boston     +5.6
Toronto    +5.1
Dallas     +4.9

NEGATIVE
Ottawa    -11.5
Detroit    -5.3
Chicago    -4.4
Edmonton   -4.0
Minnesota  -3.8

The important question to answer is this: what is the right name for this residue? What actual factor makes teams perform worse or better in close games than "mere" goal-scoring and goal-preventing would allow for?

The temptation for a student of baseball statistics is to assign the residue to luck. This is not something that's done fancifully. If the residue isn't luck, then it has to be a reflection of some ability or quality on the part of the team: "guts" or good coaching, perhaps. But if that were so you would expect the same teams that had high residue last year to, by and large, have it again this year. Most of the teams have the same coaches, after all, and largely the same personnel: the effect, whatever real quality was causing it, should have some persistence--if the quality were real. (Warning: this form of argument may be more philosophically controversial than it's normally taken to be in baseball.)

I can tell you that, between '02-'03 and this year, it hasn't. A scatter diagram of the residue in both years is a random cloud of bugs, without apparent meaningful correlation. Of the 14 teams that had positive residue last year, only 8 have it now. Of the 16 negatives, only 9 still have it. Better than chance--but only a teeny tiny toony bit better.
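To put a rough number on "a teeny tiny toony bit better," the sketch below tallies the counts quoted above and asks how often a fair coin would do at least as well. Treating each team as an independent 50/50 flip is a simplifying assumption made here for illustration, not something the original analysis claims, and `tail_probability` is a hypothetical helper name.

```python
from math import comb

# Counts quoted in the text: teams whose residue kept its sign across seasons
positive_last_year, still_positive = 14, 8
negative_last_year, still_negative = 16, 9

teams = positive_last_year + negative_last_year
same_sign = still_positive + still_negative


def tail_probability(n, k):
    """P(at least k heads in n fair coin flips), computed exactly."""
    return sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n


print(f"{same_sign} of {teams} teams kept their sign ({same_sign / teams:.0%})")
print(f"coin-flip chance of at least that many: {tail_probability(teams, same_sign):.2f}")
```

If the coin-flip probability comes out large, the observed persistence is not much evidence of a real, repeatable quality.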

The residue didn't seem to have any predictive quality in the playoffs, either, though you might expect good performance in close games to be very important to playoff success. The four semifinalists--New Jersey, Anaheim, Minnesota, and Ottawa--were ranked 9th, 5th, 13th, and 21st in the league in residue, respectively. When you consider that low-residue teams were less likely to reach the playoffs in the first place (i.e., they did worse in the standings than they ought to have), this is less than impressive.

My guess--and given that I'm an amateur statistician, it can be only a guess--is that the residue really is a product of luck. If so, outliers like Nashville and Ottawa should be far more likely to rebound to their "true" level of quality than to persist in very good or bad luck. Nashville is seven points above .500 despite being outscored 128-137 on the year; is this truly tenable? My very strong suspicion is that it isn't, and that they will collapse somewhat. It's certainly apparent that Ottawa is a great team playing in some kind of black cloud; I believe they are an extremely strong bet to overtake Toronto and Boston, and take the place every reasonable fan assigned them at the start of the season on grounds of talent. (Plus, this: no team at all had a negative residue of -11 for the whole season last year. League worst was San Jose's -5.2.) Show of hands: who thinks the Leafs are really, really better than the Sens? Congratulations, you have just self-diagnosed Down's Syndrome.

But maybe it's not entirely luck. If you can think of something real and specifiable that the positive-residue teams have in common, something which the negative-residue teams lack completely (or vice versa), you have the key to an especially important factor in high-leverage hockey situations. Hide it well, and get thee to Vegas! I'd be willing to entertain the suggestion that the Senators are coached by a jackass, after last year's playoff near-fiasco with Spezza, but in '02-'03 the jackass had the team playing almost exactly at its level of goal differential (-1.7).

What we have here is a potential framework for the equivalent of what is called "Pythagorean standings" in baseball. At the end of a season, you could identify the teams that had especially bad luck and pick them to improve. With what exact confidence could you do this?--I don't know.

Looking at last year, the "unluckiest" five teams were San Jose, Nashville, Buffalo, Dallas, and Vancouver. Not all bad teams, certainly!--just the unluckiest by this measure. How many have better records so far this year? Four (Vancouver's is only slightly better). Dallas is the exception, and its decline is no surprise on other grounds (and is more profound than the standings show, if you believe all this tommyrot).

Last year, the luckiest five teams were Atlanta, Florida, Tampa Bay, Edmonton, and Anaheim. How many stand worse now? Only three: Florida and Tampa have improved. Atlanta's only slightly worse, despite being a very special case. So maybe the predictive value of this "luck" isn't so great on its own. Bill James used the baseball equivalent as one weighting factor among a whole set of pre-season indicators, and team age could certainly be another factor you could use. Then again, maybe age enters into what we're calling "luck" here. Sadly, hockey statistics don't exist in the readily-manipulable forms in which baseball's stats can be found, so checking such things as these would be a full-time job, and not one, really, for the likes of me.

- February 4, 2004

The G Spot

I guess the best way to introduce this unnerving discovery is to retrace the steps that led to it. It starts with me trying to decrypt the meaning of last year's NHL playoffs--but before we can start to discuss it, before you can share the same weird concepts I think with, I have to show you how I treat hockey statistics.

The secret to winning hockey games is no secret at all: you have to score more goals than the other guy. Over a large number of games, the teams that perform better in the standings outscore their opponents by more. There's a strong, simple linear relationship, which I've discussed before: at the current offensive levels, every 2.8 goals you add gain you about a point in the standings over time. So goal differential--your goals scored, less your goals allowed--is an important stat. On its own (and to put it roughly), goal differential determines about 90% of your place in the standings.

Gradually I've come to think of teams in terms of their goal differentials almost as much as I think of their actual standings points. Detroit right now is a +61 team. My Edmonton Oilers are +5, and not surprisingly they're around .500. Calgary is a little better (+9) both in the standings and in goal differential. Pittsburgh is an abyssal -120; they're actually much worse than the standings show them to be.

If you understand that, you shouldn't have any trouble with the next step, which is realizing that you can distribute team performance among various aspects of the game. Just for starters, you can break it down between offence (meaning goal-scoring) and defence (meaning goal-prevention). Right now, this season, the average NHL team has both scored and given up 174 goals. Some of the good teams have below-average offences but great defences, like--and this will not surprise a hockey fan--New Jersey (scored 169, given up 137). Conceptually you can break New Jersey's +32 goal differential into a -5 for offence and a +37 for defence. Phrased in English, the Devil offence is just average but the defence is outstanding. You could say the same of San Jose (+2, +26). Similarly you can point to a team like Toronto (+22, +2) as being outstanding up front but weak at preventing goals.
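The offence/defence split is simple enough to write down as a short Python sketch. The league average of 174 and New Jersey's totals come from the paragraph above; `split_differential` is a name made up for this illustration.

```python
LEAGUE_AVG_GOALS = 174  # average goals scored (and allowed) per team, per the text


def split_differential(goals_for, goals_against, league_avg=LEAGUE_AVG_GOALS):
    """Split goal differential into (offence, defence) relative to league average.

    Positive offence means more goals scored than average;
    positive defence means fewer goals allowed than average.
    """
    offence = goals_for - league_avg
    defence = league_avg - goals_against
    return offence, defence


offence, defence = split_differential(169, 137)  # New Jersey's totals from the text
print(offence, defence, offence + defence)  # -5 37 32
```

The two components always add back up to the overall goal differential, which is what makes the breakdown an accounting rather than a model.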

But you can break offence and defence down further. Let's concentrate on the latter for a moment. We know how many shots on goal each team has given up in the season, and we know how many goals its goaltenders have surrendered on those shots--as a fraction of shots faced, that figure is just one minus the save percentage. So, if you believe that save percentage is an accurate indicator of a goaltender's performance--most sportswriters, commentators, and goaltenders now seem to think so, and so do I--then you can allocate defensive goals prevented between the team's defence (including backcheckers) and its goalies. A sample calculation, taking Dallas for our example:

How many goals did Dallas's defence prevent, compared to the league average? +26

What's the average save percentage of the league? .909

What's the save percentage of Dallas's goalies? .906

So Dallas's goalies have given up 3 extra goals for every thousand shots... how many shots have they faced this season? About 1,540 (the SOGA numbers I pull off ESPN only go to three significant digits, but that's fine)

In other words, the Dallas goalies have given up how many more goals than league-average goaltenders would? About five. They're a -5 on their side of the goal-prevention accounting.

And since Dallas overall is +26 at goal prevention, their defencemen must be? +31. Which is just about what you get if you do the calculation the other way, assigning the defencemen credit for shots on goal prevented above league average (about 330) and counting that as being worth the same number of goals a league-average goalie would let in (league-average save percentage is .909, so the average goaltender lets in .091 goals for every shot--on 330 shots, ta daaa! That's thirty goals.)
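The Dallas walkthrough above can be sketched in a few lines of Python. All the inputs (.909, .906, roughly 1,540 shots, +26 goals prevented, about 330 shots suppressed) are the rounded figures from the text; the function names are hypothetical labels chosen for this example.

```python
LEAGUE_SAVE_PCT = 0.909  # league-average save percentage quoted in the text


def goalie_goals_saved(team_save_pct, shots_against, league_save_pct=LEAGUE_SAVE_PCT):
    """Goals saved by the goalies relative to league-average goaltending."""
    return (team_save_pct - league_save_pct) * shots_against


def skater_goals_saved(total_goals_prevented, goalie_share):
    """Whatever goal prevention is not credited to the goalies goes to the skaters."""
    return total_goals_prevented - goalie_share


dallas_goalies = goalie_goals_saved(0.906, 1540)
dallas_skaters = skater_goals_saved(26, dallas_goalies)

# Cross-check from the other direction: shots suppressed times the
# league-average goals-per-shot rate (one minus save percentage).
cross_check = 330 * (1 - LEAGUE_SAVE_PCT)

print(f"goalies {dallas_goalies:+.1f}, skaters {dallas_skaters:+.1f}, "
      f"cross-check {cross_check:+.1f}")
```

The two routes to the skaters' figure use different inputs (save percentage in one case, shot totals in the other), so their rough agreement is a useful sanity check on the bookkeeping.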

This is an elaborate but, I think, undeniably effective way of distributing credit for goal-prevention between the defence and the goaltenders. Every hockey writer on the continent will tell you that Dallas has a marvelous defensive scheme but that the team's been let down slightly this year by Turco. If you recall that 2.8 goals translate to about one point in the standings, you can say the same thing in quantified form: the Dallas defencemen dragged the team about 11 extra standings points past .500, but Turco and his backups gave a couple away (and, in fact, it's mostly the backups--Tugnutt's been kinda lousy in relief).

And you can do the same thing for the offence, though the meaning of it would be less clear... maybe. A team like Anaheim generates a large number of shots (29.7 per game, about two above average) but can't put the puck in the net (shot percentage of .076, markedly less than the .091 figure I cited a couple paragraphs back). There are teams like Atlanta that don't create many shots (25.9/g) but convert on a huge fraction of them (.104). What does it mean? In Atlanta's case, I'm inclined to attribute it to Ilya Kovalchuk and some other wingers having great years as snipers. And I notice that over in Anaheim, Petr Sykora is throwing a ton of shots at the net to no great effect. As far as a general interpretation of the figures for "accuracy" and "shot creation" goes, I'm at a loss, but that's all right--we're going back to goaltending.

Goaltending is important in the playoffs. Yeah, yeah... we know that. But have you considered just how important it might be?

We've found a way to make a concise statement about the number of goals saved above average (or below average) by a team's goaltenders over the course of a season, or any convenient length of time. Now I'm going to show you a version of the chart that got me thinking. It's the teams that made the playoffs last year, sorted according to their "extra goals prevented by goaltenders" figure for the preceding 2002-03 season as a whole. Nothing else.

Minnesota  +42
Anaheim    +30
Philly     +28
Dallas     +28
Colorado   +23
New Jersey +18
Detroit    +18
Toronto    +18
Ottawa      +9
Washington  +9
Tampa Bay   +7
Vancouver   -2
NY Isles   -14
Edmonton   -19
Boston     -19
St. Louis  -33

Unless I'm completely nuts, you probably noticed, like I did, that the two completely disregarded teams which went on legendary, eye-popping playoff tears were the ones that had the best goaltending statistics in the regular season. And notice that the eventual champion, New Jersey, had the second-best numbers in its conference; through good luck, it didn't have to face a better goaltending team until the league final.

In fact, if you go back and check, you'll see that the team with the "better" goaltending by this measure won 10 of the 15 postseason series. But the truth is more remarkable than that: the five "upsets" involved teams that were behind their opponent in this category by 2, 10, 9, 12, and 12 goals. Nobody in the whole postseason was able to overcome a margin greater than 12; there were six such series (every first-round series except Det/Ana, TB/Was, and Phi/Tor, plus Van/Min in round two) and they were all "decided" by "goaltending" in this sense.

As the difference by this measure gets stronger, the effect gets more reliable: with any advantage in goals saved, teams won 10 of 15 series (67%), but with a five-goal-or-greater advantage, they won 9 of 13 (69%); with 10 or more they won 8/11 (73%), and so forth.

So I checked the previous year's playoffs, and it had happened again, though the change in the curve was less dramatic. Keep in mind this is a small sample, and it would be a lot of work to make it bigger. For the two years combined, teams with any advantage won 19/29 (66%); with a five-goal advantage it was 17/26 (65%), a slight dip, and at 10 it was 14/21 (67%)--but at 15 it was 12/15 (80%), at 20 it was 10/12 (83%), and at 25 it was 9/10 (90%). The data seem to be pointing to some sort of sigmoidal (S-shaped) relationship: any advantage is important, and your chance of winning approaches 100% asymptotically as the size of the advantage gets greater.

Is this happening because better teams normally have "better goaltending" by this measure? There's no correlation between "goals saved by goaltenders" and goals saved and scored by everyone else: I checked. Actually, it's a modest negative correlation, probably because teams which give up more shots give their goalies a better chance to do well, aggregately, in this stat.

Moreover, you don't see this "sigmoidal" (S-shaped) curve when you compare disparities in "non-goaltending goal differential" to the chance of winning a playoff series. The effect of superiority in respects other than goaltending, amazingly, seems to get smaller as the advantage gets greater. This is the mindblowing part of this whole turgid eructation, here. Teams with a non-goaltending advantage of zero goals or greater won 19 of 29 series over the two years (66%). With ten extra goals, they won 15 of 24 (63%). At 20, it was 11/19 (58%); at 30, 8/15 (53%), at 40, 5/10 (50%), or, in other words, a crapshoot.
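To see the two trends side by side, the sketch below tabulates the two-year series counts quoted in the text. Only the won/played totals are from the source; the layout and the `win_rates` helper are illustrative, and the threshold of 0 stands in for "any advantage."

```python
# (minimum advantage in goals, series won, series played), two seasons combined
goaltending = [(0, 19, 29), (5, 17, 26), (10, 14, 21),
               (15, 12, 15), (20, 10, 12), (25, 9, 10)]
non_goaltending = [(0, 19, 29), (10, 15, 24), (20, 11, 19),
                   (30, 8, 15), (40, 5, 10)]


def win_rates(rows):
    """Convert (threshold, won, played) rows into (threshold, win fraction)."""
    return [(threshold, won / played) for threshold, won, played in rows]


for label, rows in (("goaltending", goaltending),
                    ("non-goaltending", non_goaltending)):
    print(label)
    for threshold, rate in win_rates(rows):
        print(f"  advantage >= {threshold:2d} goals: {rate:.1%}")
```

Note that each row is cumulative (every series at 25+ is also counted at 20+), so the rows are not independent samples, and with ten or so series in the top buckets the percentages can swing a lot on a single result.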

Teams that were sixty goals better than their opponents in categories other than goaltending won only two of six series over the two years. Just to clarify matters, look at those actual series--the four that went the other way were huge upsets. One was Anaheim's shock victory over Detroit in '03, and one was its second-round defeat of Dallas. One was Minnesota's win over Vancouver in the conference semifinal last year. And one was Montreal's surprise defeat of Boston in the '02 playoffs. All of them teams riding a hot goaltender to victory.

[UPDATE, 10:10 pm: This paragraph is slightly newer than the rest and has replaced some mystified head-scratching.] So why would teams be more likely to lose playoff series as their edge in non-goaltending categories gets greater? Are the data telling us that being a better team in non-goaltending respects is an active disadvantage? No--you have to remember (as I failed to at first) that the playoff-qualifying teams are ones who came up to a certain base standard of quality in the first place. Their overall goal differential basically has to be greater than zero, because about half the teams make the playoffs. So teams with a huge non-goaltending advantage are more likely, because of this selection bias, to encounter a team with much better goaltending. But the data do seem to be telling us that goaltending (or whatever the "goaltending" metric measures here) becomes vastly more important in the playoffs--because teams with great goaltending should be more likely to meet teams with a huge advantage in other areas. And they win most of the time anyway.

So, if you've absorbed all that, and it's my fault if you haven't, you're probably wondering how the 2003-04 teams shape up in the apparently-insanely-important "extra goals prevented by goaltenders" category. This is the list.

San Jose   +33
Florida    +32
Minnesota  +25
Montreal   +20
Boston     +20
Colorado   +18
New Jersey +18
Anaheim    +11
Vancouver  +10
Detroit     +6
Calgary     +6
Philly      +5
Ottawa      +1
Columbus    +1
Tampa Bay   -1
Nashville   -5
Dallas      -5
Islanders   -7
Carolina    -7
Washington  -7
Toronto     -9
Buffalo     -9
St. Louis   -9
LA Kings   -10
Edmonton   -10
Chicago    -16
Phoenix    -16
NY Rangers -20
Atlanta    -22
Pittsburgh -49

What conclusions do I draw from this data about the upcoming playoffs, assuming the numbers hold pretty steady to the end of the season? I would suggest to you (but don't take this to the bank yet) that:

San Jose is practically bulletproof within the Western Conference against anyone but Minnesota;

Montreal and Boston are likely to impress, and one of them should reach the conference final;

Good teams likely to leave fans very disappointed include Toronto, Tampa, Ottawa, and Philly.

And a sad postscript: with the changes in the rules planned for next year, a concomitant shattering of statistical norms is likely, and so this research is likely to be of little use beyond June, if it's useful at all.

- 7:50 pm, March 7
