Mathematical Analysis of Hockey Data and Statistics

Czech Your Math

I am lizard king
Jan 25, 2006
5,169
303
bohemia
This is meant to be a resource where posters may:

- post links to and the results of studies which utilize the mathematical analysis of hockey data and statistics

- access the variety of studies which may be presented here

If you wish to give substantial feedback on a particular study, please do so in the thread for that study.

If you do not value these studies in general and/or any specific study, then this is not the place to show your lack of appreciation. Non-constructive criticism may be deleted by the moderators.

It might be best for the authors of studies to use their first (or one of their first) posts to link to some or all of the studies which they wish to share with others. Then each author can update that post upon completion of any further studies. This would allow readers to access the greatest number of studies in the shortest time and with the least effort.
 

Czech Your Math

I am lizard king
Jan 25, 2006
5,169
303
bohemia
This is a study of a fixed group of higher scoring players from 1946 to ~2007, and how they performed from season to season. It needs some improving, including more complete data, especially since the lockout, but overall I believe the methodology and results hold a lot of promise:

Improving Adjusted Scoring and Comparing Scoring of Top Tier Players Across Eras

The results may be used to A) assist in comparing offensive production across different seasons and eras, and B) see how "simple" adjusted goal/point data tends to help/hurt various seasons and eras.

This is a follow-up companion study with a completely separate methodology. It uses linear regression to study the factors which seem to most affect how the scoring of the very top group of players fluctuates:

Using Regression to Adjust "Adjusted Points" for Top Tier Players '68-'12

Other studies:

Adjusted Playoff Scoring

Best "Half Seasons" Since 1994

An Estimate of How the Available Hockey Population Pool Has Changed Over Time (Focuses on Goalies and Top Line Scorers)
 

tarheelhockey

Offside Review Specialist
Feb 12, 2010
85,438
139,473
Bojangles Parking Lot
Awesome idea for a thread. I've thought for a long time that we could use a subforum devoted to deep analysis, but a sticky thread is a good start.

One suggestion: find a way to promote this thread in other parts of HF. There are a lot of math-y forumers who don't visit the History section.
 

Hockey Outsider

Registered User
Jan 16, 2005
9,197
14,635
Comments/disclaimers

  1. A lot of these studies are several years old. I'm posting some of them for reference purposes. As of now (July 2012), I don't necessarily agree with the methodology or results in some/many of these posts.
  2. The data used in many studies are now at least a few years out of date. I'm not sure when (if ever) I'll get around to updating them.
  3. I shudder to think about how many hours these all took... at least it's been spread over 7.5 years so it's not too bad on a yearly basis...

Analysis of Award Voting


Offensive statistics


Goalie statistics


"Applied" case studies

  • Gretzky vs Lemieux – why it helps to retire young
  • 1960s Chicago Blackhawks -- a team that struggled due to a weak supporting cast while Hull, Pilote, Hall and Mikita took too much blame
  • Marcel Dionne – detailed analysis -- a few bright spots, but generally as bad as you would expect
  • Tony Esposito – it’s not his fault his teammates couldn’t score
  • Bill Durnan -- why does he have such a poor reputation in the playoffs?
  • Henri Richard – it looks like Harvey & the first-line scorers, rather than Richard, improved the most in the playoffs
  • Joe Sakic – a look at his impact on Colorado's record

Miscellaneous

 

Czech Your Math

I am lizard king
Jan 25, 2006
5,169
303
bohemia
I'm just happy we have this thread.

Do you have any specific ideas as to how to promote this thread in other forums on HFB? I assume you mean start a thread in another forum to alert readers? If so, anyone can do that, but we might want to wait a couple/few weeks until more authors have had a chance to post links to their studies.
 

tarheelhockey

Offside Review Specialist
Feb 12, 2010
85,438
139,473
Bojangles Parking Lot
^ that's the only thing I had in mind, though admittedly it seems kind of clumsy. Maybe we can just keep an eye out for good material and suggest that it be posted here.

I agree about building up material. I have a project in progress on the Canes board that I still need to finish, and will post it here when it's ready.
 

Canadiens1958

Registered User
Nov 30, 2007
20,020
2,781
Lake Memphremagog, QC.
Geocities Links

Sadly the Geocities links do not work generating a response similar to the following:

http://www.geocities.com/thehockeyoutsider/Hart_shares.pdf
 

Canadiens1958

Registered User
Nov 30, 2007
20,020
2,781
Lake Memphremagog, QC.
Counterpoint to Henri Richard Case Study

Counterpoint to the Henri Richard case study with link:

http://hfboards.mandatory.com/showthread.php?t=514771&page=10

see post #241
 

plusandminus

Registered User
Mar 7, 2011
1,404
268

Good to have a thread like this. (I have suggested a couple of times that I'd like to see a section dedicated, but without any feedback.)

I'm still surprised by the apparent lack of interest in some of the studies I did (this has been commented on before, and it got a little better toward the end). As I've written before, I spent most of my free time over the last 16 or so months assembling data and doing (often advanced) studies. Finally, about a month ago, I got tired and my work situation also became more demanding (and now this thread arrives). Maybe I'll return later.

For example, I studied the "with or without" effect (team results when a player participated in a game vs. when he didn't) for all players over roughly the last 25 seasons, lately even the goalies, also comparing each game to an "expected" game outcome. I also studied how some players' point production was affected when certain other players weren't around. These are, as I see it, things that would be of great interest, considering all the debates about how much different players were helped by each other, etc.
I've studied the strength of different year groups too. And lots on adjusted scoring, scoring distributions, the effect of faceoffs, situational adjusted goalie stats, penalty-killing stats that attempt to remove the goalie effect, different kinds of adjusted +/-, a combined overall player stat for ES+PP+SH, "how easy it was to produce points during a certain season", schedule-adjusted standings, individual winning %, etc.
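
A minimal Python sketch of the "with or without" comparison described above. The `Game` record, its field names, and the sample game log are hypothetical and are not taken from the actual studies; the comparison against an "expected" game outcome is left out.

```python
from dataclasses import dataclass


@dataclass
class Game:
    player_dressed: bool  # hypothetical flag: did the player appear in this game?
    team_points: int      # standings points earned (2 = win, 1 = OT loss, 0 = loss)


def points_pct(games):
    """Share of available standings points earned over a list of games."""
    if not games:
        return None  # no sample, so no estimate
    return sum(g.team_points for g in games) / (2 * len(games))


def with_without(games):
    """Return (points % with player, points % without player, games without)."""
    with_player = [g for g in games if g.player_dressed]
    without_player = [g for g in games if not g.player_dressed]
    return points_pct(with_player), points_pct(without_player), len(without_player)


# Made-up game log: 6 games with the player, 4 without.
log = [
    Game(True, 2), Game(True, 2), Game(True, 0),
    Game(True, 1), Game(True, 2), Game(True, 2),
    Game(False, 0), Game(False, 2), Game(False, 0), Game(False, 1),
]

pct_with, pct_without, n_without = with_without(log)
print(f"with: {pct_with:.3f}, without: {pct_without:.3f} (n without = {n_without})")
```

Reporting the number of games missed alongside the two percentages matters, since a small "without" sample makes the comparison unreliable.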

I find a problem with basically all studies here (including my own) in that they require a lot of work and time. Many hours of boring research usually have to be done in order to learn about the many factors leading up to the end results. There also always seem to be factors "biasing" things, including (of course) "randomness" or "circumstances".
(To use a common example, people often try to determine who produced more impressively between peak Gretzky and peak Mario. We can adjust based on mathematical methods, ending up with adjusted stats. But then we also want to know how much teammates affected their stats, or playing system. And in the end, we just end up with more or less arbitrary "feelings" of who did best.)
So I think "boring" research needs to be done in order to progress, to lay foundations or references to build upon. Many assumptions are made more or less automatically, and I think those too need to be closely examined.

Edit: Sorry if this thread is meant as mainly a link thread. Anyway, I made a few suggestions for further studies, rather than link to existing ones.
 

Czech Your Math

I am lizard king
Jan 25, 2006
5,169
303
bohemia

Thanks Pnep! I'm surprised you didn't link to your famous HHOF Monitor Point study. You could add that to your first post and update the link, if necessary, whenever you update that study.

I'm still surprised by the apparent lack of interest in some of the studies I did (this has been commented on before, and it got a little better toward the end)

I'm sorry if you feel there is a lack of interest in your studies. However, the main reason for this thread is to allow readers access to a wide variety of studies from a wide variety of authors, in a relatively condensed format. I'm not sure why you are citing the lack of interest, yet not providing links to any/all of your studies. It's possible others have not seen some/all of your studies.

If you provide links to your studies, then others may be able to read them, consider them, and give you valuable feedback (preferably in the thread of the specific study) which may contradict what you perceive as a lack of interest and/or help you improving that study or others you do in the future.

If you don't wish to provide links to any/all of your studies, because you perceive lack of interest, or feel they are incomplete or unworthy in some way, then that is your choice. In any case, I encourage you to provide links to at least some of your studies. If you do so, you may limit it to those you believe will be of more interest to more readers, those best understood by most readers, and/or those of most importance to other (potential?) authors and to advancing the knowledge of hockey using math/stat analysis. In such case, you can always update your post with links to other/further studies.

I find a problem with basically all studies here (including my own) in that they require a lot of work and time. Many hours of boring research usually have to be done in order to learn about the many factors leading up to the end results. There also always seem to be factors "biasing" things, including (of course) "randomness" or "circumstances".

I agree, and that is why I believe that the topic, metrics, methodology, and estimated time involved in such studies should be considered and chosen carefully. It is important that the effect being measured, and the metric used to measure it, are not likely to be overwhelmed by random error and/or factors which can't be removed, quantified, and/or easily assessed.

If an author wishes feedback on a potential or ongoing study, he/she can always start a thread (and even post a link to such in this thread, while requesting feedback in the thread for the study). The thread might contain the specific results to date and could be used to receive feedback about and discuss the various aspects of the study (topic, metrics, methodologies, etc.), especially those factors which the author believes are complicating the study.

(To use a common example, people often try to determine who produced more impressively between peak Gretzky and peak Mario. We can adjust based on mathematical methods, ending up with adjusted stats. But then we also want to know how much teammates affected their stats, or playing system. And in the end, we just end up with more or less arbitrary "feelings" of who did best.)
So I think "boring" research needs to be done in order to progress, to lay foundations or references to build upon. Many assumptions are made more or less automatically, and I think those too need to be closely examined.

Data analysis is obviously limited by the types and quantity of data available. The less data is able to quantify a variable, the less one is able to analyze such a variable. At least it provides a more objective starting point from which non-quantifiable variables can be considered. If the starting point is wrong, the conclusion is much more likely to be wrong.

IMO this thread is best if not used to discuss individual studies (completed, ongoing or potential). However, since you reference a specific, common type of comparison for which math and data analysis are used, then let me briefly continue for illustrative purposes only.

Let's say one is comparing Gretzky and Lemieux with "simple adjusted" data (adjusted for league games, GPG and assist/goal ratio), but believes there are many other factors not being assessed. This is how I might approach such a comparison. First, let me say that even if we stop with "simple adjusted" data for the two, we very likely have a better starting point than if we used raw data. One might be tempted to stop there and use mental estimates for other factors in the interest of saving time/effort. However, the constraint is often more one of time/effort than of limits in the data. Eventually, one reaches the point of diminishing returns, where the time/effort is too great for the info obtained, and/or the info provided by the data is too small relative to the influence of non-quantifiable factors and random error.
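
For readers unfamiliar with "simple adjusted" data, here is a rough Python sketch of one common way such an adjustment can be done: scale raw totals by schedule length, league goals per game, and the league assist/goal ratio. The baseline constants, the function name, and the sample season are assumptions chosen for illustration; they are not the exact formula used in any of the linked studies.

```python
# Assumed baselines to normalize every season to (illustrative values only).
BASELINE_GAMES = 82               # schedule length
BASELINE_GPG = 6.0                # league goals per game, both teams combined
BASELINE_ASSISTS_PER_GOAL = 1.67  # league assists awarded per goal


def adjusted_scoring(goals, assists, season_games, league_gpg, league_assists_per_goal):
    """Return (adjusted goals, adjusted assists, adjusted points)."""
    schedule = BASELINE_GAMES / season_games
    scoring = BASELINE_GPG / league_gpg
    assist_rate = BASELINE_ASSISTS_PER_GOAL / league_assists_per_goal
    adj_goals = goals * schedule * scoring
    # Assists get one extra factor for how freely assists were awarded.
    adj_assists = assists * schedule * scoring * assist_rate
    return adj_goals, adj_assists, adj_goals + adj_assists


# Made-up season: 50 G and 100 A in an 80-game, 8.0 GPG league.
adj_g, adj_a, adj_pts = adjusted_scoring(50, 100, 80, 8.0, 1.67)
print(f"adjusted: {adj_g:.1f} G, {adj_a:.1f} A, {adj_pts:.1f} PTS")
```

An adjustment like this treats every player in a season the same way, which is why it is only a starting point for the factors discussed next.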

In such a comparison, there are often ways one could use other data and/or build upon the studies of others to help filter out other factors and refine the comparison.

- League quality: A study of league quality and/or difficulty could assist in this case (it seems we, and others, have studied such things). Specifically, Lemieux probably played more of his prime years in a league of higher quality (although due to his injuries, the differences aren't as drastic), while if using "simple adjusted" data, Gretzky played more during a time when such data is biased against him to some degree (probably due in large part to factors such as scoring being more balanced between lines).

- Competition: The possibility and likely causes of differing competition should be assessed and taken into account somehow if possible. Specifically, Gretzky's final years and much of (what should have been) Lemieux's late prime were impacted by a large group of forwards from outside Canada, which differs from Gretzky's prime years when such impact was relatively minor. The simplest, yet admittedly imperfect way I have found to look at this impact in isolation is the thought experiment "what if there were no (or player X, the one being studied, was the only) non-Canadian (or non-North American) player(s) in the NHL? How would this have affected player X's rankings in various categories?" Again, the point is to have a much fairer and better starting point, not to be perfect, since the alternative is much less fair and therefore less useful in considering the impact of this other factor. Without looking at the data, Lemieux should be impacted more by his additional competition, but since both players were often leaders in various categories anyway, I'm not sure if there's much/any difference. This may have been different if Lemieux had played more from '98-01.

- Teammates/Linemates: One can look at how each player performed with various linemates and/or teammates, and how he performed without each/all of them. One can also look at how some/all of those players performed without the player being studied. Specifically, I haven't really looked at this effect in depth.

- Team Playing Style: One can look at team performance in such categories as ESGA as a general, imperfect indicator of whether a team was more open or restrictive in playing style. Of course this metric depends on the quality of the defense/goalie, etc., so it's far from perfect, but at least it may give us some important info. Specifically, Gretzky tended to play on much better overall teams than Lemieux, and I think his teams tended to have lower ESGA (but not certain without looking at data).

- Overall Impact: One can look at adjusted plus-minus to see how much better each player's team was with and without him on the ice at even strength. Specifically, Gretzky's adj. +/- is better than Lemieux's, but this becomes complicated by the fact that they had unusual comparisons (Jagr, one of the leaders in this metric, and Messier, who often performed poorly in this metric in large part due to having Gretzky as part of his "off ice"). Also, although of limited value, the win% of the team with and without the player can be examined (if calculated properly). Specifically, Lemieux's teams performed much, much worse without him than with him. This is complicated by the fact that he missed the majority of some seasons (which makes the "with" component much less reliable). Gretzky didn't miss enough games during his prime to have even a decent sample of games from which to assess his overall impact.

- Playoff/International: The importance of this is often overemphasized in proportion to regular season performance. I'm not making a judgement, necessarily, but simply saying that many/most people give it much more importance than its share of games played would suggest. There are some factors that often seem to be mostly or completely neglected when using this metric. First, while most at least know about adjusted numbers, they often cite actual playoff data, which is unadjusted and therefore difficult to compare across different periods. Second, if career numbers are used, the proportion of playoff games played during a player's peak/prime vs. career can vary dramatically. Third, while there are smaller differences in strength of schedule during the season, the differences in playoff opponents are much larger. A player on a very strong team will still generally face teams which are worse than his, but on avg. the playoff opponents will be better than the team's regular season opponents. However, a player on a weaker (in playoff terms) or mediocre (in reg. season terms) team will almost always be facing superior opposition, and so his performance can generally be expected to be significantly worse than that of a player on a significantly better team. For instance, Dionne and Kariya are often criticized for their playoff performances, yet they were likely the underdogs in most cases, so their playoff performance would generally be expected to be worse. They also generally could less afford to rest during the regular season, lest their team miss the playoffs. Specifically, without looking at the data more in-depth at present, but using my own adjusted playoff data, I think Gretzky performed slightly better on a prime/career basis, but given the generally better teams he played for, it's difficult to distinguish between them.

- Trophies & Voting: This is another factor often cited by people when evaluating players. What most don't acknowledge is that it is simply quantifying the opinions of alleged "experts." Just how much importance should be given to the opinions of others, given that their choices are often difficult to explain and their credentials may vary substantially? The source data is simply the opinions of a select group (most often sportswriters). Quantifying this does not change this fact. While some interesting work has been done in this field (such as HockeyOutsider's Hart & Norris shares, which place emphasis on how often a player was at or near the top and how close he came, rather than simply counting trophies), the source data is completely subjective and this needs to be remembered.
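
The thought experiment in the Competition item above (remove the non-Canadian players, keep player X, and see how his ranking changes) can be sketched in Python as follows. The `SeasonLine` record, the country codes, and the five-player scoring table are hypothetical placeholders, not real data.

```python
from dataclasses import dataclass


@dataclass
class SeasonLine:
    name: str
    country: str
    points: int


def rank_of(player, lines):
    """1-based rank by points; tied players share the better rank."""
    target = next(line for line in lines if line.name == player)
    return 1 + sum(1 for line in lines if line.points > target.points)


def rank_without_imports(player, lines, home_country="CAN"):
    """Rank of `player` if only home-country players (plus `player`) were in the league."""
    kept = [line for line in lines if line.country == home_country or line.name == player]
    return rank_of(player, kept)


# Made-up scoring race.
season = [
    SeasonLine("Player A", "CZE", 120),
    SeasonLine("Player B", "CAN", 110),
    SeasonLine("Player C", "FIN", 105),
    SeasonLine("Player D", "CAN", 100),
    SeasonLine("Player E", "RUS", 95),
]

actual = rank_of("Player D", season)
hypothetical = rank_without_imports("Player D", season)
print(f"Player D: actual rank {actual}, rank without imports {hypothetical}")
```

Passing a non-Canadian name to `rank_without_imports` covers the "player X was the only non-Canadian" variant, because the player being studied is always kept in the table.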

In summary, while there are usually limits on the availability of and information provided by various data, we often assume that data cannot be used to evaluate various, seemingly non-quantifiable factors, rather than find a way to properly use what data is available to shed further light on the factor being considered. The goal IMO isn't to create some grand unified theory of hockey, but to attempt to provide a more objective starting point for further subjective discussion. It's for the individual to decide whether the importance of quantifying various categories of performance and/or contextual factors is worth the time and effort involved, but we must be careful in declaring such factors as completely unquantifiable and otherwise resorting to completely subjective means of analyzing performance and providing context.

Edit: Sorry if this thread is meant as mainly a link thread. Anyway, I made a few suggestions for further studies, rather than link to existing ones.

I may have missed your suggestions for further studies. Perhaps you meant that others could expand on your studies. Again, I encourage you to provide links (in a single post) to at least some of what you believe are your most important, most interesting, and most easily understood studies, and/or those which you feel could benefit in some way from some form of feedback (and indicate which studies could use what feedback in your single post). If you don't wish to share links to your studies or wish to wait until a point when you have more time to discuss your studies, I certainly understand. The only study I have posted is probably incomplete, may be of interest to only a small group of people, and is likely completely understood by even fewer. However, I felt it was important enough to share, in the hopes that some others may find it interesting, important, useful, and/or may even improve upon it in the future.
 

Bear of Bad News

Your Third or Fourth Favorite HFBoards Admin
Sep 27, 2005
13,607
27,436
Some of the goalie metrics that I'd published here a few years back (this link is to the 2008-09, and that thread has links back further):

http://hfboards.mandatory.com/showthread.php?t=634696

This will all be on the goaltender site by the end of the summer (it's in the database, now I just need to get the database on the site - more tedious than it may sound).
 

Czech Your Math

I am lizard king
Jan 25, 2006
5,169
303
bohemia
Thanks for posting your study, I found it interesting.
 

Yurog

Registered User
Jan 10, 2012
143
8
Magnitogorsk
I am ready to assist you in mathematical analysis to compare the merits of the players. Currently I'm developing a goalie ranking based on statistics and GM voting. The coefficients in my formula are fitted to the final vote. The resulting trend in the coefficients shows how the priority of each parameter has changed in the NHL. I'm ready to show the rankings since 97/98 in a month.
 

Hockey Outsider

Registered User
Jan 16, 2005
9,197
14,635
C1958: I think Geocities no longer exists. In late August I'll upload all of the PDF files to another site.

I'll take a look at your Henri Richard post then as well. I will say that I have a much more favourable opinion of him now than I did several years ago (when the argument in his favour was essentially "he won a lot of Stanley Cups"). His defensive abilities are much better documented now, and his offense (once his relatively low PP ice time is taken into account) is impressive for the era.

Yurog: correct me if I've misunderstood, but are you trying to predict the results of Vezina voting based on goalie stats? That sounds very interesting; look forward to seeing the results. I once tried to do the same with Norris trophy voting but had no luck (presumably because defense is not really captured by any mainstream statistic, but obviously influences Norris voting).
 

Yurog

Registered User
Jan 10, 2012
143
8
Magnitogorsk
The Solver function in MS Excel helped me. The fit is scaled so that the top result is 100.
For example:


Name Ranking
  1. Lundqvist 100,0000002
  2. Smith 93,20732182
  3. Rinne 93,20732082
  4. Quick 93,20731983
  5. Elliott 93,0132771
  6. Kiprusoff 87,1399342
  7. Lehtonen 83,54675502
  8. Howard 82,95263135
  9. Halak 81,02432634
  10. Fleury 79,64233731
  11. Miller 77,83387228
  12. Thomas 77,53050029
  13. Luongo 75,68305281
  14. Price 75,31630927
  15. Niemi 75,20865933
  16. Bryzgalov 74,54291087
  17. Brodeur 74,1007518
  18. Ward 73,82035032
  19. Hiller 73,05241501
  20. Schneider 70,41955845
  21. Anderson 68,71121458
  22. Varlamov 68,3971819
  23. Pavelec 66,88403608
  24. Hedberg 66,05515337
  25. Theodore 65,6277608
  26. Vokoun 65,44629678
  27. Backstrom 65,36998171
  28. Crawford 63,00375401
  29. Dubnyk 60,92955098
  30. Rask 59,60374558
  31. Nabokov 59,33459414
  32. Giguere 59,04597066
  33. Garon 55,59857191
  34. Biron 54,44583961
  35. Gustavsson 53,18299129
  36. Macdonald 53,07904081
  37. Harding 52,31288714
  38. Khabibullin 52,18349146
  39. Clemmensen 49,11302356
  40. Greiss 48,7956581
  41. Neuvirth 47,86726204
  42. Reimer 47,86516663
  43. Sanford 47,49528753
  44. Budaj 47,13471049
  45. Bernier 46,03547027
  46. Smason 44,88473201
  47. Emery 44,71239916
  48. Enroth 43,43123302
  49. Bobrovsky 42,3137993
  50. Cmason 41,63188183
  51. Bachman 41,11706778
  52. Lindback 40,38075383
  53. Montoya 40,05589419
  54. Labarbera 39,54285527
  55. Roloson 38,9893942
  56. Hackett 36,85578059
  57. Scrivens 35,35920027
  58. Conklin 33,06401829
  59. Johnson 32,62635596
  60. Auld 25,54719155
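
A rough Python sketch of the kind of fit described above: choose coefficients so that a weighted sum of goalie stats matches the vote totals as closely as possible, then rescale so the top goalie gets 100. It uses ordinary least squares with two inputs as a stand-in for Excel's Solver. The objective, the two stat columns, and all numbers are assumptions for illustration; the actual formula and parameters behind the ranking above are not given in this thread.

```python
def fit_two_weights(rows, votes):
    """Least-squares weights (w1, w2) for votes ~ w1*x1 + w2*x2, with no intercept."""
    s11 = sum(x1 * x1 for x1, _ in rows)
    s12 = sum(x1 * x2 for x1, x2 in rows)
    s22 = sum(x2 * x2 for _, x2 in rows)
    sy1 = sum(x1 * y for (x1, _), y in zip(rows, votes))
    sy2 = sum(x2 * y for (_, x2), y in zip(rows, votes))
    det = s11 * s22 - s12 * s12
    if det == 0:
        raise ValueError("stat columns are collinear; weights are not unique")
    # Solve the 2x2 normal equations with Cramer's rule.
    w1 = (sy1 * s22 - s12 * sy2) / det
    w2 = (s11 * sy2 - s12 * sy1) / det
    return w1, w2


def ratings_out_of_100(rows, weights):
    """Score each goalie with the fitted weights, scaled so the best score is 100."""
    w1, w2 = weights
    scores = [w1 * x1 + w2 * x2 for x1, x2 in rows]
    top = max(scores)
    return [100 * s / top for s in scores]


# Made-up data: two stat columns per goalie and the vote points each received.
stats = [(1, 2), (2, 1), (3, 3), (4, 1)]
vote_points = [8, 7, 15, 11]

weights = fit_two_weights(stats, vote_points)
ratings = ratings_out_of_100(stats, weights)
print(weights, [round(r, 2) for r in ratings])
```

With more than two parameters the same idea needs a general linear solver or an optimizer, which is the role Solver plays in the spreadsheet.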
 

Czech Your Math

I am lizard king
Jan 25, 2006
5,169
303
bohemia
Your interest and work are appreciated.

Please start a separate thread for your study whenever you wish to present the results and methods. You can then link to that thread, in this thread.

It seems better to keep this thread for posting links only, so that readers will not have to scroll through endless pages looking for actual studies.

Thank you!
 

Czech Your Math

I am lizard king
Jan 25, 2006
5,169
303
bohemia
If you previously started a thread for a study you did and wish it moved to the new "By the Numbers" forum, please PM me (or one of the other moderators of that forum) to request such. Thank you!
 

tarheelhockey

Offside Review Specialist
Feb 12, 2010
85,438
139,473
Bojangles Parking Lot
I just want to thank whoever was behind the "By The Numbers" forum.

It's easy to see how, between the History and Numbers forums, HFBoards could turn out to be a major voice in the historiography of the sport.
 

Czech Your Math

I am lizard king
Jan 25, 2006
5,169
303
bohemia
I'm guessing it is Taco MacArthur and 70sLord who deserve our thanks. Thanks guys!
 

Bear of Bad News

Your Third or Fourth Favorite HFBoards Admin
Sep 27, 2005
13,607
27,436
I'm just glad that there's now a place for people to potentially collaborate on these sorts of things. :handclap:
 

Hockey History

Registered User
Oct 6, 2012
5
0
hockey-history.com
Mathematical analysis of Sport Data

If you want to go deeper, try this: probability and odds computing. It is really about mathematical analysis (e.g. the spatial Poisson probability distribution).

This will help you understand how betting companies set their odds, and it shows a way to compute the probabilities for each match.
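
As a rough idea of what computing match probabilities with a Poisson distribution can look like, here is a minimal Python sketch. It assumes the simplest version of the idea, where each team's goals follow an independent Poisson distribution with a known scoring rate; the rates below are made up, and the linked material may use a more elaborate model.

```python
from math import exp, factorial


def poisson_pmf(k, rate):
    """Probability of exactly k goals when the expected number of goals is `rate`."""
    return exp(-rate) * rate ** k / factorial(k)


def match_probabilities(home_rate, away_rate, max_goals=20):
    """Return (home win, tie, away win) probabilities for the regulation score."""
    home = tie = away = 0.0
    for h in range(max_goals + 1):
        for a in range(max_goals + 1):
            p = poisson_pmf(h, home_rate) * poisson_pmf(a, away_rate)
            if h > a:
                home += p
            elif h == a:
                tie += p
            else:
                away += p
    return home, tie, away


def fair_decimal_odds(probability):
    """Decimal odds with no bookmaker margin."""
    return 1 / probability


# Made-up scoring rates (expected goals per game).
p_home, p_tie, p_away = match_probabilities(3.1, 2.6)
print(f"home {p_home:.3f}, tie {p_tie:.3f}, away {p_away:.3f}")
print(f"fair home odds: {fair_decimal_odds(p_home):.2f}")
```

Real bookmaker odds are shorter than these fair odds because a margin is built in.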

Good luck! :handclap: :handclap: :handclap:
 
