February 10 2015 01:00PM
Some of the gents over at Canucks Army have been doing an excellent job investigating trends in drafting and success. Their work includes articles on how size affects draft success for forwards (by Money Puck) and for defensemen (by Josh Weissbock).
I've gotten my hands on some data, so I thought I'd throw my own analysis onto the pile.
Let's take a glance at what some numbers say.
Josh Weissbock and Money Puck kindly donated the same sample as used in the forwards article. The data set was every CHL forward to play since the 1987-88 season.
First, I looked at age, height, and CHL points per game as predictors of both NHL games played and NHL points per game, using simple multivariable linear regression models.
Both predictions turned out similar, although the correlation between predicted and observed values was slightly higher for NHL points per game.
The statistics can be seen here for the slightly higher R-squared NHL PPG prediction model:
There is a lot here, but we will keep things simple. All three independent variables (height, age, and CHL points per game) play a statistically significant role in the model. Using a simple multivariable linear regression of nothing more than height, age, and CHL points per game, we can explain about 23 percent of the variation in these players' NHL points per game.
That's pretty good given how little information those three independent variables provide relative to what is reasonably possible. Variables like shooting-percentage-regressed scoring, team quality adjustments, advanced statistics, draft combine data, qualitative information, and other items could easily improve the model.
Like Money Puck's and Josh Weissbock's articles, we found height to be a statistically significant variable in predicting future NHL success.
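For readers who want to see the mechanics, here is a minimal sketch of a multivariable linear regression and its R-squared, written in plain Python. It is not the stats program used for this article, and the data is synthetic: the sample size, the variable ranges, and the "true" coefficients are all made up purely for illustration, so the numbers it prints say nothing about real players.

```python
import random

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_ols(rows, y):
    """Ordinary least squares with an intercept; returns [intercept, b1, b2, ...]."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

def r_squared(rows, y, coefs):
    """Share of the variance in y explained by the fitted model."""
    pred = [coefs[0] + sum(c * v for c, v in zip(coefs[1:], r)) for r in rows]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Synthetic stand-in data (NOT the Canucks Army sample).
random.seed(1)
rows, y = [], []
for _ in range(500):
    height = random.gauss(72, 2.5)    # inches (assumed unit)
    age = random.uniform(16, 20)      # years
    chl_ppg = random.uniform(0, 2)    # CHL points per game
    rows.append((height, age, chl_ppg))
    # Made-up relationship plus noise, for illustration only.
    y.append(0.01 * (height - 72) - 0.03 * (age - 18)
             + 0.25 * chl_ppg + random.gauss(0, 0.15))

coefs = fit_ols(rows, y)
r2 = r_squared(rows, y, coefs)
print("intercept, height, age, chl_ppg:", [round(c, 3) for c in coefs])
print("R-squared:", round(r2, 3))
```

The R-squared printed at the end is the same kind of figure as the 23 percent quoted above: the share of variation in the outcome that the three predictors account for.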
I was curious about one other thing though, so I took another step.
I restricted the sample by removing every player who never played in the NHL. Players who appeared in even a single NHL regular-season game were still included in the sample.
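In code, that restriction amounts to a one-line filter. The sketch below assumes a hypothetical `Player` record; the field names and the three example players are invented for illustration and are not taken from the actual data set.

```python
from dataclasses import dataclass

@dataclass
class Player:
    # Hypothetical record layout; the real data set's columns may differ.
    name: str
    height_in: float
    age: float
    chl_ppg: float
    nhl_gp: int       # NHL games played
    nhl_points: int

def nhl_subsample(players, min_games=1):
    """Keep only players with at least `min_games` NHL games played."""
    return [p for p in players if p.nhl_gp >= min_games]

def nhl_ppg(p):
    """NHL points per game; only defined for players with at least one game."""
    return p.nhl_points / p.nhl_gp

# Invented example players.
players = [
    Player("Player A", 70, 18.2, 1.10, 0, 0),      # never played in the NHL: dropped
    Player("Player B", 74, 18.9, 0.85, 1, 0),      # a single NHL game: kept
    Player("Player C", 69, 17.8, 1.45, 212, 131),  # established NHL player: kept
]

sample = nhl_subsample(players)
for p in sample:
    print(p.name, round(nhl_ppg(p), 3))
```

Filtering before computing points per game also avoids dividing by zero for players with no NHL games.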
Here again are the results for the prediction model for NHL PPG:
At a cursory glance, not much has changed. The model does, however, explain about 3 percentage points more of the variance in NHL points per game.
There is one significant change though: the p-value for the independent variable height changes from 0.000 to 0.6204. Non-stat geeks may be asking what a p-value is and why that change matters. Instead of answering myself, here is what my stats program's auto-analysis says:
I took the program's advice and removed height as an independent variable for predicting NHL PPG among all CHL players who played one or more NHL games. I ended up with an R-squared that differed only in the hundredths of a percent.
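The same kind of check can be sketched in plain Python: fit the model with and without height and compare the two R-squared values. Everything here is an assumption for illustration. The data is synthetic and is deliberately generated so that height has no true effect on the outcome, which is a stand-in for the situation described above and not a reproduction of the real results.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x

def ols_r_squared(rows, y):
    """Fit least squares with an intercept and return the model's R-squared."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    pred = [sum(b * v for b, v in zip(beta, x)) for x in X]
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - p) ** 2 for yi, p in zip(y, pred))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Synthetic data in which height plays NO role in the outcome (by construction).
random.seed(7)
full_rows, y = [], []
for _ in range(1000):
    height = random.gauss(72, 2.5)
    age = random.uniform(16, 20)
    chl_ppg = random.uniform(0, 2)
    full_rows.append((height, age, chl_ppg))
    y.append(-0.03 * (age - 18) + 0.25 * chl_ppg + random.gauss(0, 0.15))

# Reduced model: drop the first column (height), keep age and CHL PPG.
reduced_rows = [r[1:] for r in full_rows]

r2_full = ols_r_squared(full_rows, y)
r2_reduced = ols_r_squared(reduced_rows, y)
print("R-squared with height:   ", round(r2_full, 4))
print("R-squared without height:", round(r2_reduced, 4))
print("difference:              ", round(r2_full - r2_reduced, 4))
```

Adding a variable can never lower R-squared on the same data, so the useful question is how much it raises it. When a variable carries no real information, the gain is tiny, which is the pattern reported for height in the NHL-only sample.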
What does all this mean?
It means that a player's height in the CHL significantly matters in terms of predicting whether a player makes the NHL, but doesn't seem to matter when you look only at players with one or more NHL games played.
The above results could arise for other reasons, but they do suggest that there may be some unfair height bias in decision making.
This brings up an important issue when building predictive models: the success or failure of the individuals one is trying to predict may be unwarranted.
With draft theory, any biases that may cause the draft to be inefficient likely also exist in the professional level of evaluation that impacts whether a player makes it. If amateur scouts worry about size too much, then coaches, GMs, and pro-level scouts likely do too. If amateur scouts over- or under-emphasize a certain skill or play style, then the same goes for the upper levels.
Even draft position may skew results in an inefficient manner. Higher-drafted players may receive superior attention or be given additional chances to become a bona fide NHL player.
The persistence of possible bias increases the difficulty of actually proving such bias exists in the first place.
We know that historically teams are rational, success-seeking entities and have generally been good at detecting and drafting talent. However, there are a lot of indications that some values may need adjusting. While size is an attribute that helps a player perform, a small player who outperforms a larger player is still outperforming that larger player.