Some of the gents over at Canucks Army have been doing an excellent job investigating trends in drafting and success. Their work includes excellent articles on how size affects draft success, one looking at forwards (by Money Puck) and one at defensemen (by Josh Weissbock).
I’ve gotten my hands on some data, so I thought I’d throw my own analysis into the pile.
Let’s take a glance at what some numbers say.
Josh Weissbock and Money Puck kindly donated the same sample used in the forwards article. The data set covers every CHL forward to play since the 1987-88 season.
First, I looked at age, height, and CHL points per game as predictors of both NHL games played and NHL points per game, using simple multivariable linear regression models.
Both models turned out similar, although the correlation between predicted and observed values was slightly higher for NHL points per game.
Here are the statistics for the NHL PPG prediction model, the one with the slightly higher R-squared:
There is a lot here, but we will keep things simple. All three independent variables (height, age, and CHL PPG) play a statistically significant role in the model. Using a simple multivariable linear regression of nothing more than height, age, and CHL points per game, we can explain about 23 percent of the variation in these players’ NHL points per game.
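For anyone who wants to see the mechanics, here is a minimal Python sketch of a multivariable linear regression and its R-squared. The numbers are made-up toy values, not the Canucks Army sample, and the function names are purely illustrative; this is not the stats program behind the results in this post.

```python
# Illustrative sketch only: toy data and hypothetical helper names.

def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so row operations update both sides together.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ols(rows, y):
    """Least-squares fit of y = b0 + b1*x1 + ... via the normal equations."""
    X = [[1.0] + list(r) for r in rows]  # leading 1.0 is the intercept term
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, row):
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))


def r_squared(coefs, rows, y):
    """Share of the variation in y that the fitted model explains."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - predict(coefs, r)) ** 2 for r, yi in zip(rows, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Toy rows: (height in inches, age, CHL points per game). Invented values.
rows = [
    (70, 17.5, 1.40), (72, 18.0, 0.90), (74, 18.5, 1.20), (71, 19.0, 0.70),
    (73, 17.8, 0.80), (75, 18.2, 0.50), (69, 18.6, 1.60), (72, 17.6, 1.00),
]
# Toy NHL points per game for the same eight invented players.
nhl_ppg = [0.55, 0.30, 0.40, 0.15, 0.25, 0.05, 0.60, 0.35]

coefs = fit_ols(rows, nhl_ppg)   # [intercept, height, age, CHL PPG]
r2 = r_squared(coefs, rows, nhl_ppg)
print("coefficients:", coefs)
print("R-squared:", r2)
```

An R-squared of 0.23 from a fit like this would mean the three inputs account for about 23 percent of the spread in NHL points per game, with the rest left unexplained by the model.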
That’s pretty good given how little information those three independent variables provide relative to what is reasonably possible. Variables like shooting-percentage-regressed scoring, team quality adjustments, advanced statistics, draft combine data, qualitative information, and other items could easily improve the model.
As in Money Puck’s and Josh Weissbock’s articles, we found height to be a statistically significant variable in predicting future NHL success.
I was curious about one other thing though, so I took another step.
I trimmed the sample to remove every player who did not play in the NHL. This meant that even players who appeared in only one NHL regular season game were still included in the sample.
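In code terms, that trimming step is just a filter on NHL games played. Here is a small Python sketch using hypothetical records; the field names and values are invented for illustration and are not taken from the actual data set.

```python
# Hypothetical player records; names, fields, and numbers are made up.
players = [
    {"name": "Player A", "nhl_gp": 0, "nhl_points": 0},
    {"name": "Player B", "nhl_gp": 1, "nhl_points": 0},
    {"name": "Player C", "nhl_gp": 410, "nhl_points": 205},
    {"name": "Player D", "nhl_gp": 0, "nhl_points": 0},
]

# Keep anyone with at least one NHL game, even a single appearance.
nhl_only = [p for p in players if p["nhl_gp"] >= 1]


def nhl_ppg(player):
    """NHL points per game; safe here because nhl_gp >= 1 after filtering."""
    return player["nhl_points"] / player["nhl_gp"]


for p in nhl_only:
    print(p["name"], nhl_ppg(p))
```

Note that a one-game player stays in the trimmed sample, so the filter only drops players with zero NHL games.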
Here again are the results for the prediction model for NHL PPG:
At a cursory glance, not much has changed. The model does, however, explain about 3 percentage points more of the variance in NHL points per game.
There is one significant change though: the p-value for the independent variable height goes from 0.000 to 0.6204. Non-stat geeks may be asking what a p-value is and why that change matters… Instead of answering myself, here is what my stats program’s auto-analysis says: