Some of the gents over at Canucks Army have been doing an excellent job investigating trends in drafting and success. This work includes articles on how size affects draft success, looking at forwards (by Money Puck) and defensemen (by Josh Weissbock).

I’ve gotten my hands on some data, so I thought I’d throw my own analysis into the pile.

Let’s take a glance at what some numbers say.

Josh Weissbock and Money Puck kindly shared the same sample used in the forwards article: every CHL forward to play since the 1987-88 season.

First, I looked at age, height, and CHL points per game as predictors of either NHL games played or NHL points per game, using simple multivariable linear regression models.
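For readers who want to try this at home, here is a minimal sketch of that kind of model, assuming the data sit in NumPy arrays. It is plain ordinary least squares via `numpy.linalg.lstsq`, not necessarily what my stats program does internally, and the function name `fit_nhl_ppg_model` is hypothetical.

```python
import numpy as np

def fit_nhl_ppg_model(age, height, chl_ppg, nhl_ppg):
    """OLS fit of nhl_ppg = b0 + b1*age + b2*height + b3*chl_ppg.

    Returns the coefficients [b0, b1, b2, b3] and the model's R-squared.
    """
    y = np.asarray(nhl_ppg, dtype=float)
    # Design matrix: a column of ones (intercept) plus the three predictors
    X = np.column_stack([np.ones(len(y)), age, height, chl_ppg]).astype(float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coef
    ss_res = residuals @ residuals           # variation the model leaves unexplained
    ss_tot = ((y - y.mean()) ** 2).sum()     # total variation around the mean
    return coef, 1.0 - ss_res / ss_tot
```

Swapping `nhl_ppg` for NHL games played gives the other model; the returned R-squared is the "percent of variation explained" figure discussed below.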

Both predictions turned out similar, although the correlation between predicted and observed values was slightly higher for NHL points per game.
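Comparing the predicted-vs-observed correlation is a fair way to rank the two models because, for an ordinary least-squares fit with an intercept, the square of that correlation equals the model's R-squared. A small sketch of the check, with `predicted_vs_observed_r` as a hypothetical helper name:

```python
import numpy as np

def predicted_vs_observed_r(X, y):
    """Fit OLS (with intercept) and return Pearson's r between fitted and observed y."""
    y = np.asarray(y, dtype=float)
    X1 = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    fitted = X1 @ coef
    # Off-diagonal entry of the 2x2 correlation matrix
    return np.corrcoef(fitted, y)[0, 1]
```

Squaring the returned value reproduces the in-sample R-squared, so a higher predicted-vs-observed correlation and a higher R-squared are the same finding.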

Here are the statistics for the NHL PPG prediction model, which had the slightly higher R-squared:

There is a lot here, but we will keep things simple. All three independent variables (height, age, and CHL PPG) play a statistically significant role in the model. A simple multivariable linear regression on nothing more than height, age, and CHL points per game explains about 23 percent of the variation in these players’ NHL points per game.
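"Statistically significant" here refers to each coefficient's p-value. The sketch below shows one common way those p-values are produced: divide each coefficient by its standard error to get a t statistic, then convert it to a two-sided p-value. As an assumption, it uses a normal approximation to the t distribution (fine when there are thousands of players, not for small samples), and `coefficient_p_values` is a hypothetical name.

```python
import math
import numpy as np

def coefficient_p_values(X, y):
    """OLS with intercept; return (coefficients, two-sided p-values).

    p-values use a large-sample normal approximation to the t distribution.
    """
    y = np.asarray(y, dtype=float)
    X1 = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    n, k = X1.shape
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    sigma2 = resid @ resid / (n - k)            # residual variance estimate
    cov = sigma2 * np.linalg.inv(X1.T @ X1)     # covariance of the coefficients
    t_stats = coef / np.sqrt(np.diag(cov))
    # Two-sided normal tail probability: 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2))
    p_values = [math.erfc(abs(t) / math.sqrt(2)) for t in t_stats]
    return coef, p_values
```

A p-value near 0 (reported as 0.000) means the data would be very unlikely if that variable truly had no effect; a large one means the data are quite compatible with no effect.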

That’s pretty good given how little information those three independent variables provide relative to what could reasonably be included. Variables like scoring regressed for shooting percentage, team quality adjustments, advanced statistics, draft combine data, qualitative scouting information, and other items could easily improve the model.

As in Money Puck’s and Josh Weissbock’s articles, we found height to be a statistically significant variable in predicting future NHL success.

I was curious about one other thing though, so I took another step.

I then restricted the sample by removing every player who never played in the NHL. Even players who amounted to only one NHL regular-season game were still included.
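In code, that restriction is just a boolean filter. A minimal sketch, assuming each player's NHL regular-season games are stored alongside the other columns; `nhl_only` is a hypothetical helper.

```python
import numpy as np

def nhl_only(nhl_games, *columns):
    """Return each column restricted to players with at least one NHL regular-season game."""
    played = np.asarray(nhl_games) >= 1   # boolean mask, one entry per player
    return tuple(np.asarray(col)[played] for col in columns)
```

For example, `nhl_only([0, 1, 82, 0], [70, 72, 75, 74])` returns a one-element tuple holding `array([72, 75])`: the one-game player stays in, the two players who never dressed are dropped.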

Here again are the results for the prediction model for NHL PPG:

At a cursory glance, not much has changed. The model does, however, explain about 3 percentage points more of the variance in NHL points per game.

There is one significant change, though: the p-value for height rises from 0.000 to 0.6204. Non-stat geeks may be asking what a p-value is and why that change matters. Instead of answering myself, here is what my stats program’s auto-analysis says:

I took the program’s advice and removed height as an independent variable when predicting NHL PPG for all CHL players with one or more NHL games. The resulting R-squared differed only in the hundredths of a percent *(i.e., in the 00._ range)*.
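Dropping a variable and re-checking R-squared is a nested-model comparison. A minimal sketch, assuming the predictors are the columns of a NumPy array; `r_squared` and `r_squared_without` are hypothetical names.

```python
import numpy as np

def r_squared(X, y):
    """In-sample R-squared of an OLS fit with an intercept."""
    y = np.asarray(y, dtype=float)
    X1 = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    return 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()

def r_squared_without(X, y, drop_col):
    """Compare R-squared with and without one predictor column (e.g. height)."""
    X = np.asarray(X, dtype=float)
    full = r_squared(X, y)
    reduced = r_squared(np.delete(X, drop_col, axis=1), y)
    return full, reduced, full - reduced
```

Because adding a column can never lower in-sample R-squared, the difference is always zero or positive; the question is whether it is big enough to matter, and a gap in the hundredths of a percent is not.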

What does all this mean?

It means that a CHL player’s height matters significantly in predicting whether he makes the NHL, but doesn’t seem to matter once you look only at players with a single NHL game played *(or more)*.

These results could arise for other reasons, but they do suggest that there may be some unfair height bias in decision making.

This raises an important issue when building predictive models: the successes or failures one is trying to predict may themselves be unwarranted.
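To see how a height-biased gate into the NHL could produce this kind of pattern, here is a small simulation under made-up assumptions: standardized height and CHL scoring, a hypothetical selection rule that rewards both, and NHL production that depends on scoring alone. None of the parameters come from the Canucks Army data, and `height_coefficient` is a hypothetical helper.

```python
import numpy as np

def height_coefficient(height, chl_ppg, nhl_ppg):
    """OLS of NHL PPG on height and CHL PPG (with intercept); return height's coefficient."""
    X = np.column_stack([np.ones(len(nhl_ppg)), height, chl_ppg])
    coef, *_ = np.linalg.lstsq(X, nhl_ppg, rcond=None)
    return coef[1]

rng = np.random.default_rng(0)
n = 20_000
height = rng.normal(0.0, 1.0, n)   # standardized height (hypothetical)
chl_ppg = rng.normal(0.0, 1.0, n)  # standardized CHL scoring (hypothetical)

# Hypothetical gatekeeping: the chance of reaching the NHL rises with
# scoring AND with height.
makes_nhl = chl_ppg + height + rng.normal(0.0, 1.0, n) > 1.5

# Hypothetical NHL production: depends on CHL scoring only, not on height.
# Players who never reach the NHL are recorded with 0 NHL PPG.
nhl_ppg = np.where(makes_nhl, 0.3 + 0.1 * chl_ppg + rng.normal(0.0, 0.05, n), 0.0)

coef_all = height_coefficient(height, chl_ppg, nhl_ppg)
coef_nhl = height_coefficient(height[makes_nhl], chl_ppg[makes_nhl], nhl_ppg[makes_nhl])
print(f"height coefficient, all players:      {coef_all:.3f}")
print(f"height coefficient, NHL players only: {coef_nhl:.3f}")
```

With these settings, height comes out as a clear positive predictor when non-NHL players are included, and its coefficient falls to roughly zero once only NHL players are kept, because height only ever mattered for who got in. That mirrors the pattern above, though a simulation like this cannot show which mechanism actually produced the real results.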

With draft theory, any biases that may make the draft inefficient likely also exist in the professional-level evaluation that determines whether a player makes it. If amateur scouts worry too much about size, then coaches, GMs, and pro scouts likely do too. If amateur scouts over- or under-emphasize a certain skill or play style, the same likely goes for the upper levels.

Even draft position may skew results in an inefficient manner. Higher-drafted players may receive more attention or be given additional chances to become bona fide NHL players.

Because the possible bias persists from the draft into professional evaluation, it is harder to prove that such bias exists in the first place.

We know that teams have historically been rational, success-seeking entities and have generally been good at detecting and drafting talent. However, there are many indications that some values may need adjusting. While size is an attribute that helps a player perform, a small player who outperforms a larger player is still outperforming that larger player.

Stewsquared: My cat’s breath smells like cat food.

Garret Hohl: Brushing your teeth is an important part of cat hygiene.

Stewsquared: So you’re saying that there is a direct correlation between me brushing my teeth and my cat’s hygiene? I’d like to see a bar graph that supports such claims.


Baumerman: Interesting work. Perhaps you could look at the quality of NHL teams during that sweet spot of 20-24 years old when a prospect makes the jump to the NHL. Going along with your line of thinking, a more desperate team (one near the bottom of the standings) would be less likely to have a height bias because of its unfortunate circumstances. Conversely, better teams might only give taller players NHL opportunities because of their positive circumstances.

But more importantly, I am one of the three people who listen to your podcast. Where is it this week?

Garret Hohl: I have a bad cold, which has kept me at home (which is why I have published about 3 articles a day between JN and HG), and Rhys doesn’t want to catch it.

Probably recording tomorrow for Thursday publish.

Dirty30: Regression analysis is typically very robust when used with a random sample. The one ‘error’ is that this is not a sample but a population. If you had simply chosen x-number of players from a specific population and run your equation, you could have compared the results to the population. Or split the population and run it against one group.
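The split-population idea can be sketched as fitting on one random half of the players and scoring R-squared on the other half. This assumes the predictors sit in a NumPy array; `split_sample_r_squared` is a hypothetical helper, and the 50/50 random split is one arbitrary choice among many.

```python
import numpy as np

def split_sample_r_squared(X, y, seed=0):
    """Fit OLS on a random half of the players, then score R-squared on the other half."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    train, test = idx[: len(y) // 2], idx[len(y) // 2:]
    X1 = np.column_stack([np.ones(len(y)), X])   # intercept + predictors
    coef, *_ = np.linalg.lstsq(X1[train], y[train], rcond=None)
    resid = y[test] - X1[test] @ coef            # errors on players the fit never saw
    ss_tot = ((y[test] - y[test].mean()) ** 2).sum()
    return 1.0 - (resid @ resid) / ss_tot
```

If the held-out R-squared lands close to the full-sample figure, the model is not just memorizing quirks of this particular set of players.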

I’d also like to know how you coded your variables, since the less variance there is in the IVs, typically the less variance is explained in the DV.

It might be that a better way to examine discrete data of this kind is with ANOVA.
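For the ANOVA route, the core statistic is short to compute. This sketch assumes players have already been binned into height groups (the binning and `one_way_anova_f` are hypothetical), and it returns only the F statistic, not a p-value.

```python
import numpy as np

def one_way_anova_f(*groups):
    """One-way ANOVA F statistic for k groups (e.g. NHL PPG split into height bins)."""
    groups = [np.asarray(g, dtype=float) for g in groups]
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    k, n = len(groups), len(all_values)
    # Spread of group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    # Spread of players around their own group's mean
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))
```

For example, `one_way_anova_f([1, 2, 3], [4, 5, 6])` gives 13.5; a larger F means the group averages differ more than the within-group scatter would suggest.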

Actually, if you have multiple measures of success, e.g., draft position, years to entry (how long it takes to get to the show), years played, points, and contract increases, then you could also consider running a MANOVA to see how much of the variance in those measures of success each independent variable explains.

Overall, I think height typically confers some strength and speed advantages that allow for quicker puck possession and a greater likelihood of retaining possession.

And we can see that this confers a greater likelihood of scoring, since you are more likely to score when you have the puck than when you are trying to gain possession of it.

But I find this an interesting challenge and hope you take my comments as encouragement to look at different variables and methods.

Garret Hohl: Ya, as I mentioned, this is just a cursory look at something, plus a note that the measure of success may be clouded by bias inherent in hockey management.

Dirty30: Kudos for taking it on!

Stewsquared: What stats program are you using?

Garret Hohl: This was done in Statgraphics, since I was on my home computer.