The 'Fouts' Analysis

If you look at the earlier post that lists various offensive and defensive stats and their correlation with season wins, you see that there are some that correlate with winning better than the stats I've used in my model. Points scored and points allowed, in particular, correlate very well with wins. Shouldn't those go in the model instead?
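The correlations in that earlier post are presumably Pearson coefficients. Here is a minimal sketch, assuming that measure, of how one is computed in plain Python. The `point_diff` and `wins` lists are invented numbers for illustration, not real NFL data.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations (covariance numerator) and each sum of squared deviations
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when a sequence has no variance")
    return cov / sqrt(ss_x * ss_y)

# Hypothetical, made-up season numbers (not real team stats):
point_diff = [120, 45, -10, -60, 80, -150]
wins = [13, 10, 8, 6, 11, 3]
print(round(pearson_r(point_diff, wins), 3))
```

A value near +1 means the two move together almost perfectly, which is exactly why point differential "explains" wins without explaining anything.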

Dan Fouts, former quarterback and Monday Night Football analyst, can explain this better than I can. Or actually, Will Ferrell playing Dan Fouts on Saturday Night Live can. "Al, my prediction is that the team that scores more points than the other team will probably be the winner tonight. Back to you, Al."

Of course, a team that scores a lot of points and allows few points will win often. There is no mystery there. And lots of guys who predict NFL games or try to beat the point spread use such stats, or things like "red zone points," in various models. In fact, these kinds of models probably predict game outcomes well, but they would be completely invalid if you wanted to learn anything new about how the game really works.

Models that use points scored or points allowed, or variations of either, are no more analytical than Dan Fouts. We already know that the ability to score more points than another team leads to winning. Thanks, Dan. The question is: what enables some teams to score more than others?

Another type of model, one that uses the laundry list of factors that correlate with winning, faces problems of interdependence among variables. In regression models, there is one dependent (outcome) variable and several independent variables. The independent variables should not simply be restatements of the outcome, and they should not be strongly interdependent with each other (a problem statisticians call multicollinearity). For example, a model that includes both a variable measuring passing effectiveness and a variable measuring red zone effectiveness would be invalid. General passing effectiveness and the ability to score in the red zone are deeply interrelated, so the regression would be unable to assign valid weights (coefficients) to passing effectiveness and red zone effectiveness, no matter how you measure either. The model might be predictive, but you wouldn't learn a thing about what really leads to winning.
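One standard way to detect this kind of interdependence is the variance inflation factor (VIF). The sketch below assumes a regression with exactly two predictors, in which case each predictor's VIF is 1 / (1 - r²), where r is the correlation between the two predictors. The names `passing_eff` and `red_zone_eff` are hypothetical columns filled with made-up values. A common rule of thumb treats a VIF above roughly 5 to 10 as a warning sign.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(ss_x * ss_y)

def vif_two_predictors(x1, x2):
    """VIF shared by both predictors in a two-predictor regression.

    Regressing one predictor on the other gives R^2 = r^2,
    so VIF = 1 / (1 - r^2). It is 1 when the predictors are uncorrelated
    and grows without bound as they approach perfect collinearity.
    """
    r = pearson_r(x1, x2)
    if abs(r) >= 1.0:
        return float("inf")  # perfectly collinear: coefficients cannot be separated
    return 1.0 / (1.0 - r * r)

# Hypothetical, made-up team measures (not real data):
passing_eff = [6.1, 7.4, 5.2, 6.8, 5.9, 7.9]
red_zone_eff = [0.48, 0.61, 0.40, 0.57, 0.47, 0.66]
print(round(vif_two_predictors(passing_eff, red_zone_eff), 1))
```

A large VIF means the coefficient estimates for those two variables are unstable: small changes in the data can shift weight from one to the other, which is the "unable to assign valid weights" problem in numeric form.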

There are other requirements for a linear regression model's validity, such as errors (the differences between the model's linear estimates and the actual values) that are roughly normally distributed and random, with no pattern to them. I would guess that 99% of the prediction models out there on the internet don't even bother worrying about these things.
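A minimal sketch of checking those error assumptions, assuming a one-variable linear model fitted by ordinary least squares. It computes the residuals, their skewness (near 0 when they are roughly symmetric, as normal errors would be), and the Durbin-Watson statistic (near 2 when consecutive residuals show no pattern). The data are hypothetical, and these are quick checks rather than a full diagnostic workup.

```python
def fit_simple_ols(xs, ys):
    """Least-squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

def residuals(xs, ys, a, b):
    """Actual value minus the model's linear estimate, per observation."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def skewness(values):
    """Population skewness m3 / m2**1.5; about 0 for a symmetric distribution."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def durbin_watson(res):
    """Durbin-Watson statistic on residuals in observation order.

    Only meaningful when the observations have a natural order (e.g. by time).
    Near 2: no lag-1 autocorrelation; near 0: positive; near 4: negative.
    """
    num = sum((res[i] - res[i - 1]) ** 2 for i in range(1, len(res)))
    den = sum(e * e for e in res)
    return num / den

# Hypothetical, made-up data (not real team stats):
efficiency = [0.5, 1.2, -0.3, 0.8, -1.1, 0.1]
season_wins = [10, 12, 7, 11, 4, 8]
a, b = fit_simple_ols(efficiency, season_wins)
res = residuals(efficiency, season_wins, a, b)
print(round(skewness(res), 3), round(durbin_watson(res), 3))
```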

1 Response to “The 'Fouts' Analysis”

  1. Anonymous says:

    It's interesting that you say that "the model might be predictive, but you wouldn't learn a thing about what really leads to winning" in the second-to-last paragraph. Writing this in August of 2010, and seeing that your efficiency model has consistently been one of the best "predictive" models out there, it seems to me that's the point: those stats (total points scored, red zone effectiveness, etc.) turn out not to be predictive at all. It seems that in your quest to learn what really leads to winning, you have stumbled on very high prediction accuracy.
    Great site, been reading it a lot over the past year or so, keep it up!
