beats-the-spread: Thanks for the great comments. I...

2008-01-19T13:49:00.000-05:00

beats-the-spread: Thanks for the great comments. I took a look at your site yesterday. Very impressive. The link behind your name didn't work, so here it is for anyone else who's interested:

http://www.numbersinsight.com/niblog/football.php

I've already tested many of the points you suggested.

- I use rate stats, which are largely independent of each other (total passing yards and total rushing yards aren't, but yards per pass and yards per run are).

- My current model doesn't do what I suggested in this post, i.e., first predict some stats and then predict wins. But the numbers suggest that this two-step method may actually reduce total error, because some stats are much noisier than others. For example, using stable, less noisy stats to estimate a team's underlying tendency to throw interceptions may work better than using its past interception data.

- I've found that the residuals for both my linear and logistic models are generally normal.

- I experimented with neural networks for game predictions but found them slightly less effective than logistic regression. I'm not an expert on neural nets, so there may be ways to improve them.

- Outliers in past seasons were very "on-axis," i.e., they didn't bias the coefficients. But I suspect that when I include this year's data, NE might cause some problems. For example, if you plot their TDs against passing efficiency, they fall far off the linear trend. To me, this suggests that once an offense becomes efficient enough, it passes an inflection point beyond which it almost can't be stopped.

I have some similar ideas about how to build a model against the spread.

One suggestion for you: try rate stats instead of totals. It's hard to tell whether you already do, or whether you still use the difference in total yards between teams.

For example, use yards per pass attempt instead of total passing yards. Losing teams can rack up lots of passing yards because they're passing much more often, not because they're better at passing. But total yards might be a better fit for point spread estimation.
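To make the rate-versus-total point concrete, here is a small Python sketch with made-up box-score numbers (the `PassingLine` class is a hypothetical helper for illustration). The trailing team piles up more total passing yards simply by throwing twice as often, while yards per attempt ranks the two teams the other way.

```python
from dataclasses import dataclass


@dataclass
class PassingLine:
    """One team's passing totals for a game (hypothetical helper)."""
    team: str
    attempts: int
    yards: int

    @property
    def yards_per_attempt(self) -> float:
        # Rate stat: normalizes for how often the team passed.
        return self.yards / self.attempts


# Made-up numbers: the trailing team throws twice as often while behind.
leader = PassingLine("Leader", attempts=24, yards=216)
trailer = PassingLine("Trailer", attempts=48, yards=288)

for line in (leader, trailer):
    print(f"{line.team}: {line.yards} total yds, {line.yards_per_attempt:.1f} yds/att")
# Leader: 216 total yds, 9.0 yds/att
# Trailer: 288 total yds, 6.0 yds/att
```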

First let me say that we are on the same track in ...

2008-01-18T15:38:00.000-05:00

First, let me say that we're on the same track in trying to improve our models. The only difference is that you try to predict win/lose outcomes and I try to predict cover/no-cover outcomes against the Vegas spread.

Here are a few statistical suggestions for your model that quickly come to mind:

- Observations are correlated, while logistic regression assumes independence.

- You have "double error." If you predict week 9 from stats from weeks 1-8, including, say, offensive stats, you are effectively first predicting offensive stats and then predicting game outcomes from them. That's variance on top of variance.

- Errors are assumed to follow a normal distribution; I don't know whether that holds here. Check the residuals, and if they aren't normal, it should be easy to take the log or standardize the outcomes in order to achieve normality.

- I have heard that neural networks give much better results on classification data like yours. Plus, none of the above assumptions need to be verified (though maybe normality still matters; I'm not sure).

- Outliers might be affecting your results. Have you tried using a robust logistic regression or downweighting outliers?

Comments on Advanced Football Analytics (formerly Advanced NFL Stats): Explanation vs. Prediction
