In the interest of full disclosure, and for my fellow uber-geeks, here is the actual model I'll be using for estimating outcome probabilities for each NFL game.
It is a logit regression model based on the outcomes of all regular season games from 2002-2006. I looked at each game twice, once from the point of view of the home team and once from the point of view of the visiting team, calling the two teams Team A and Team B. To identify the home team, I used a dummy variable, AHome, which was 1 or 0 depending on whether Team A was home or away. The dependent variable is AWon, which is 1 if Team A won and 0 if Team B won. There were 2,560 cases considered (each game counted twice).
VARIABLE | COEFFICIENT | STD ERROR | T-STAT | SLOPE at mean |
const | -0.26 | 1.36 | -0.19 | |
AHOME | 0.74 | 0.09 | 8.29 | 0.19 |
AOPASS | 0.45 | 0.07 | 6.56 | 0.11 |
AORUN | 0.27 | 0.10 | 2.65 | 0.07 |
ADPASS | -0.54 | 0.09 | -5.90 | -0.13 |
ADRUN | -0.21 | 0.11 | -1.87 | -0.05 |
AOINTRATE | -15.90 | 6.26 | -2.54 | -3.98 |
ADINTRATE | 17.68 | 5.16 | 3.43 | 4.42 |
AOFUMRATE | -20.50 | 7.79 | -2.63 | -5.12 |
APENRATE | -1.49 | 0.72 | -2.07 | -0.37 |
BOPASS | -0.45 | 0.07 | -6.54 | -0.11 |
BORUN | -0.27 | 0.10 | -2.64 | -0.07 |
BDPASS | 0.53 | 0.09 | 5.83 | 0.13 |
BDRUN | 0.20 | 0.11 | 1.79 | 0.05 |
BOINTRATE | 15.71 | 6.26 | 2.51 | 3.93 |
BDINTRATE | -18.95 | 5.16 | -3.67 | -4.74 |
BOFUMRATE | 21.01 | 7.79 | 2.70 | 5.25 |
BPENRATE | 1.47 | 0.72 | 2.04 | 0.37 |
Retrodictively, the model predicts 69.5% of the games correctly. But keep in mind there are many evenly matched games and unpredictable upsets, so it may be impossible for even a perfect model to get past 75% or so.
I realize the numbers in the table above are meaningless to most people, but I want to ensure everything I do is out in the open.
Key:
OPASS = (offensive pass yds - sack yds) / pass plays
ORUN = offensive run yds / run plays
DPASS = (defensive pass yds - sack yds) / pass plays
DRUN = defensive run yds / run plays
OINTRATE = offensive interceptions / pass attempts
DINTRATE = defensive interceptions / pass attempts
OFUMRATE = fumbles / offensive plays
PENRATE = team penalty yds / total plays
The t-stat indicates the significance of each variable. For this sample size, a t-stat of approximately 1.8 or greater (or -1.8 or less) indicates a significance level of p=0.05 or better.
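For anyone who wants to reproduce this, here is a minimal sketch of fitting the same kind of doubled-game logit model with statsmodels. This is not my actual code; the file and column names are hypothetical, and only two of the efficiency variables are shown.

```python
# Fit a doubled-game logit model: each game appears twice, once from each
# team's point of view, with AWON as the dependent variable.
import pandas as pd
import statsmodels.api as sm

games = pd.read_csv("games.csv")   # hypothetical file: one row per game

rows = []
for _, g in games.iterrows():
    # Home team as Team A...
    rows.append({"AWON": g["home_won"], "AHOME": 1,
                 "AOPASS": g["home_opass"], "BOPASS": g["away_opass"]})
    # ...and visiting team as Team A.  The remaining A*/B* stats from the
    # key above would be added the same way.
    rows.append({"AWON": 1 - g["home_won"], "AHOME": 0,
                 "AOPASS": g["away_opass"], "BOPASS": g["home_opass"]})

df = pd.DataFrame(rows)
X = sm.add_constant(df.drop(columns="AWON"))
result = sm.Logit(df["AWON"], X).fit()
print(result.summary())            # coefficients, standard errors, and z-stats
```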
Below is a graph of the spectrum of game probabilities divided randomly into two sets: training cases for the regression, and test validation cases. The graph plots the actual outcome rates against the model's predicted probabilities.
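For those curious, here is a rough sketch of how that kind of calibration check can be produced, building on the hypothetical `df` from the fitting sketch above (the 50/50 split and the 10%-wide buckets are just illustrative choices):

```python
# Split the doubled-game rows into training and validation sets, fit on one,
# and compare binned predicted probabilities to actual win rates on the other.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
mask = rng.random(len(df)) < 0.5    # random 50/50 split
train, test = df[mask], df[~mask]

fit = sm.Logit(train["AWON"], sm.add_constant(train.drop(columns="AWON"))).fit()
pred = np.asarray(fit.predict(sm.add_constant(test.drop(columns="AWON"))))
actual = test["AWON"].to_numpy()

for lo in np.arange(0.0, 1.0, 0.1):            # 10%-wide probability buckets
    in_bin = (pred >= lo) & (pred < lo + 0.1)
    if in_bin.any():
        print(f"predicted {lo:.1f}-{lo + 0.1:.1f}: "
              f"actual win rate {actual[in_bin].mean():.3f} over {in_bin.sum()} cases")
```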
By 69.5% accuracy retrospectively, do you mean that you're testing on games that the model was trained on?
By which I mean, did you test on the 2002-2006 games using the logistic regression model produced by training on that same set of games?
Derek-Yes.
The word is "retrodictive", not "retrospective," incidentally.
And the more interesting number in that case isn't the prediction accuracy - that's determined by the set of games used for the test - but the expected accuracy versus the observed accuracy (i.e. the error).
The observed accuracy is mostly irrelevant without knowing the expected accuracy: if one model expected 60% accuracy and observed 70%, and another model expected 65% and observed 65%, the second is likely the better model; the first, in all likelihood, just got lucky.
Pat-Thanks.
Retrodictive--I couldn't remember that word.
Help me out. You're saying my observed accuracy is 69.5%. But how does one know what an expected accuracy would be? Last year, this model (or one very close to it) was correct 65% of the time, and was well calibrated, i.e. teams predicted to win 80% of the time won 80% of the time, etc. But 2006 was a very odd year in which home teams only won 53% of games when they normally win 58%. I would expect it to fall somewhere between 65% and 70% correct for future games. Anyway, how would I calculate the error?
Patrick-Are you referring to "calibration"? I think we just have some different terminology. Here's how last year's calibration numbers looked.
http://www.bbnflstats.com/2007/03/assessing-models-accuracy.html
But how does one know what an expected accuracy would be?
Run through the season. Predict each game. The regression will give you some number that is related, somehow, to the probability that team A will beat team B (and obviously the probability that B will beat team A). You obviously know that conversion from the calibration. Average the larger of the two probabilities over all games (the larger represents the predicted winner). That's your expected accuracy.
Then, subtract that number from your observed accuracy. That's the error.
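In code, that calculation is something like this - just a sketch, where `p_a_wins` is the model's probability that team A wins each game and `a_won` is the actual outcome (the numbers here are made up):

```python
# Expected accuracy = average probability assigned to the predicted winner;
# error = observed accuracy minus expected accuracy.
import numpy as np

p_a_wins = np.array([0.72, 0.55, 0.61, 0.48])   # example model predictions
a_won    = np.array([1,    0,    1,    1   ])   # example actual outcomes

expected = np.maximum(p_a_wins, 1 - p_a_wins).mean()   # avg prob of the favorite
observed = ((p_a_wins > 0.5) == a_won).mean()          # fraction of correct picks
print(f"expected {expected:.3f}, observed {observed:.3f}, error {observed - expected:+.3f}")
```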
Now, interpreting that number is a bit of work: see a post in my Eagles blog here, although I think you have to register, so sorry about that.
But the basic idea is simple: suppose you have 4 games, you expected to get 70% of them right, and you got 3/4 of them right. The error in that case would be 5% - but the problem is that the uncertainty on that error is huge, due to the low statistics. If the same games had been played 25 times more often, 3/4 is perfectly consistent with 90/100, so in truth, your "error" is really 5 +/- 40% or so.
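To put a number on it, a simple binomial standard error on the observed accuracy gives roughly that spread:

```python
# Binomial uncertainty on an observed accuracy of k correct picks out of n games.
import math

def accuracy_std_error(k, n):
    p = k / n
    return math.sqrt(p * (1 - p) / n)

print(accuracy_std_error(3, 4))      # ~0.22, i.e. roughly +/-42% at the 95% level
print(accuracy_std_error(178, 256))  # ~0.03 for a full season's worth of games
```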
There's also one other thing, often overlooked, that is important for comparing ranking systems: the convergence speed. In your case, the "team ranking" is based on real statistics, so the question there is "how fast does the combination of those statistics stabilize?"
That number essentially tells you what the uncertainty in your prediction for each game is - that is, if you say "team A is going to beat team B 70% of the time", how precise is that 70%? You have an estimate for how precise that 70% is from the errors in the regression, but that's just uncertainties in your model - you also have to accept that there are uncertainties in the data, too.
Patrick-Thanks. All great stuff I was not familiar with. I used some different software that can randomly select cases as training cases and validation cases. I posted the results in graph format to the original post.
My interpretation is that you'd want to see two things. One, the training and validation plots are tightly intertwined. And two, they both follow the diagonal closely, so that actual outcomes don't diverge too far from expected.
I don't have an exact error number yet. That will take a bit of work in Excel. But my reaction to the graph is that there is not much divergence between expected and actual.
The thing to then watch for, year to year, is years where the error is significantly larger. You can figure out what the "expected error distribution" is by assuming the error truly is binomial (which is what you're presuming in the regression anyway, since it's a chi-squared fit), and doing a Monte Carlo.
If the error is always within the expected error distribution, then you've got a model which almost perfectly represents the game. It almost certainly won't be, since, well, it's a model, and the game is more complex.
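Here's a rough sketch of that Monte Carlo. The `p_favorite` array is just a stand-in for a season of the model's predicted favorite probabilities: simulate many seasons where each game's outcome is drawn from the model's own probability, and look at the spread of observed-minus-expected accuracy you'd see if the model were exactly right.

```python
# Monte Carlo of the expected error distribution, assuming binomial game outcomes.
import numpy as np

rng = np.random.default_rng(1)
p_favorite = rng.uniform(0.5, 0.9, size=256)   # stand-in for one season of predictions
expected = p_favorite.mean()                   # expected accuracy if the model is right

sim_errors = []
for _ in range(10_000):                        # simulate 10,000 seasons
    favorite_won = rng.random(p_favorite.size) < p_favorite
    sim_errors.append(favorite_won.mean() - expected)

print(f"expected error spread (1 sigma): {np.std(sim_errors):.3f}")
# An actual season's error well outside this spread means the model is missing something.
```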
Help me out here as it has been a long time since I took econometrics.
The coefficient for AHome is 0.74. Keeping all other variables constant, doesn't this mean that making team A the home team increases their win probability by 74% compared to if they were the away team? This does not make sense.
I'm sure I'm missing something so please help me understand.
Great site by the way.
Brian
Hi, Brian. Thanks. You'd be correct if this were a linear regression, but it's a logit regression. The 0.74 is a change in the natural log of the odds of the home team winning, so it multiplies the odds by exp(0.74) rather than adding 74 percentage points to the probability. It works out to a 57.5% likelihood that the home team wins. Non-linear logit models are ideal for dichotomous outcomes, such as win/lose.
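Here is a minimal sketch of that conversion. The baseline log-odds below is just an assumed number for illustration; the actual figure depends on the constant and the other variables evaluated at their means.

```python
# Convert a change in log-odds into a change in win probability.
import math

def win_prob(log_odds):
    return 1 / (1 + math.exp(-log_odds))

print(math.exp(0.74))             # ~2.10: home field roughly doubles the odds of winning
baseline = -0.44                  # hypothetical log-odds for the same matchup played on the road
print(win_prob(baseline))         # ~0.39 win probability away
print(win_prob(baseline + 0.74))  # ~0.57 win probability at home
```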
Brian,
I'm working to replicate your work here so that
a) I can get updated coefficients with defensive interception rates removed.
b) I understand the process end-to-end and can apply it to other sports.
So far I'm not having luck getting values close to what you have or prediction percentages close to your level. Can you possibly point out where I'm going wrong?
I originally tried to calculate the necessary data points for a given game using only that game's data (i.e. aopass would use only the passing data from that specific game). This led to problems when I tried to run a regression against that data, as it caused errors due to the fact that some variables were identical (i.e. aopass = bdpass).
I then tried to calculate the necessary data points for a given game using the average of the last 4 games for a given team (i.e. aopass used the passing data from the past 4 games for team A). This allowed me to run my regression, but the resulting coefficients were fairly different from yours. Most disconcerting was the fact that aointrate and aofumrate came out positive.
Is there something wrong with my current approach? Am I butchering the process?
CE (and others who have asked) -- I promise to publish updated coefficients and a sample calculation. I've been out of the country for 3 of the past 5 weeks (Karachi is so nice this time of year!) so when I get home I'll have some time to answer the mail.
Hi, your system seems great, but have you published updated coefficients and a sample calculation? That would be really helpful, since I'm trying to build the same kind of system! Thanks!
Quick question Brian.
So you built your logit model using the 2002-2006 regular seasons. Say it's the beginning of the 2007 season and you want to start predicting some outcomes. How many games into the '07 season do you need before you have enough data to put into your regression equation? Assuming the team has changed between the end of the '06 season and the beginning of the '07 season, you don't have reliable numbers for the first game of '07, correct? And is just the first game of '07 good enough to predict game 2 of '07, or do you need to wait until X games have been played in the season?
Hello,
I am slightly confused as to how you avoided collinearity in the model. I may have your methodology slightly wrong, but you mentioned looking at each game from the point of view of either team, so wouldn't AOPASS be perfectly correlated with BDPASS, for example? Anyway, I know this post is very old, so I hope you see this - I just wanted to clear it up.
Seconded. AOPASS and BDPASS should be perfectly correlated. Including both terms should only be necessary if one of them were a dummy variable, right?
Any thoughts on why these particular variables were used in the final logistic regression model? Did you create a large number of predictors and then pick the best ones based on some measure like a t-test or another variable selection method? How does logistic regression compare to other classifiers like CART, Random Forest, etc.? How do you handle variables that are strongly correlated? Thanks!