Assessing the Model's Accuracy

Probability models are difficult to assess by their nature. Linear models offer an R-squared that gives a definitive assessment of the explanatory/predictive power of a model. Probability models, such as the logit model I use, offer numerous indirect assessments, but none is more straightforward than the % correct score. It tells us how well the model predicts actual outcomes.

If the model predicts outcomes well, then we know two things. First, we know how to predict games, which is fun. Second, we understand what is really important in winning NFL games, and we gain a deeper understanding of the inner workings of the sport as it is played.

But % correct doesn't tell the whole story. It simply draws a line at .50: if a team predicted to have a .51 win probability actually wins, we consider the model correct, and if a team predicted to have a .49 win probability wins, the model is considered incorrect. But that's unfair. We expect to be wrong in 49% of all such cases; that's just the reality of equally matched teams. Likewise, the model is expected to be wrong in 20% of the games where it predicted the favorite to have an .80 win probability. In fact, if the model were wrong in exactly 20% of such games, it would be better calibrated than if it were 100% correct.

So to assess the model's accuracy, not just in terms of how often its predicted favorite actually won, but in terms of how accurate the predicted probabilities were, I produced the table below. I divided all the games into 5 categories based on the "lopsidedness" of the prediction. Where the visiting team was forecast to have a win probability between .00 and .20 (and the home team's win probability was therefore between .80 and 1.00), I scored the model's accuracy. I did the same for win probabilities between .21 and .40, between .41 and .60, between .61 and .80, and finally between .81 and 1.00.

If the model fits relatively well, its % correct scores should reflect the predicted probabilities of each category. So, for the first category, between .00 and .20, we'd expect a % correct score of approximately 90% (the average predicted home win probability in that range is about .90, so the home-team pick should be right roughly 90% of the time). Here is how all 5 categories scored:


The model seems to fit well across the spectrum of games. Locks, solid favorites, and toss-ups were accurately predicted by the model.
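For anyone who wants to run this kind of check on their own predictions, here is a minimal sketch of the binning procedure described above. The games listed are made-up placeholders, not my actual data:

```python
# Sketch of the calibration check described above: bin games by the
# predicted visitor win probability and compare the model's "% correct"
# in each bin to what the probabilities themselves imply.
# The `games` list is illustrative placeholder data, not the real results.

games = [
    # (predicted visitor win probability, 1 if the visitor actually won else 0)
    (0.12, 0), (0.35, 1), (0.55, 0), (0.71, 1), (0.88, 1),
    (0.08, 0), (0.45, 1), (0.62, 1), (0.27, 0), (0.93, 1),
]

bins = [(0.00, 0.20), (0.21, 0.40), (0.41, 0.60), (0.61, 0.80), (0.81, 1.00)]

for lo, hi in bins:
    in_bin = [(p, won) for p, won in games if lo <= p <= hi]
    if not in_bin:
        continue
    # The model "picks" the visitor when p > .50, otherwise the home team.
    correct = sum((won == 1) == (p > 0.5) for p, won in in_bin)
    print(f"{lo:.2f}-{hi:.2f}: {len(in_bin)} games, {correct / len(in_bin):.0%} correct")
```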


6 Responses to “Assessing the Model's Accuracy”

  1. Derek says:

    I've been working on predicting football games using various models to predict the final score margin (home team score - away team score). Linear regression, support vector machines, neural networks, the works.

    Looking at those and other people's models, including the Las Vegas spread, each model has an overly large bias towards home-field advantage. While about 58% of games are won by the home team, the models will regularly classify 65%-75% of games as home team wins within a season. This was especially problematic in 2006, when only 53.13% of games were won by the home team.

    So I was wondering what proportion of games are classified as home team wins by your system. Also, what were your train and test set(s) for this article?
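    For what it's worth, the check I have in mind is just this, with placeholder arrays standing in for real predictions and results:

    ```python
    import numpy as np

    # Placeholder data, one entry per game (not real predictions or results).
    predicted_home_win_prob = np.array([0.62, 0.55, 0.48, 0.71, 0.59, 0.44, 0.66, 0.53])
    home_actually_won = np.array([1, 0, 0, 1, 1, 0, 1, 1])

    # Fraction of games the model classifies as home-team wins (p > 0.5),
    # compared with the fraction the home team actually won.
    classified_home = (predicted_home_win_prob > 0.5).mean()
    actual_home = home_actually_won.mean()

    print(f"classified as home wins: {classified_home:.1%}")
    print(f"actual home wins:        {actual_home:.1%}")
    ```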

  2. Brian Burke says:

    Derek, very impressive site. I'll have to set aside some time to really go over it.

    2006 was a problem for home teams. There was an unusually large number of upsets throughout the season.

    In 2005 the home teams fared much better. My model, at the time, was based only on 2005 stats and game results, which I presumed would be enough to get valid estimators, but I was wrong.

    It's a non-linear model, so quantifying home field advantage is complicated. The "coefficient" for home field is actually a logarithm of the odds ratio to win.

    If I make BAL play BAL, the home Ravens are favored 0.63 to 0.37, quite a large advantage.

    By week 10 I realized 2005 was out of whack with 2006, so I fudged the home field advantage dummy variable from [1 or 0] to [0.85 or 0]. The home team was then favored 0.59 to 0.41, more in line with the historical average of 58%.
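    Roughly, the conversion from the log-odds scale to a win probability works like this. The coefficient below is a made-up illustration, chosen only so a dummy of 1 reproduces the 0.63 figure; the scaled dummy lands near, but not exactly on, the 0.59 above because the real model has other terms I'm not showing.

    ```python
    import math

    def win_prob(logit):
        """Convert a log-odds (logit) value into a win probability."""
        return 1.0 / (1.0 + math.exp(-logit))

    # Hypothetical home-field coefficient, picked so that a full dummy of 1
    # gives roughly a 0.63 home win probability for two identical teams.
    beta_hfa = math.log(0.63 / 0.37)   # ~0.53 on the log-odds scale

    for dummy in (1.0, 0.85):
        p = win_prob(beta_hfa * dummy)
        print(f"HFA dummy = {dummy:.2f} -> home win probability ~ {p:.2f}")
    ```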

    Since the end of the season I expanded the database to include all regular season games since the 2002 expansion, including 2006. The new estimator for HFA is probably close to the 0.58 range, but I'd need to do some playing around in Excel to get a definitive answer for you.

  3. Brian Burke says:

    With the larger data set ('02-'06 seasons), the HFA for equally matched teams is 0.60 to 0.40.

  4. Derek says:

    Thank you for the compliment. I appreciate it, and I'm really impressed by what you've been able to put together. When this was an independent study, the two professors I worked with were Hungarian and Indian... women. So the explanations of inputs had to be really dense. Might want to skim through some of that.

    Anyway, since MATLAB makes it easy, I threw together some logistic regression tests.

    With some reworked inputs (yards per rush, yards per pass) and some added inputs (3rd down conversion, penalties, kick returns), I'm getting 62.07% accuracy on average from 1997-2006. 68.3% in 2005 (training on 1996-2004). 57.14% in 2006 (training on 1996-2005). I checked what happens if I use an all-zero input (i.e. teams are evenly matched). As I use more training data, the P(home team wins) declines, reaching 60.29% if I train on 1996-2006.
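    In case it's useful, here's a minimal Python sketch of that season-by-season split (not my actual MATLAB code; the features and outcomes below are random placeholders just to show the mechanics):

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(0)

    # Stand-in data: each row is one game, columns are feature differentials
    # (e.g. yards per rush, yards per pass, 3rd-down rate), with a season
    # label so we can train on past years and test on the next one.
    n_games, n_features = 2560, 5
    X = rng.normal(size=(n_games, n_features))
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=1.5, size=n_games) > 0).astype(int)
    season = rng.integers(1997, 2007, size=n_games)  # pseudo seasons 1997-2006

    # Train on all seasons before the test season, then score on that season.
    for test_year in range(1998, 2007):
        train, test = season < test_year, season == test_year
        model = LogisticRegression(max_iter=1000).fit(X[train], y[train])
        acc = model.score(X[test], y[test])
        print(f"{test_year}: trained on pre-{test_year}, accuracy {acc:.1%}")
    ```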

    What I'm starting to realize, though, is that a strong emphasis is being placed on home team stats in terms of regression and correlation coefficients. As a quick example, home rush YPG has a .15121 correlation with final score margin, but away rush YPG's coefficient is -.11612. When I go to home off. vs away def. and away off. vs home def. stats, the HOvAD stats have the higher magnitude coefficients.
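    Here's a quick sketch of that comparison, with random placeholder data standing in for the real per-game stats:

    ```python
    import numpy as np

    rng = np.random.default_rng(1)

    # Placeholder per-game data: margin is home score minus away score.
    home_rush_ypg = rng.normal(115, 20, size=500)
    away_rush_ypg = rng.normal(115, 20, size=500)
    margin = 0.05 * home_rush_ypg - 0.04 * away_rush_ypg + rng.normal(0, 10, size=500)

    # Pearson correlation of each stat with the margin; the question is
    # whether the home-team stat carries the larger magnitude.
    r_home = np.corrcoef(home_rush_ypg, margin)[0, 1]
    r_away = np.corrcoef(away_rush_ypg, margin)[0, 1]
    print(f"home rush YPG vs margin: {r_home:+.3f}")
    print(f"away rush YPG vs margin: {r_away:+.3f}")
    ```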

    I plan on expanding on this in my blog, but I'm starting to wonder if stats need to be adjusted for home field advantage. A dome team with a great passing game like the 1999 Rams might have their numbers slightly inflated because they play 8 games on the faster Astroturf. Do I thus adjust for that when they go on the road?

  5. Brian Burke says:

    I see what you're saying about HO vs. AD, etc. I remember noticing a while back that when the home team wins, the average score is 20-13 (or something close). But when the visitor wins, the average score is 20-17.

    The winning score stayed the same, only the score of the loser changed. I'm not sure how to interpret that, but it probably has implications for spread prediction models like what you're doing.

    One suggestion to tackle that phenomenon is to have 2 completely different models, one for when the home team is favored and one for when the visitor is favored. Just a thought.

    I suspect the "on turf" and "on Monday night" and "in the rain" stats are problematic because the sample sizes are too small. But if you want to model it, I'd recommend using dummy interaction variables: OnTurf = {1,0}, Home = {1,0}, and Home x OnTurf = {1,0}. Throw all 3 variables into the mix and see if the Rams' advantage is from playing at home or if it exists on the road and on turf as well. You can do the same thing for domes, weather, etc.
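    Here's a minimal sketch of that setup, with made-up data just to show the mechanics:

    ```python
    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(2)

    # Made-up game-level dummies for illustration.
    n = 1000
    home = rng.integers(0, 2, size=n)      # 1 if the team is at home
    on_turf = rng.integers(0, 2, size=n)   # 1 if the game is on turf
    home_x_turf = home * on_turf           # interaction: at home AND on turf

    # Fake outcomes with a built-in home effect, just so the fit runs.
    logit = -0.1 + 0.4 * home + 0.1 * on_turf + 0.2 * home_x_turf
    y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

    X = np.column_stack([home, on_turf, home_x_turf])
    model = LogisticRegression(max_iter=1000).fit(X, y)

    # If the interaction coefficient is near zero, the turf effect is the same
    # home or away; a clearly nonzero value suggests it depends on being at home.
    for name, coef in zip(["Home", "OnTurf", "Home x OnTurf"], model.coef_[0]):
        print(f"{name:>14}: {coef:+.3f}")
    ```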

    One other note about HFA--after following the '06 season so closely, I noticed that HFA was strongest when bad teams played other bad teams (OAK, SF, ARI, DET, CLE, etc). My theory is that the bad home teams seize on the rare opportunity to notch a W.

  6. Derek says:

    I was thinking about what might coincide with the variance in home winning percentage, and the one thing that occurred to me was the scheduling. Specifically, interconference games, since each team's interconference schedule is completely different the next year. After taking a nauseatingly long look at it, I confirmed it.
    For the full spiel, click here.


    It occurred to me when you mentioned bad teams playing other bad teams, so thank you.
