NFL Stats Overview

This post summarizes the statistical methods used throughout this site. The model uses each team's efficiency statistics in running and passing, on both offense and defense, to predict team wins. Turnover rates and penalties are also included. Multivariate linear regression was used to estimate total season wins, and multivariate logit (non-linear logistic) regression was used to estimate individual game probabilities.
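To make the game-probability step concrete, here is a minimal sketch of how a fitted logit model turns efficiency stats into a win probability. The post does not publish its coefficients or its exact specification, so the structure here (home-minus-away differences in each stat plus an intercept standing in for home field) is an assumption, and every number in `HYPOTHETICAL_COEFS` and `HYPOTHETICAL_HOME_INTERCEPT` is made up for illustration.

```python
import math

# HYPOTHETICAL coefficients for illustration only: the post does not publish
# its fitted values. Each multiplies a (home minus away) stat difference.
HYPOTHETICAL_COEFS = {
    "o_pass_eff": 0.45,   # offensive net yards per pass play
    "o_run_eff": 0.15,    # offensive yards per run
    "d_pass_eff": -0.50,  # net yards per pass play allowed
    "d_run_eff": -0.15,   # yards per run allowed
    "o_int_rate": -25.0,  # interceptions thrown per pass play
    "d_int_rate": 25.0,   # interceptions taken per opponent pass play
    "o_fum_rate": -20.0,  # fumbles per play
    "d_fum_rate": 20.0,   # fumbles forced per opponent play
    "pen_rate": -1.0,     # penalty yards per play
}
HYPOTHETICAL_HOME_INTERCEPT = 0.3  # assumed stand-in for home-field advantage


def logistic(z: float) -> float:
    """Convert log-odds to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def home_win_probability(home: dict, away: dict) -> float:
    """The log-odds of a home win are linear in the stat differences."""
    z = HYPOTHETICAL_HOME_INTERCEPT
    for stat, beta in HYPOTHETICAL_COEFS.items():
        z += beta * (home[stat] - away[stat])
    return logistic(z)


if __name__ == "__main__":
    # Made-up team profiles, not real NFL data.
    strong = {"o_pass_eff": 6.8, "o_run_eff": 4.3, "d_pass_eff": 5.6, "d_run_eff": 3.9,
              "o_int_rate": 0.025, "d_int_rate": 0.035, "o_fum_rate": 0.010,
              "d_fum_rate": 0.012, "pen_rate": 0.35}
    weak = {"o_pass_eff": 5.5, "o_run_eff": 4.0, "d_pass_eff": 6.4, "d_run_eff": 4.4,
            "o_int_rate": 0.035, "d_int_rate": 0.025, "o_fum_rate": 0.012,
            "d_fum_rate": 0.010, "pen_rate": 0.45}
    print(f"{home_win_probability(strong, weak):.3f}")
```

The logistic link is what keeps the output between 0 and 1 no matter how lopsided the matchup, which a straight linear model would not guarantee.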

Efficiency stats are measured in yards per play. Passing efficiency is the average yards gained (or lost) per pass play, and I include yards lost to sacks in both offensive and defensive pass efficiency. Likewise, run efficiency is yards per run. Turnovers are also expressed as efficiencies: interception efficiency is interceptions per pass play, and fumble efficiency is fumbles per play of any kind, since both run and pass plays can result in fumbles.
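The definitions above can be sketched as a small function over a team's raw totals. This is a minimal illustration, assuming that sacks count as pass plays for both the yardage and interception denominators (the text says sack yards are included but does not spell out the denominator); the `TeamTotals` field names are hypothetical. The same function works for offense (totals gained) or defense (totals allowed).

```python
from dataclasses import dataclass


@dataclass
class TeamTotals:
    """Raw totals for one side of the ball (offense gained or defense allowed)."""
    pass_yards: float   # gross passing yards, before sack losses
    sack_yards: float   # yards lost on sacks
    pass_attempts: int
    sacks: int
    rush_yards: float
    rushes: int
    interceptions: int
    fumbles: int


def efficiency(t: TeamTotals) -> dict:
    pass_plays = t.pass_attempts + t.sacks  # assumption: sacks count as pass plays
    all_plays = pass_plays + t.rushes       # fumbles can happen on any play
    return {
        "pass_eff": (t.pass_yards - t.sack_yards) / pass_plays,  # net yards per pass play
        "run_eff": t.rush_yards / t.rushes,                      # yards per run
        "int_rate": t.interceptions / pass_plays,                # interceptions per pass play
        "fum_rate": t.fumbles / all_plays,                       # fumbles per play
    }


if __name__ == "__main__":
    example = TeamTotals(pass_yards=3500, sack_yards=200, pass_attempts=520, sacks=30,
                         rush_yards=1800, rushes=450, interceptions=11, fumbles=20)
    print(efficiency(example))
    # -> {'pass_eff': 6.0, 'run_eff': 4.0, 'int_rate': 0.02, 'fum_rate': 0.02}
```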

Efficiency stats were found to be much more strongly correlated with wins than total yardage stats. Teams with poor defenses that find themselves trailing (and likely to lose) tend to accumulate large amounts of gross passing yards, but only because of frequent attempts. Teams that are ahead (and likely to win) tend to accumulate large amounts of gross running yards. In both cases it's often the winning or losing that leads to the yardage, not the other way around. For this reason, efficiency stats are almost always a better measure of a team's proficiency than gross yardage stats.

Once I established a workable, logical, predictive, and statistically significant model of winning and losing, there were any number of interesting applications. I could now predict individual games. Combining the win probabilities of all 256 games in the season estimates the probability of each possible win total for each team. Toward the end of the season, the game-by-game model can also be used to compare the likelihood of selected teams earning a playoff spot or capturing a wildcard berth. I could also create very accurate power (or efficiency) rankings. Finally, I could estimate how lucky each team was, and how much luck played a part in determining game and season outcomes.
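Summing a team's game probabilities gives its expected wins, but the probability of each specific win total needs the games combined one at a time. Here is a minimal sketch, assuming games are independent (a simplification; the post does not say how, or whether, correlation between games is handled). The 16 probabilities in the usage example are hypothetical.

```python
def win_distribution(game_probs):
    """Return dist where dist[k] is the probability of exactly k wins.

    Builds the distribution game by game, assuming independent games.
    """
    dist = [1.0]  # before any games: 0 wins with certainty
    for p in game_probs:
        new = [0.0] * (len(dist) + 1)
        for wins, prob in enumerate(dist):
            new[wins] += prob * (1 - p)   # this game is lost
            new[wins + 1] += prob * p     # this game is won
        dist = new
    return dist


if __name__ == "__main__":
    # Hypothetical 16-game schedule of per-game win probabilities.
    probs = [0.72, 0.65, 0.40, 0.55, 0.81, 0.60, 0.35, 0.70,
             0.58, 0.66, 0.45, 0.77, 0.62, 0.50, 0.68, 0.53]
    dist = win_distribution(probs)
    print(f"expected wins: {sum(probs):.2f}")
    print(f"P(10 or more wins): {sum(dist[10:]):.3f}")
```

The same routine answers playoff-style questions, such as the chance a team finishes with at least a given number of wins, by summing the tail of the distribution.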

I'm not just trying to predict games. I'm trying to understand the game itself. For example, which is more important--running or passing? Offense or defense? Interceptions or fumbles? Do special teams matter? But answering those questions accurately depends on how accurate the model is, which can only be judged by how well it predicts wins. We each may have intuitive answers to these questions, but statistics is one of the best ways to test our hunches. We can even quantify them.

2006 was the first season in which I did all of this number crunching. I needed some minimum amount of data to use for the efficiency stats, so I started predictions after week 4. As the season went on, I had a larger data set. Ultimately, here is how my predictions fared:

At first glance, 63% correct may not appear too impressive, especially for all that work I did. But almost 2 out of 3 games correct isn't too bad, especially when compared to the national consensus. Each week, I recorded not only my own results, but those of the Las Vegas odds makers as well. (Although my purposes here are not related to gambling, I'll use the betting lines as a benchmark.) The Vegas line was only accurate in picking winners 58.2% of the time for the entire '06 season. Vegas' record was slightly worse from week 5 onward, at 57.1%.


By the end of the 2006 regular season, my model was correct significantly more often than the Vegas line, by 63% to 57%. On the surface, it may not sound as impressive as it really is. But think of game prediction this way: a monkey will guess winners correctly 50% of the time. So the real question is: how much better than 50% is the model? In this case, the efficiency model was about 13 points better than a coin flip while the Vegas line (the consensus favorite) was about 7 points better, so the model added almost twice as much predictive power.
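The "almost twice" comparison is simple arithmetic on each predictor's edge over a coin flip. This sketch uses the week-5-onward figures reported above (63% for the model, 57.1% for Vegas); the function name is just for illustration.

```python
def edge_over_coin_flip(accuracy: float) -> float:
    """Accuracy above the 50% that random guessing would achieve."""
    return accuracy - 0.50


model_2006 = 0.63    # efficiency model, week 5 onward
vegas_2006 = 0.571   # Vegas line, week 5 onward

ratio = edge_over_coin_flip(model_2006) / edge_over_coin_flip(vegas_2006)
print(f"model edge {edge_over_coin_flip(model_2006):.3f}, "
      f"Vegas edge {edge_over_coin_flip(vegas_2006):.3f}, ratio {ratio:.2f}")
```

The ratio comes out to about 1.83, which is where "almost twice" comes from.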

The 2007 season was more predictable. The model was 70.8% correct compared to just 66.7% for the Vegas consensus. Over the past two years it has been the most accurate prediction system published.

Also consider that no model could be 100% correct. Upsets happen for many reasons. Some games are very evenly matched, so the favorite has very little advantage to begin with. Additionally, luck plays a large role in determining outcomes. It's hard to say exactly how well the theoretically best possible model could do, but from my experience it seems it would be something just under an 80% correct rate.

Ultimately, we're not talking about 71% vs. 67%. We're asking how far from 50% and how close to 80% can a prediction model get.
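One way to express that framing is the share of the distance from 50% (a coin flip) to roughly 80% (the estimated ceiling above) that each predictor covers. This is a minimal sketch; the 80% ceiling is the rough estimate from the text, not a measured value, and the 2007 accuracies are the ones reported earlier.

```python
COIN_FLIP = 0.50
ASSUMED_CEILING = 0.80  # rough estimate of the best achievable accuracy


def share_of_achievable(accuracy: float,
                        floor: float = COIN_FLIP,
                        ceiling: float = ASSUMED_CEILING) -> float:
    """Fraction of the floor-to-ceiling range that a predictor covers."""
    return (accuracy - floor) / (ceiling - floor)


if __name__ == "__main__":
    for name, acc in [("model 2007", 0.708), ("Vegas 2007", 0.667)]:
        print(f"{name}: {share_of_achievable(acc):.1%} of the way from 50% to 80%")
```

On this scale the 2007 model covers about 69% of the achievable range versus about 56% for Vegas, a wider gap than the raw 71% vs. 67% suggests.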


11 Responses to “NFL Stats Overview”

  1. MarkO says:

    Brian,

    You indicate a record of 50-18. I ask because I assumed you began tracking in Week 5, and further assuming 14 games per week, that would be a tally of 56 games, versus your 68 total. Just curious, and fascinated by your work here. From a fan.

  2. Brian Burke says:

    Mark, this year I started in week 4. There aren't always 14 games in a week. Weeks 6 and 8 only had 13 games. In total, I've predicted 68 games going into this week, week 9.

  3. Brian Burke says:

    And thanks for the compliment.

  4. MarkO says:

    Brian, wow, I didn't expect to hear back from you for a week. I do have a follow-up question. How exactly are you tallying W/L? I mean, if Team A was at 51% and won and covered the LV number, is that a W?

  5. Brian Burke says:

    It's straight-up. If the model predicts a game at 51/49 and the 51 team wins, I count it as correct.
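The straight-up scoring rule described here can be sketched as follows: the pick is whichever side the model rates above 50%, and the point spread plays no role. How exact 50/50 games were handled isn't stated, so skipping them is an assumption, as is the `(home_win_prob, home_won)` input format.

```python
def straight_up_record(predictions):
    """Count correct and incorrect straight-up picks.

    predictions: iterable of (home_win_prob, home_won) pairs.
    The pick is the side rated above 0.5; the spread is ignored.
    Exact 50/50 games are skipped (an assumption).
    """
    correct = wrong = 0
    for prob, home_won in predictions:
        if prob == 0.5:
            continue
        picked_home = prob > 0.5
        if picked_home == home_won:
            correct += 1
        else:
            wrong += 1
    return correct, wrong


if __name__ == "__main__":
    games = [(0.51, True), (0.30, True), (0.65, False), (0.20, False)]
    print(straight_up_record(games))  # -> (2, 2)
```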

  6. MarkO says:

    So you are not accounting for the spread number???

  7. Brian Burke says:

    This is all straight-up, no spreads. No one could predict ATS at 75% accuracy! The best models barely get above 55%, and most of that is simply luck. They can't sustain those rates very long.

  8. MarkO says:

    Oh, I agree, and I wasn't led astray into thinking you were hitting 75% ATS, but I thought I read you say something about using the spread as a baseline, so I thought maybe by chance it was accounted for. I do, however, find models like yours solid when wagering on the highest-probability situations. You don't account for a score, whereas I use another model that produces a probability to win and to cover based on the predicted score. I far exceed the 53% standard, or whatever everyone points to. It gave me a 5-1 morning just today. I will continue here as an interested fan. Best wishes.

  9. Anonymous says:

    I can say this. For fourteen years I authored an NFL football publication that appeared in many of this nation's newspapers. In the feature I included picks in all NFL games for that week. With one possible exception, I averaged about 58%. I was considered pretty good, so you are right about that. If you did 63% consistently, you were above average for sure.

    I can't speak for you, but going backward, looking at the history of games, and then trying to formulate results from that exercise only works if there is absolutely no human input. I used computers for my work, but there was always some input by me, so analyzing past data (which we used in our initial programming) always left us open to coloring the results.

    I like your approach. But we all know a football has points on both ends. And then there is the point spread. Favorites in the NFL win about 68% of their games, but they only cover half. Underdogs win 32% of the games and cover in half. That makes it tough to either always bet the favorite and give the points or always take the dog with the points. It just isn't that easy.

    In the final analysis, the one ingredient that has so much to do with winning and losing and points is the one you cannot pre-estimate, and that is luck. I'd rather be lucky than good. Statistics are fun, but they work best in hindsight.

  10. Vartan says:

    Hi!

    Yesterday was the first time I found your site. I started out predicting soccer matches, but I realized the NFL has a much better set of available data, so I am extrapolating some of my soccer match-prediction findings to it.

    The model I'm using is a Bayesian parametric one:

    = ROUND(120*((AI2*AL2)/('DEFENSE-09'!$AB$35/120))/(((AI2*AL2)/('DEFENSE-09'!$AB$35/120))+(((1-AI2)*(1-AL2))/(1-('DEFENSE-09'!$AB$35/120)))),0)

    AI2 = AVERAGE POINTS SCORED BY HOME TEAM PER GAME
    AL2 = AVERAGE POINTS ALLOWED BY VISITING TEAM PER GAME
    'DEFENSE-09'!$AB$35 = AVERAGE POINTS ALLOWED BY ALL NFL TEAMS, 2009

    These point averages are divided by 120; for example, a team scoring 24 points per game enters the formula as 24/120 = 0.2.
    The 120 is the number of half-minutes in a game.

    This model yields 73% accuracy on game results and 59% against the spread.
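For readers who don't use spreadsheets, here is a sketch transcribing the commenter's formula into Python. It assumes, per the note that points are divided by 120, that AI2 and AL2 hold per-game averages already divided by 120, so this version takes raw points per game and does the division itself. Algebraically the formula has the same shape as the odds-ratio method used for log5-style matchup estimates; that label is an observation, not the commenter's description. The function and parameter names are hypothetical.

```python
import math

HALF_MINUTES = 120  # half-minute intervals in a 60-minute game


def predicted_points(home_ppg_scored: float,
                     visitor_ppg_allowed: float,
                     league_ppg_allowed: float) -> int:
    """Transcription of the spreadsheet formula above.

    Per-game averages become per-half-minute rates x, y and league rate L,
    combined as (x*y/L) / (x*y/L + (1-x)*(1-y)/(1-L)), then scaled back to points.
    """
    x = home_ppg_scored / HALF_MINUTES        # AI2
    y = visitor_ppg_allowed / HALF_MINUTES    # AL2
    lg = league_ppg_allowed / HALF_MINUTES    # 'DEFENSE-09'!$AB$35/120
    a = x * y / lg
    b = (1 - x) * (1 - y) / (1 - lg)
    points = HALF_MINUTES * a / (a + b)
    # Excel's ROUND rounds halves away from zero, while Python's round()
    # rounds halves to even; floor(x + 0.5) mimics Excel for non-negative x.
    return math.floor(points + 0.5)


if __name__ == "__main__":
    # Hypothetical inputs: a 28-ppg home offense vs. a defense allowing 24 ppg,
    # in a league where teams allow 21 ppg on average.
    print(predicted_points(28, 24, 21))  # -> 32
```

A useful sanity check is that an exactly average offense facing an exactly average defense is predicted to score the league average.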

    Of course, I'm interested in improving my model's predictions, and your site is very interesting. I'll keep reading everything. Thanks for sharing your knowledge.


    Greetings from Mexico.

  11. Anonymous says:

    I need your help.
    I have two questions about the Super Bowl that I could use your help answering.
    1. What team (NO or Indy) had the Superior Net Penalty YDS?
    2. What team had the best Net Punts (total)(NO or Indy) this year?
    thanks for your help.
    Phil pmendels@optonline.net
