NFC North 2007 Predictions

Here are the predictions for the 2007 season in the NFC North. Methodology is described here. Note the predictions do not account for personnel changes such as free-agent acquisitions, draft picks, retirements, or otherwise. It is still the best purely statistical prediction available.



NFC South 2007 Predictions

Here are the predictions for the 2007 season in the NFC South. Methodology is described here. Note the predictions do not account for personnel changes such as free-agent acquisitions, draft picks, retirements, or otherwise. It is still the best prediction available.





AFC East 2007 Predictions

Here are the predictions for the 2007 season in the AFC East. Methodology is described here. Note the predictions do not account for personnel changes such as free-agent acquisitions, draft picks, retirements, or otherwise. It is still the best prediction available.


AFC South 2007 Predictions

Here are the season predictions for the AFC South. The solid lines are the probabilities (on left of graph) that each team will finish with that particular number of wins. The curved line is the probability each team will finish with at least that many wins (on right). Methodology of estimating these probabilities is explained here.




AFC North 2007 Predictions

Based on the assumptions described previously, here are the predictions for the AFC North division. The graphs represent season wins for each team. The solid bars represent the probability that the team will achieve exactly that number of wins in 2007. The curving black line represents the probability that the team will earn at least that many wins.

Notice that for each team the distribution of win probability is almost perfectly normal. The "at least" win distribution is sigmoidal, i.e. an S-curve. It is a CDF--a cumulative distribution function.

Note that the model does not predict CLE will have 4 wins. It predicts that its win probability distribution is centered at 4 wins. The Browns could have up to 9 wins, or could go 0-16. Although BAL has the best chance to win the division, PIT and CIN have solid chances as well.

This does not take into account personnel changes--draft picks, free agent signings, or retirements (or suspensions in the Bengals' case). It merely extrapolates last season's performance stats onto this year's schedule, then calculates the total probabilities of every possible outcome.

2007 Season Win Predictions

I've assembled a new and larger database and tested the most predictive variables for wins. Now it's time to apply the data to the upcoming 2007 NFL season.

Using data from the 2002-2006 seasons, I've run a logit regression on the efficiency stats most predictive of winning for every game played. The result is a model that correctly predicts the winner of 68.9% of games during the 5 most recent seasons.
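For readers who want to see the mechanics, here is a minimal sketch of how a logit model turns stat differences into a win probability, and how a "percent correct" figure can be scored. The coefficient values, the intercept, and the feature names are all made up for illustration; they are not the fitted values from my regression.

```python
import math

# Hypothetical coefficients for illustration only; these are NOT the fitted
# values from the regression. Each feature is (home stat - visitor stat).
EXAMPLE_COEFS = {
    "pass_eff": 0.45,   # net yards per pass play
    "run_eff": 0.25,    # yards per run
    "int_rate": -20.0,  # interceptions per pass play
}
EXAMPLE_INTERCEPT = 0.3  # assumed home-field term


def home_win_probability(diffs, coefs=EXAMPLE_COEFS, intercept=EXAMPLE_INTERCEPT):
    """Logistic (logit) link: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = intercept + sum(coefs[name] * diffs[name] for name in coefs)
    return 1.0 / (1.0 + math.exp(-z))


def pick_accuracy(home_probs, home_won):
    """Share of games in which the side given more than 50% actually won."""
    correct = sum((p > 0.5) == won for p, won in zip(home_probs, home_won))
    return correct / len(home_probs)


# Two evenly matched teams with no home-field term come out at exactly 50%.
even = {"pass_eff": 0.0, "run_eff": 0.0, "int_rate": 0.0}
print(home_win_probability(even, intercept=0.0))
```

The visiting team's probability is simply one minus the home team's, which is why each game yields a pair of probabilities that sum to 1.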

For every game, the model produces a probability that each team would win. We can apply this model to future games as well, assuming we can estimate the values of each team's efficiency statistics. As any shrink will tell you, the best predictor of future behavior is past behavior. We'll use 2006 stats as our baseline for 2007. I realize this is highly imperfect, but it is the best predictor available. Although this method does not account for personnel changes, efficiency stats are relatively steady from year to year--much steadier than actual win totals. Previous year 'expected win' totals are significantly better predictors of the following year's record than are previous year actual wins. Plus, this is all just for fun.

I now have win probabilities for all 256 upcoming regular season games and I've sorted them by team. The probability of every possible sequence of game outcomes for a team (there are 2^16 of them--65,536) was computed. Then the probabilities of all sequences that result in a given number of total wins are summed. (There is 1 possible sequence for 16 wins, 16 sequences for 15 wins, 120 sequences for 14 wins,...) Now I have estimated probabilities for every win total for each team.
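The enumeration just described can be sketched in a few lines. This is only an illustration, and it assumes each game's outcome is independent of the others, so that a sequence's probability is the product of its per-game probabilities. The function name is mine.

```python
from itertools import product
from math import comb


def win_distribution_by_enumeration(win_probs):
    """Return dist where dist[k] is the probability of exactly k wins.

    Walks every win/loss sequence (2**n of them), multiplies the per-game
    probabilities along the sequence, and adds the result to the bucket
    for that sequence's win total. Assumes games are independent.
    """
    n = len(win_probs)
    dist = [0.0] * (n + 1)
    for outcomes in product((0, 1), repeat=n):  # 1 = win, 0 = loss
        p = 1.0
        for won, prob in zip(outcomes, win_probs):
            p *= prob if won else 1.0 - prob
        dist[sum(outcomes)] += p
    return dist


# The number of sequences with exactly k wins in 16 games is the binomial
# coefficient comb(16, k); over all k they add up to 2**16 sequences.
total_sequences = sum(comb(16, k) for k in range(17))
print(total_sequences)

# A tiny two-game example with coin-flip games.
print(win_distribution_by_enumeration([0.5, 0.5]))
```

For a 16-game schedule the loop runs 65,536 times per team, which is trivial for a computer.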

For example, the Ravens have a breakdown of win probabilities that centers on 11 wins. Although they had 13 wins last year, they appeared to squeak out a couple of wins on luck, and they had a relatively easy schedule. This year, they'll play a much tougher schedule. The table below lists the probabilities for each game on their '07 schedule based on last year's efficiency stats.

Vis   Home   VPROB   HPROB
---------------------------
BAL   CIN    0.54    0.46
NYJ   BAL    0.26    0.74
ARI   BAL    0.17    0.83
BAL   CLE    0.82    0.18
BAL   SF     0.69    0.31
STL   BAL    0.32    0.68
BAL   BUF    0.67    0.33
BAL   PIT    0.47    0.53
CIN   BAL    0.28    0.72
CLE   BAL    0.09    0.91
BAL   SD     0.32    0.68
NE    BAL    0.34    0.66
IND   BAL    0.49    0.51
BAL   MIA    0.57    0.43
BAL   SS     0.69    0.31
PIT   BAL    0.34    0.66

The next table lists the probability of winning each possible number of games. The 'cumulative' column lists the probability of the Ravens winning at least that number of games.

WINS   PROB   CUMULATIVE
---------------------------
16     0.00   0.00
15     0.01   0.01
14     0.03   0.04
13     0.09   0.14
12     0.17   0.30
11     0.21   0.52
10     0.20   0.72
9      0.15   0.87
8      0.08   0.95
7      0.03   0.99
6      0.01   1.00
5      0.00   1.00
4      0.00   1.00
3      0.00   1.00
2      0.00   1.00
1      0.00   1.00
0      0.00   1.00

The stats say 11 wins is the most probable of all win totals for Baltimore (21% probability). Keep in mind, however, that they're far more likely to finish with some other total (79%). The cumulative probabilities indicate they have roughly a 50/50 chance to finish with at least 11 wins (52%).
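Anyone who wants to check a table like this can rebuild it from the per-game probabilities. The sketch below takes Baltimore's 16 win probabilities from the schedule table above and, instead of listing every sequence, adds one game at a time to a running distribution. That is a different technique from the full enumeration, but it should give the same answer as long as games are treated as independent. Because the inputs here are rounded to two decimals, the output should land close to the table but need not match it digit for digit.

```python
# Ravens' 2007 per-game win probabilities, copied from the schedule table
# (VPROB when BAL is the visitor, HPROB when BAL is at home).
RAVENS_WIN_PROBS = [0.54, 0.74, 0.83, 0.82, 0.69, 0.68, 0.67, 0.47,
                    0.72, 0.91, 0.32, 0.66, 0.51, 0.57, 0.69, 0.66]


def win_distribution(win_probs):
    """dist[k] = probability of exactly k wins, built one game at a time."""
    dist = [1.0]  # before any games: 0 wins with certainty
    for p in win_probs:
        nxt = [0.0] * (len(dist) + 1)
        for wins, prob in enumerate(dist):
            nxt[wins] += prob * (1.0 - p)   # lose this game
            nxt[wins + 1] += prob * p       # win this game
        dist = nxt
    return dist


def at_least(dist):
    """cum[k] = probability of k or more wins (the 'cumulative' column)."""
    cum = [0.0] * len(dist)
    running = 0.0
    for wins in range(len(dist) - 1, -1, -1):
        running += dist[wins]
        cum[wins] = running
    return cum


dist = win_distribution(RAVENS_WIN_PROBS)
cum = at_least(dist)
print("WINS   PROB   CUMULATIVE")
for wins in range(16, -1, -1):
    print(f"{wins:<6} {dist[wins]:.2f}   {cum[wins]:.2f}")
```

One handy sanity check: the average of the win distribution always equals the sum of the 16 game probabilities, no matter how the wins are distributed.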

Year-to-Year Team Wins

My recent analysis of draft picks and their relative value was complicated by the problem of every team’s tendency to regress to the mean. Good teams appear to get worse, and bad ones appear to improve. Because draft picks are allotted according to team record, it is very difficult to tell whether the change in team record was due to the draft picks.

One factor dominates the equation when trying to predict the following year’s record with draft values—the previous year’s record. Teams with high numbers of wins tend to have fewer wins the following year, and teams with low win totals tend to have more wins in the next season. This tendency is strong.

In fact, if you look at a graph of every team’s change in wins from year to year (DELTAWINS) vs. the team’s win total from last year (LASTWINS), you see a very strong trend towards mediocrity, i.e. regression to the mean. The graph includes all teams from 2002-2006.


In fact, between ‘02-’06 no team that had 13 or more wins improved their record the next season, and no team with 3 or fewer wins got worse. Only 1 team with 4 wins earned fewer wins, and only 2 teams with 5 wins got worse.
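The trend in a DELTAWINS vs. LASTWINS plot can be summarized with an ordinary least-squares line. Here is a small sketch of that calculation. The data points in it are invented purely to show the arithmetic; they are not the 2002-2006 team records, and the function name is mine.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented (LASTWINS, DELTAWINS) pairs for illustration only.
toy = [(13, -3), (12, -2), (10, -1), (8, 0), (6, 1), (4, 3), (3, 2)]
last_wins = [lw for lw, _ in toy]
delta_wins = [dw for _, dw in toy]

slope, intercept = least_squares_line(last_wins, delta_wins)
break_even = -intercept / slope  # LASTWINS at which the expected change is zero
print(slope, intercept, break_even)
```

A negative slope is the signature of regression to the mean: the more games a team won last year, the more wins it is expected to give back. The break-even point is the win total at which a team is expected to neither improve nor decline.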

The reasons for teams' tendency to regress to the mean are likely the obvious ones. First, there is the scheduling system. Each team is given 2 strength of schedule games in which they play the teams that finished in the same order in their respective divisions. First-place teams play the other first place teams, second place teams play each other, and so on. Additionally, there is the draft which allocates draft position in reverse order of win-loss records. Lastly, there may be effects of salary cap boom/bust cycles in which individual teams load up on talented and costly players by amortizing signing bonuses into out-years. This causes teams to 'purge' their rosters of those players, and others, to allow cap room for the dead weight of past signing bonuses.

NFL Stats Overview

This post is intended to summarize the statistical methods commonly used throughout this site. The statistical model uses each team's efficiency statistics in running and passing, on both offense and defense, to predict team wins. Turnover rates and penalties are also included. Multivariate linear regression was used to estimate total season wins, and multivariate logit (non-linear logistic) regression was used to estimate individual game probabilities.

Efficiency stats are defined by yards per play. Passing efficiency is defined as the average yards gained (or lost) per passing play. I included yards lost in sacks in both offensive and defensive pass efficiency. Likewise, run efficiency is defined as yards per run. Turnovers are also defined as efficiencies. Interception efficiency is defined as interceptions per pass play. Fumbles are defined as fumbles per any play--both run and pass plays can result in fumbles.
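To make these definitions concrete, here is a rough sketch of the calculations from a team's season totals. The field names are hypothetical, and counting a sack as a pass play in every pass-play denominator (including the interception rate) is my assumption for this sketch.

```python
from dataclasses import dataclass


@dataclass
class OffenseTotals:
    # Hypothetical field names for a team's season totals.
    pass_attempts: int
    pass_yards: int      # gross passing yards, before sack losses
    sacks: int
    sack_yards: int      # yards lost on sacks, as a positive number
    rushes: int
    rush_yards: int
    interceptions: int
    fumbles: int


def efficiency_stats(t):
    """Per-play efficiency stats as defined in the text above."""
    pass_plays = t.pass_attempts + t.sacks  # assumption: sacks count as pass plays
    all_plays = pass_plays + t.rushes       # fumbles can happen on any play
    return {
        "pass_eff": (t.pass_yards - t.sack_yards) / pass_plays,
        "run_eff": t.rush_yards / t.rushes,
        "int_rate": t.interceptions / pass_plays,
        "fumble_rate": t.fumbles / all_plays,
    }


# Made-up totals, chosen only so the arithmetic is easy to follow.
example = OffenseTotals(pass_attempts=480, pass_yards=3400, sacks=20,
                        sack_yards=150, rushes=450, rush_yards=1800,
                        interceptions=15, fumbles=19)
print(efficiency_stats(example))
```

The defensive versions are the same calculations applied to what opponents did against the team.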

Efficiency stats were found to be correlated with wins much better than total yardage stats. Teams with poor defenses that find themselves trailing (and likely to lose) tend to accumulate large amounts of gross passing yards, but only because of frequent attempts. Teams that are ahead (and likely to win) tend to accumulate large amounts of gross running yards. In both cases it's often the winning or losing that leads to the yardage, not the other way around. For this reason, efficiency stats are almost always a better measure of a team's proficiency than gross yardage stats.

Once I established a workable, logical, predictive, and statistically significant model of winning and losing, there were any number of interesting applications. I could now predict individual games. Summing the total probabilities of all 256 games in the season estimates the probabilities of a total number of wins for each team. Toward the end of the season, the game-by-game model can also be used to compare the likelihoods of selected teams earning a playoff spot or capturing a wildcard berth. I could also create very accurate power (or efficiency) rankings. Another application was that I could estimate how lucky each team was, and how much luck played a part in determining game and season outcomes.

I'm not just trying to predict games. I'm trying to understand the game itself. For example, which is more important--running or passing? Offense or defense? Interceptions or fumbles? Do special teams matter? But answering those questions accurately depends on how accurate the model is, which can only be judged by how well it predicts wins. We each may have intuitive answers to these questions, but statistics is one of the best ways to test our hunches. We can even quantify them.

2006 was the first season in which I did all of this number crunching. I needed some minimum amount of data to use for the efficiency stats, so I started predictions after week 4. As the season went on, I had a larger data set. Ultimately, here is how my predictions fared:

At first glance, 63% correct may not appear too impressive, especially for all that work I did. But almost 2 out of 3 games correct isn't too bad, especially when compared to the national consensus. Each week, I recorded not only my own results, but those of the Las Vegas odds makers as well. (Although my purposes here are not related to gambling, I'll use the betting lines as a benchmark.) The Vegas line was only accurate in picking winners 58.2% of the time for the entire '06 season. Vegas' record was slightly worse from week 5 onward, at 57.1%.


By the end of the 2006 regular season, my model was correct significantly more often than the Vegas line, by 63% to 57%. On the surface, it may not sound as impressive as it really is. But think of game prediction this way: A monkey will guess winners correctly 50% of the time. So the real question is: how much better than 50% is the model? In this case, the efficiency model added almost twice as much predictive power as the Vegas line (the consensus favorite).

The 2007 season was more predictable. The model was 70.8% correct compared to just 66.7% for the Vegas consensus. Over the past two years it has been the most accurate prediction system published.

Also consider that no model could be 100% correct. Upsets happen for many reasons. Some games are very evenly matched, so the favorite has very little advantage to begin with. Additionally, luck plays a large role in determining outcomes. It's hard to say exactly how well the theoretically best possible model could do, but from my experience it seems it would be something just under an 80% correct rate.

Ultimately, we're not talking about 71% vs. 67%. We're asking how far from 50% and how close to 80% can a prediction model get.
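Here is that arithmetic as a quick sketch, using the percentages quoted in this post. The 50% floor is the coin-flip baseline, and the 80% ceiling is only my rough estimate from above, so treat the "share of ceiling" figures as ballpark numbers. The function names are mine.

```python
def edge_over_chance(accuracy, chance=0.50):
    """Percentage points gained over a coin flip."""
    return accuracy - chance


def share_of_ceiling(accuracy, chance=0.50, ceiling=0.80):
    """Fraction of the gap between chance and the assumed ceiling that is captured."""
    return (accuracy - chance) / (ceiling - chance)


# 2006, week 5 onward: model 63% vs. Vegas 57.1%
ratio_2006 = edge_over_chance(0.63) / edge_over_chance(0.571)

# 2007: model 70.8% vs. Vegas 66.7%
model_2007 = share_of_ceiling(0.708)
vegas_2007 = share_of_ceiling(0.667)

print(f"2006 edge ratio, model vs. Vegas: {ratio_2006:.2f}")
print(f"2007 share of ceiling, model: {model_2007:.2f}, Vegas: {vegas_2007:.2f}")
```

Framed this way, a few percentage points of raw accuracy translate into a much larger difference in how much of the predictable part of the game a system captures.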