It's time to launch the first rankings of the season. The rankings and prediction model was redone prior to last season, but as always, it's a team-efficiency logistic regression model based on passing, running, turnover, and penalty efficiency. Since last season, running has been represented by Success Rate (SR) rather than Yards Per Carry (YPC). SR correlates far better than YPC with winning games. YPC is too susceptible to a handful of relatively rare breakaway runs, and it wrongly penalizes successful plays in short-yardage situations. I believe the revised model better reflects the true inner workings of the sport.
There are always new readers each year, so here is a quick and dirty refresher on how the model works. (Most of this write-up is taken from last season's first rankings post.) A logistic regression is fed net YPA, run SR, and interception rates on both offense and defense, plus offensive fumble rate. Team penalty rates (penalty yds per play) and home field advantage are also included. These particular aspects are selected because they are predictive of future outcomes, not because they explain past wins. This is a distinction overlooked by most experts and even other stats-oriented sites.
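For readers who want to see what "feeding" a logistic regression looks like, here is a minimal sketch of the fitting step. It is not the model's actual code or data: the two features are hypothetical standardized home-minus-away differentials (passing efficiency and interception rate), the games are synthetic, and `fit_logistic`, `TRUE_W`, and the learning-rate settings are illustrative assumptions. The point is only that the fitted weights come out close to the weights that generated the games, which is how a regression tells us how much each facet matters.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Logistic function: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, lr=1.0, iterations=300):
    """Fit logistic-regression weights (bias first) by batch gradient ascent.

    xs: list of feature vectors; ys: list of 0/1 outcomes (1 = home win).
    """
    n_features = len(xs[0])
    w = [0.0] * (n_features + 1)
    n = len(xs)
    for _ in range(iterations):
        grad = [0.0] * (n_features + 1)
        for x, y in zip(xs, ys):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
            err = y - sigmoid(z)  # observed outcome minus predicted probability
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        # Step uphill on the average log-likelihood.
        w = [wj + lr * g / n for wj, g in zip(w, grad)]
    return w


# Synthetic games: each row is a hypothetical standardized home-minus-away
# differential in (passing efficiency, interception rate).
random.seed(0)
TRUE_W = [0.2, 0.8, -0.5]  # made-up bias (home edge) and feature weights
xs, ys = [], []
for _ in range(1000):
    x = [random.gauss(0, 1), random.gauss(0, 1)]
    p_home = sigmoid(TRUE_W[0] + TRUE_W[1] * x[0] + TRUE_W[2] * x[1])
    xs.append(x)
    ys.append(1 if random.random() < p_home else 0)

weights = fit_logistic(xs, ys)
print([round(v, 2) for v in weights])  # should land near TRUE_W
```

The recovered passing weight is positive and the interception weight negative, mirroring the intuition that efficient passing helps and interceptions hurt.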
The regression produces the coefficients used in the model. In other words, it tells us how each facet of team performance is best weighted to predict which team will win a game. Each team variable is then regressed toward the league average to account for how reliable each particular facet is over the course of a season, because the facets vary in how consistent they are from game to game. For example, offensive passing efficiency is the most consistent, and turnover rates are the least consistent.
Turnover rates explain past outcomes very well, but only a relatively small part of a team's turnover rate carries forward. If a team has a very low interception rate of 1.0%, how likely are they to continue the season with few interceptions? Chances are they will remain better than average, but not nearly as low as 1%.
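The interception example can be written as simple shrinkage toward the league mean. This is a sketch under an assumption: the post does not publish its reliability factors, so the 0.3 below is made up for illustration. The league-average offensive interception rate of 2.6% comes from the "Avg" row of the efficiency table.

```python
def regress_to_mean(observed: float, league_mean: float, reliability: float) -> float:
    """Shrink an observed team rate toward the league mean.

    reliability is the (assumed) share of the observed deviation that
    persists: 1.0 keeps the raw number, 0.0 replaces it with the average.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must be between 0 and 1")
    return league_mean + reliability * (observed - league_mean)


# League-average offensive INT% from the efficiency table is 2.6.
# The 0.3 reliability is a hypothetical value for illustration only.
estimate = regress_to_mean(1.0, 2.6, 0.3)
print(round(estimate, 2))  # 2.12
```

A team at 1.0% is still projected to be better than average (2.12% vs. 2.6%), but nowhere near 1.0%. A more consistent stat such as offensive passing efficiency would get a larger reliability and keep more of its deviation.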
Next, I create a notional team with all league-average statistics. With the regressed values of team efficiency, I use the model to generate the probability each team would win a game against the average team at a neutral site. I call the result Generic Win Probability (GWP). In theory, this should be a team’s long-term ‘true’ winning percentage.
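Here is a minimal sketch of the GWP idea: score a team against a notional all-average opponent with no home-field term. The coefficients are hypothetical placeholders, not the model's fitted values, and only four of the model's inputs are included to keep it short (run SR and fumbles are omitted). League averages come from the "Avg" row of the efficiency table; the inputs are assumed to already be regressed toward the mean.

```python
import math

# Hypothetical coefficients on (team value - league average). They only
# illustrate the structure; they are not the model's fitted values.
COEFS = {"o_pass": 0.45, "o_int": -0.20, "d_pass": -0.40, "pen_rate": -1.5}

# League averages from the "Avg" row of the efficiency table.
LEAGUE_AVG = {"o_pass": 6.4, "o_int": 2.6, "d_pass": 6.4, "pen_rate": 0.46}


def generic_win_probability(team: dict) -> float:
    """P(team beats a league-average team at a neutral site).

    No home-field term is added because the site is neutral.
    """
    z = sum(c * (team[k] - LEAGUE_AVG[k]) for k, c in COEFS.items())
    return 1.0 / (1.0 + math.exp(-z))


# An all-average team is a coin flip against the average team.
print(generic_win_probability(dict(LEAGUE_AVG)))  # 0.5

# HOU's raw numbers from the efficiency table (not regressed here).
hou = {"o_pass": 7.4, "o_int": 1.0, "d_pass": 4.8, "pen_rate": 0.43}
print(generic_win_probability(hou))
```

Because the coefficients are invented, the HOU number is only directional: every HOU input is better than average, so its probability lands above 0.5.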
But it’s not complete. Lastly, I take each team’s average opponent GWP and use it to adjust the numbers so that the final GWP accounts for previous strength of schedule.
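The post does not spell out the exact schedule adjustment, so the following is only one simple way it could be done, not necessarily the author's method: shift the team's log-odds by the log-odds of its average opponent GWP. The `weight` parameter is a hypothetical knob for partial adjustment.

```python
import math


def logit(p: float) -> float:
    return math.log(p / (1.0 - p))


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def sos_adjusted_gwp(raw_gwp: float, avg_opp_gwp: float, weight: float = 1.0) -> float:
    """Hypothetical schedule adjustment in log-odds space.

    A team that posted its numbers against above-average opponents
    (avg_opp_gwp > 0.5) gets credit; a soft schedule costs it.
    weight=1.0 applies the full opponent edge; smaller values adjust partially.
    """
    for p in (raw_gwp, avg_opp_gwp):
        if not 0.0 < p < 1.0:
            raise ValueError("probabilities must be strictly between 0 and 1")
    return sigmoid(logit(raw_gwp) + weight * logit(avg_opp_gwp))


print(sos_adjusted_gwp(0.60, 0.50))  # average schedule: unchanged
print(sos_adjusted_gwp(0.60, 0.59))  # tough schedule: adjusted upward
print(sos_adjusted_gwp(0.60, 0.36))  # soft schedule: adjusted downward
```

Working in log-odds keeps the adjusted value between 0 and 1 and treats an opponent edge the same way the logistic model treats any other edge.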
Generic offensive team efficiency (OGWP) can be estimated by setting each team’s defensive variables to the league average, and re-computing their probability of beating a completely average team. Generic defensive efficiency (DGWP) can be estimated in a similar way.
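The OGWP/DGWP recipe can be sketched by keeping one unit's stats and resetting everything else to the league average before recomputing. The coefficients are again hypothetical, and defensive run SR is left out to keep the example short. Penalty rate is reset to average in both cases, which mirrors why penalties count toward neither OGWP nor DGWP.

```python
import math

# Hypothetical coefficients on (team value - league average); illustration only.
COEFS = {"o_pass": 0.45, "o_run_sr": 0.03, "o_int": -0.20, "o_fum": -0.15,
         "d_pass": -0.40, "d_int": 0.15, "pen_rate": -1.5}
OFFENSE = {"o_pass", "o_run_sr", "o_int", "o_fum"}
DEFENSE = {"d_pass", "d_int"}
# League averages from the "Avg" row of the efficiency table.
LEAGUE_AVG = {"o_pass": 6.4, "o_run_sr": 40.0, "o_int": 2.6, "o_fum": 1.2,
              "d_pass": 6.4, "d_int": 2.6, "pen_rate": 0.46}


def gwp(stats: dict) -> float:
    """Probability of beating an all-average team at a neutral site."""
    z = sum(c * (stats[k] - LEAGUE_AVG[k]) for k, c in COEFS.items())
    return 1.0 / (1.0 + math.exp(-z))


def unit_gwp(team: dict, unit: set) -> float:
    """Keep one unit's stats; set everything else, including penalties, to average."""
    blended = {k: (team[k] if k in unit else LEAGUE_AVG[k]) for k in COEFS}
    return gwp(blended)


def ogwp(team: dict) -> float:
    return unit_gwp(team, OFFENSE)


def dgwp(team: dict) -> float:
    return unit_gwp(team, DEFENSE)


# DAL's raw numbers from the efficiency table (subset of columns).
dal = {"o_pass": 6.9, "o_run_sr": 38.0, "o_int": 2.8, "o_fum": 2.1,
       "d_pass": 4.7, "d_int": 1.3, "pen_rate": 0.68}
print(gwp(dal), ogwp(dal), dgwp(dal))
```

Since the penalty term appears only in `gwp`, a team with a far-from-average penalty rate can have an overall value that does not line up with its offense and defense values.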
An explanation of the principles behind the model and an example of how it is calculated can be found here.
Each season, I end up answering the same challenges to the results of the model. So I’ll preemptively address the most common ones.
1. “Your dumb model fails to conform to my intuitive beliefs about how good each team is. And besides, it does not conform to what I’ve been told to think by [major media personality].”
Answer: What can I say? Who is right? All the talking heads that told you for weeks the Patriots and 49ers were the best teams in the NFL, or the numbers here that told you the 9-7 Giants were actually the better team? Your intuitive estimates of team strength are far less accurate than you imagine. The thing is, you’ll forget how wrong you were by the end of the season, and re-wire your memory to trick yourself into believing you ‘knew it all along.’ We all do. Want to have a laugh? Go back and look at the expert predictions in the early weeks of 2010, or 2009, or whichever year you can find. This model will have some laughers too, just not nearly as many. Good statistical models have no preconceived biases and are not wowed by spectacular but lucky game-winning plays. They don’t follow the crowd. They don’t believe in streaks, destiny, grudges, or momentum. They don’t chase recent wins.
2. “Your model doesn’t take into account determination, good coaching, effort, and character.” Answer: Yes it does. To the degree those things show up on the field on Sundays, those things are captured.
3. “How can team X have a higher ranking than team Y if team X’s offense and defense are both ranked behind team Y’s?”
Answer: I can understand this question. It’s unusual, but in some cases the OGWP and DGWP don’t make logical sense when you mentally combine them and compare them to a team’s overall GWP. This is mostly because of penalty rates. The NFL tracks team penalties but does not divide them into offensive and defensive categories, so they count neither toward OGWP nor DGWP. They are, however, included in overall GWP. If you see a team with an unusually high or low GWP compared to their O or D rankings, check out their penalty rate. It’s probably well above or below average. Also, the final results depend on how teams are bunched together. Sometimes a #3 DGWP team is a mile ahead of the #4 team, and sometimes it's just a hair better than the #4 through #9 teams.
4. “How can you possibly have [perennial doormat] Team X ranked ahead of [current media darling] Team Y?”
Answer: Look at the efficiency stats in the second table below. That’s just about all you need to know. Yes, I understand no one else outside of the state of Texas has the Cowboys ranked as the #2 team in the league this week. But are they aware that (despite their injuries) DAL has an 8.3 net YPA? Are they aware that they allowed only a 5.8 net YPA on defense? Both are near the very top of the league. True, there are a couple of other teams with numbers as good as these, but how tough have their schedules been?
Example: Why is NO #1 and GB #9 when GB beat NO in week 1?
Answer: Check their numbers. Check their opponent strength. (GB has a relatively poor defensive pass efficiency and has played a somewhat lighter schedule.)
5. “Your model said Team X had a 90% chance of beating Team Y, but they lost! Ha!”
Answer: Yes, that happens…about 10% of the time. And in fact, I’m glad it does. If it didn’t, the model would be under-confident.
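One way to check that a model is neither over- nor under-confident is a calibration table: group games by predicted probability and compare each group's average prediction with how often the favorite actually won. This is a generic sketch, not the site's evaluation code, and the predictions and outcomes below are made up.

```python
from collections import defaultdict


def calibration_table(predictions, outcomes, n_bins=10):
    """Return {bin_index: (mean_predicted, observed_win_rate, count)}."""
    bins = defaultdict(list)
    for p, won in zip(predictions, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[idx].append((p, won))
    table = {}
    for idx, rows in sorted(bins.items()):
        mean_p = sum(p for p, _ in rows) / len(rows)
        rate = sum(won for _, won in rows) / len(rows)
        table[idx] = (mean_p, rate, len(rows))
    return table


# Made-up example: ten games at 92% and five games at 65%.
preds = [0.92] * 10 + [0.65] * 5
wins = [1] * 9 + [0] + [1, 1, 1, 0, 0]
for idx, (mean_p, rate, n) in calibration_table(preds, wins).items():
    print(f"bin {idx}: predicted {mean_p:.2f}, observed {rate:.2f}, games {n}")
```

In a well-calibrated model the predicted and observed columns track each other over many games; a 90% bin that never loses would mean the model is under-confident.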
6. “Your model doesn’t account for the fact that [undrafted rookie 4th string quarterback] is starting in place of [superstar who just got injured].”
Answer: That’s true. Use the model as a starting point, and adjust on your own. Or, even better, we can insert a reasonable guess as to the rookie’s expected net YPA and interception rate, and recompute. What’s amazing to me is that the model is completely unaware of injuries, and yet still manages to slightly outperform the market most years.
7. "It's easy to pick straight-up winners. You don't pick against the spread." Answer: Oh, it's easy? Really? You might want to double-check that. See #1 above regarding "knew it all along."
8. "Website Z has Team X ranked 10th but you have them ranked 3rd! And their rankings conform to my intuitive expectations!"
Answer: The stats at Website Z are crap. Stop reading it. Go wash your eyes out with rubbing alcohol, and come back here after your sight returns and reread what I wrote about predictive vs. explanatory stats.
Ok, I could go on, but that’s it for now. Here are the first rankings of the 2012 season. See the second table below for raw team efficiency stats.
           
| RANK | TEAM | LAST WK | GWP | OPP GWP | O RANK | D RANK |
|---|---|---|---|---|---|---|
| 1 | HOU | 7 | 0.71 | 0.52 | 3 | 7 |
| 2 | PHI | 1 | 0.68 | 0.57 | 10 | 1 |
| 3 | DEN | 3 | 0.66 | 0.59 | 6 | 3 |
| 4 | ATL | 6 | 0.66 | 0.53 | 13 | 11 |
| 5 | BAL | 9 | 0.65 | 0.55 | 1 | 17 |
| 6 | CAR | 20 | 0.62 | 0.43 | 4 | 21 |
| 7 | SF | 2 | 0.61 | 0.51 | 9 | 6 |
| 8 | ARI | 12 | 0.60 | 0.57 | 14 | 2 |
| 9 | NYG | 13 | 0.59 | 0.53 | 2 | 24 |
| 10 | GB | 4 | 0.55 | 0.52 | 8 | 10 |
| 11 | DAL | 5 | 0.55 | 0.50 | 16 | 8 |
| 12 | MIN | 21 | 0.54 | 0.47 | 15 | 15 |
| 13 | BUF | 14 | 0.53 | 0.45 | 12 | 18 |
| 14 | NE | 10 | 0.53 | 0.55 | 11 | 20 |
| 15 | SD | 15 | 0.52 | 0.49 | 27 | 13 |
| 16 | SEA | 16 | 0.49 | 0.57 | 19 | 9 |
| 17 | MIA | 18 | 0.47 | 0.53 | 24 | 16 |
| 18 | CHI | 23 | 0.47 | 0.43 | 28 | 5 |
| 19 | CLE | 28 | 0.47 | 0.55 | 29 | 4 |
| 20 | NYJ | 19 | 0.46 | 0.47 | 21 | 19 |
| 21 | CIN | 31 | 0.45 | 0.46 | 5 | 27 |
| 22 | DET | 8 | 0.44 | 0.45 | 7 | 31 |
| 23 | TB | 22 | 0.43 | 0.58 | 30 | 12 |
| 24 | JAC | 24 | 0.42 | 0.54 | 23 | 23 |
| 25 | PIT | 26 | 0.42 | 0.51 | 20 | 22 |
| 26 | OAK | 17 | 0.42 | 0.47 | 22 | 26 |
| 27 | KC | 30 | 0.41 | 0.49 | 25 | 28 |
| 28 | TEN | 29 | 0.38 | 0.49 | 26 | 29 |
| 29 | IND | 25 | 0.37 | 0.48 | 17 | 30 |
| 30 | STL | 11 | 0.36 | 0.39 | 31 | 14 |
| 31 | NO | 32 | 0.27 | 0.43 | 32 | 25 |
| 32 | WAS | 27 | 0.27 | 0.36 | 18 | 32 |
           
| TEAM | OPASS | ORUNSR% | OINT% | OFUM% | DPASS | DRUNSR% | DINT% | PENRATE |
|---|---|---|---|---|---|---|---|---|
| ARI | 5.9 | 33 | 1.1 | 2.1 | 4.8 | 61 | 1.7 | 0.48 |
| ATL | 6.9 | 30 | 0.9 | 0.0 | 5.4 | 50 | 6.5 | 0.26 |
| BAL | 7.7 | 51 | 1.8 | 0.7 | 7.4 | 60 | 2.7 | 0.56 |
| BUF | 6.7 | 47 | 3.5 | 1.4 | 6.2 | 63 | 3.6 | 0.42 |
| CAR | 8.7 | 42 | 5.8 | 2.2 | 6.4 | 57 | 1.8 | 0.32 |
| CHI | 5.4 | 36 | 6.5 | 0.0 | 4.8 | 57 | 5.3 | 0.41 |
| CIN | 8.1 | 38 | 3.1 | 1.3 | 6.9 | 41 | 0.0 | 0.46 |
| CLE | 5.1 | 38 | 5.2 | 0.8 | 6.2 | 57 | 4.1 | 0.49 |
| DAL | 6.9 | 38 | 2.8 | 2.1 | 4.7 | 60 | 1.3 | 0.68 |
| DEN | 6.3 | 45 | 2.6 | 2.0 | 6.2 | 70 | 1.9 | 0.55 |
| DET | 7.2 | 43 | 3.0 | 0.6 | 7.2 | 57 | 0.0 | 0.47 |
| GB | 5.1 | 50 | 1.7 | 0.6 | 4.4 | 47 | 5.4 | 0.71 |
| HOU | 7.4 | 42 | 1.0 | 0.5 | 4.8 | 56 | 2.8 | 0.43 |
| IND | 6.3 | 35 | 3.3 | 1.4 | 7.1 | 57 | 1.1 | 0.45 |
| JAC | 5.1 | 46 | 0.0 | 1.6 | 7.0 | 55 | 0.9 | 0.43 |
| KC | 5.8 | 45 | 3.4 | 2.7 | 7.4 | 57 | 1.2 | 0.24 |
| MIA | 5.5 | 47 | 3.9 | 1.2 | 7.3 | 70 | 2.4 | 0.39 |
| MIN | 6.8 | 38 | 0.0 | 2.4 | 5.6 | 58 | 1.0 | 0.41 |
| NE | 6.8 | 39 | 0.8 | 0.0 | 7.0 | 61 | 1.8 | 0.40 |
| NO | 5.9 | 38 | 3.6 | 0.7 | 8.2 | 60 | 1.1 | 0.41 |
| NYG | 7.8 | 37 | 2.5 | 0.6 | 8.2 | 54 | 6.7 | 0.34 |
| NYJ | 6.8 | 33 | 3.0 | 0.7 | 6.4 | 50 | 4.0 | 0.52 |
| OAK | 6.4 | 31 | 1.6 | 0.7 | 6.9 | 65 | 0.0 | 0.26 |
| PHI | 6.3 | 47 | 4.8 | 3.5 | 4.8 | 68 | 5.0 | 0.57 |
| PIT | 6.6 | 33 | 0.8 | 1.2 | 6.2 | 50 | 1.1 | 0.66 |
| SD | 6.0 | 36 | 2.9 | 1.3 | 5.9 | 63 | 1.7 | 0.38 |
| SF | 5.7 | 50 | 1.1 | 2.0 | 6.0 | 69 | 1.8 | 0.50 |
| SEA | 4.7 | 41 | 1.3 | 0.7 | 5.1 | 59 | 1.7 | 0.66 |
| STL | 5.4 | 39 | 3.2 | 1.4 | 6.4 | 54 | 4.6 | 0.45 |
| TB | 5.2 | 39 | 3.8 | 0.8 | 8.2 | 71 | 4.9 | 0.38 |
| TEN | 7.0 | 24 | 1.7 | 2.5 | 7.6 | 53 | 0.8 | 0.31 |
| WAS | 6.8 | 48 | 1.1 | 0.6 | 8.4 | 54 | 3.5 | 0.73 |
| Avg | 6.4 | 40 | 2.6 | 1.2 | 6.4 | 58 | 2.6 | 0.46 |
 
CLE is 0-3...
cool stuff. thanks
Interested to know what led to CAR moving up 14 spots after losing 36-14 to NYG. Is there a concept of garbage time?
I'm sure this ranking will go over well with your Washington post crowd.
Speaking of the Redskins, they do seem rather underranked here. The model has them ranked 18th in offense, but they are 12th in OPass, 4th in ORunSR%, 8th in OInt%, and 5th in OFum%.
I assume opponent adjustments are dragging them down here?
Do the win probability graphs take the efficiency rankings into account? In other words, are you weighting WP with these rankings or is it as if two average teams were playing each other for each of those graphs?
I've been following these efficiency ratings for the last two seasons now, this being my third, and usually my expectations or intuition puts teams pretty close to the efficiency rankings, but I have to say that this ranking is WAY out of my comfort zone. The only teams that are pretty close in ranking to my intuitive rank are the 5 teams that I regularly follow: GB, ATL, CHI, MIA, and SF. I'll admit MIA is a bit higher than I'd expect, but I haven't had a chance to sit down and watch a full game of theirs yet, whereas I've seen all of the other 4 teams' games.
Probably the most shocking results here are MIN over NE, and the skyrocketing that CAR apparently did last week given the drubbing they got from NY.
This site and these rankings are always one of my favorite parts of the football season, and I can't wait for this week's games now that the rankings are out!
Quick question from a newbie: just to confirm, the model does not take into account any aspect of special teams performance?
I'm sure there's a perfectly intelligent reason for its omission .. just curious as to why.
Thanks.
@Jay Berg
The last week ranking is I assume last year's ranking. So Carolina didn't move up from #20 to #6 after the Giants game.
@Anonymous
Past special teams performance has not been found to be predictive of future special teams performance.
Here's a list of teams I think would be perceived to be off by the national consensus:
Too high - Philly, Carolina, Minnesota, Miami, Cleveland
Too low - Washington, New Orleans, Pittsburgh, Detroit, New England
That leaves 22 out of 32 (69%) roughly where you'd expect. Going by the Bill James theory that a good list would be about 80% of what you'd expect with 20% surprises, this isn't too far off.
How are the niners ranked 5th last against the run?
This is nuts! I look forward to Brian's week 4 rankings every year - there are always a few surprises but I usually trust that his model knows more than I do.
This time it is completely different from my expectations (and again I am going to mostly trust it). My favourite is that Philadelphia managed to be ranked first last week despite a point differential of +2. They then get creamed by "perennial doormat" Arizona and only drop one place! (It's because of the inordinately high number of fumble turnovers, I think.)
I do have a question though: how do the replacement refs affect these rankings? I would guess that the model would be more predictive this year if all efficiencies were regressed to the mean more than usual. In a normal year, HOU would be expected to beat WAS almost every time, but who knows what the refs might do at a crucial moment?
Well, it looks like the Rams will be in good shape...next April.
If these rankings hold, they'll have the first and third overall picks in the draft.
How about Challenge #9?
"Your model fails to take into account special teams, which are an important part of the game."
@Scott M
Past special teams performance has not been found to be predictive of future special teams performance.
=============================
Links?
It's nuts to me too. Keep in mind it's only 3 games of data.
funny, I like your model more when it ranks my team higher than I expect.
In all seriousness, I have learned that stat systems are better than intuitive ratings, whether they be my own or "expert media" rankings. But with so much random luck in the game, no system will predict things often enough to convince most people.
@Anonymous
Check out this link where Brian explains all the data he looked at and how he decided what to put into the model.
http://www.advancednflstats.com/2007/07/what-makes-teams-win-part-1.html
"no one else outside of the state of Texas has the Cowboys ranked as the #2 team in the league this week"
Am I missing something? Aren't they #11?
Brian you should probably clarify that "last week" is actually end of last year (I think...), or at least N/A that column if this is the case
As a Jets fan...
With the Jets losing Revis,
and game 1 (the blowout of Buffalo) as a fluke,
I think the Jets will go 4-9 the rest of the season.
They are a weak team
I'm in a survivor pool with 60 people where the vast majority (maybe 35?) will likely pick Baltimore over Cleveland due to the 12.5 point spread.
Based on your data, it looks like Denver over Oakland is actually a better bet when you compare GWP, despite the line of only 6.5 points.
Is there a way to convert this data into win percentages? Would you claim that Denver is actually the better bet?
Given that NO is 32 "last week", it definitely means it is actually the previous week and not the end of last season. I think it's still the right call to N/A the column, since the whole point of not doing rankings until now is that there isn't enough data.
I'm wondering if Carolina's high ranking is because of their high passing YPA. I recall past articles here emphasizing how important passing offense is to winning, more so than defense or rushing offense. So having a strong passing attack may contribute greatly toward a high ranking, while having a weak defense doesn't hurt so much. Carolina's high INT and fumble rates don't ding them much because those stats are said to be more random, while their low penalty rate does help them out because penalty rate is supposed to be more predictive.
Using “traditional” statistics, Carolina is tied for #1 in the NFL in raw yards per attempt with Cincinnati. They’ve scored far fewer points and thrown more interceptions and fewer touchdowns. Yet they are ranked better than Cincinnati. I wonder if Carolina isn’t so much “good” as they are good in just the right ways to achieve a high ranking by the WP model.
Gotta say, GB and WAS leading the way in penalty rate after TERRIBLE GAMES from the same officiating crew (albeit for only 1/3 of those games) immediately screams out to me that the ref situation is heavily affecting these rankings.
PS. I am a very biased Packer fan, but for one time only I feel okay about thinking this for now without necessarily a great reason. We'll see how things play out.
"Last week" means following week 2 of this season. I usually blank that column out, but left it in because I was so pressed for time. It illustrates just how unsettled these estimates are so early in the season. Keep in mind we have 50% more data than we did just one week ago. But as always, lots of grains of salt.
Washington's low ranking makes perfect sense if you look at the numbers. They've played the league's weakest schedule but are ranked just 18th on offense and 32nd on defense. They're decent at throwing the ball, and they run it reasonably well, but they fumble a lot and they have the league's worst penalty rate.
This is awesome. I recently gained a newfound excitement for football and the NFL through statistical prediction and analysis and have been searching for somewhere that did this type of number crunching. I look forward to seeing how this season progresses and might take a crack at tweaking the model presented.
Re: Challenge #2:
Just throwing it out there: The efficiency model doesn't take into account good *strategic* coaching. You know that, but not all the readers do.
Brian;
Do you know if the official pass-attempt stats include sacks and interceptions? If not, do you add them to the attempt total before calculating OPass efficiency? Thanks.
There are 2 official passing stats: net passing yards and total passing yards. 'Net' means sack yards lost are subtracted. Interceptions are always counted as attempts.
My own 'opass' is (total yards - sack yards) / (attempts + sacks)
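The OPass formula in the comment above is easy to express directly. The function name is hypothetical, `sack_yards` is assumed to be a positive number of yards lost, and the sample numbers are made up. Interceptions need no special handling because they are already counted as attempts.

```python
def net_yards_per_attempt(pass_yards: float, sack_yards: float,
                          attempts: int, sacks: int) -> float:
    """(total passing yards - sack yards) / (attempts + sacks).

    sack_yards is the positive number of yards lost on sacks;
    interceptions are already included in attempts.
    """
    dropbacks = attempts + sacks
    if dropbacks == 0:
        raise ValueError("no pass attempts or sacks")
    return (pass_yards - sack_yards) / dropbacks


# Made-up game: 300 passing yards on 38 attempts, 2 sacks for 20 yards.
print(net_yards_per_attempt(300, 20, 38, 2))  # 7.0
```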
Playoff odds are available on nfl-forecast.com.
Early SB favorites are the Texans over the Falcons.
One correction to the Randy Mossesque Q&A: "No one outside of Texas..." should read "No one outside of North Texas and the Rio Grande Valley..." Give the rest of us some credit! On a serious note, I appreciate the hard work. A lot of the math is beyond me, but I know enough to get the point and understand the models. You do a great job of explaining the math so that "we" can follow along. Keep up the good work!
@Anonymous
Don't forget that random luck has an even distribution (the difference between "random luck" and "making your own breaks" being that if a team is "luckier" than another, it will show in the stats as success rates and turnover percentages, because that "luck" is more a matter of skill. Case in point: the Texans couldn't buy a break 2 years ago, because they sucked.)
"Each team variable is regressed again to account for how reliable each particular facet is throughout a season."
Is this a separate regression? Is the reliability not accounted for in the coefficient?
Brian;
Have you investigated the role 'score effects' play in efficiency stats? For example, Denver ran up many successful plays while down by three scores against Atlanta. Does this make their OPass stronger than it really is? I am referring to situations beyond just garbage time. Would it be possible to calculate the average net yards per pass at different score states and use this to adjust?
I know that in hockey, teams with a lead change their strategy and allow more shots than when tied. I then adjust accordingly.
Do you think teams in the NFL do the same thing? Do they play less aggressively, protecting against big plays and sacrificing net passing yards? If so, wouldn't it improve your model to adjust each team's stats based on the score differential during the game? Any thoughts?
....I'm convinced this strategy would be very effective...Dan
Can anyone post the predictions from nfl-forecast? I can't get Java to work. I would be interested to see what Brian's WP translates to in spreads.
Unknown-I did once test limiting all stats to the first 3 quarters. I realize that throws a lot of baby out with the bathwater. The results were slightly less accurate than with all the data, including trash time.
Brian, I believe that you previously argued that DINT rate isn't very predictive, is this no longer the case?
Confused as to why Denver is rated so high, despite only being marginally above average in everything aside from DRUNSR%?
I suppose run success rate is much more highly weighted than rushing YPC in the past. Denver is rated much higher offensively than say NE, but really the only advantage they have statistically is 6% in success rate running the ball.
I think it also has to do with the fact they have the highest Opp GWP. You would expect their numbers to be below average if they were an average team. So their own GWP gets corrected for this.
Thanks for your response Brian;
Maybe in the offseason we could look at it?
It is a game theory question as well?
How do teams play with a lead?
It would take a lot of time...
In the meantime I say Denver is overrated. :)
Do the coefficients for the regression change on a weekly basis as more data from the season is input?
David Cooper, the regression coefficients are from years and years of data, so adding 16 games to the thousands in the regression would have minimal impact on the coefficients.
Re: Brian's test using only the first 3 Q's: It would prob. be too subjective, but any game that would have a point differential of >14 entering the 4th Q AND ends with a final differential of >9 might be worth dropping the 4th Q of data. In other words, the trailing team might end up with a TD early in the 4th Q to cut the deficit to 10--if they make it less, then they prob. had a chance, even if a slight one, to tie the game. These stats probably relate well to the teams' overall strength--for example, the DEN-ATL week 2 MNF. If the trailing team never could get within a potential tying score (for example, TNF's CAR-NYG game), then there is a good probability to disregard the 4th Q, as the leading team is prob. more interested in running clock than giving 100% effort and strategy toward scoring, and the trailing team is prob. not using the most optimum strategies that would correctly demonstrate (statistically) their strengths and weaknesses.
Brian, have you ever tried something this subjective (although with objective parameters)?
Brian;
A question about this week's ratings:
Have you accounted for the fact that Dallas (and other teams) has played two road games out of three, while some teams like Chicago have played 2 of 3 at home? This accounts for a large +/-8% swing in stats that I feel needs to be adjusted for. I have Dallas at 71% vs. Chicago (using a system similar to your methodology).
They have better stats at a neutral site, have played a tougher schedule, and have played 2 on the road? Thanks, Dan