Team Rankings Week 4

Next week you can expect Zach Sanders (not pictured at left) to take over the weekly rankings digest. But this week I’m going to get the ball rolling with a discussion of the model and how it has changed for 2011.

The rankings and prediction model has been redone for 2011. Just like always, it's a team-efficiency logistic regression model. It's based on passing, running, turnover, and penalty efficiency. But now, running is represented by Success Rate (SR) rather than Yards Per Carry (YPC). SR correlates far better than YPC with winning games. YPC is too susceptible to a handful of relatively rare break-away runs. I think of running as a ‘jab’ and passing as a ‘cross’ or ‘uppercut’. The jab is a low-risk punch that doesn't expose your defenses, keeps your opponent off balance and guessing, and keeps him from purely defending against your cross. A good jab is a prerequisite, but the cross is what scores points and wins bouts. SR captures this aspect of the running game.
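For the curious, here is a minimal sketch of how a success rate is tallied from play-by-play data. The EPA values are made up for illustration; a run counts as a 'success' when it adds strictly positive expected points:

```python
# Sketch: computing a team's run Success Rate (SR) from play-level EPA.
# A run counts as a 'success' when its Expected Points Added is > 0
# (strictly greater, so a play with EPA of exactly 0.00 is not a success).
# The play data below is illustrative, not real.

def success_rate(epa_values):
    """Fraction of plays with EPA strictly greater than zero."""
    return sum(1 for epa in epa_values if epa > 0) / len(epa_values)

run_epa = [0.45, -0.21, 0.02, -0.60, 0.13, 0.00, -0.05, 0.31, -0.12, 0.27]
print(success_rate(run_epa))  # 5 of 10 runs were successes -> 0.5
```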

Running is more than that, of course. It’s essential in short yardage and inside the red zone, and when a team has a lead, running burns clock and helps keep the ball out of the opponent’s hands in the 4th quarter. I believe the revised model better reflects the true inner workings of the sport.

There are always new readers each year, so here is a quick and dirty refresher on how the model works. A logistic regression is fed net YPA, run SR, and interception rates on both offense and defense, plus offensive fumble rate. Team penalty rates (penalty yds per play) and home field advantage are also included. These particular aspects are selected because they are predictive of future outcomes, not because they explain past wins. This is a distinction overlooked by most experts and even other stats-oriented sites.

The regression produces the coefficients used in the model. In other words, it tells us how each facet of team performance is best weighted to predict which team will win a game. Each team variable is regressed again to account for how reliable each particular facet is throughout a season. In other words, the facets vary in terms of how consistent they are from game to game. For example, offensive passing efficiency is most consistent, and turnover rates are least consistent.
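To make that concrete, here is a rough sketch of the model's functional form. The coefficient values and team stats below are invented for illustration only; the real weights come out of the regression on historical games:

```python
import math

# Sketch of the logistic model's form. Coefficients are invented for
# illustration; the real ones are estimated by regression on past games.

def win_probability(coeffs, home, away):
    """P(home win) = 1 / (1 + e^-logit), where the logit is a weighted
    sum of the differences between the teams' efficiency stats plus a
    constant for home-field advantage."""
    logit = coeffs["const"]  # home-field advantage term
    for stat, weight in coeffs.items():
        if stat != "const":
            logit += weight * (home[stat] - away[stat])
    return 1.0 / (1.0 + math.exp(-logit))

coeffs = {"const": 0.4, "o_pass": 0.5, "d_pass": -0.5, "o_int": -0.2}
home = {"o_pass": 7.5, "d_pass": 6.6, "o_int": 1.5}
away = {"o_pass": 6.5, "d_pass": 6.6, "o_int": 2.8}
print(round(win_probability(coeffs, home, away), 2))
```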

Turnover rates explain past outcomes very well, but only a relatively small part of a turnover rate is carried forward. If a team has a very low interception rate of 1.0%, how likely are they to continue the season with few interceptions? Chances are they will remain better than average, but not nearly as low as 1%.
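In code, that kind of shrinkage toward the league mean looks something like this. The 0.3 weight is invented for illustration; in the real model each facet gets its own weight based on its game-to-game consistency:

```python
# Sketch of regressing a noisy stat toward the league mean. The
# reliability weight of 0.3 is illustrative only; in the model each
# stat gets its own weight based on how consistent it is week to week.

def regress_to_mean(team_rate, league_rate, reliability):
    """Blend a team's observed rate with the league average."""
    return reliability * team_rate + (1 - reliability) * league_rate

# A team intercepting on only 1.0% of passes, vs a 2.8% league average:
projected = regress_to_mean(1.0, 2.8, reliability=0.3)
print(round(projected, 2))  # 2.26 -> better than average, but far from 1%
```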

Next, I create a notional team with all league-average statistics. With the regressed values of team efficiency, I use the model to generate the probability each team would win a game against the average team at a neutral site. I call the result Generic Win Probability (GWP). In theory, this should be a team’s long-term ‘true’ winning percentage.
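Here is a minimal sketch of the GWP idea using an invented two-stat model (the real model uses the full set of efficiency stats and regression-derived weights):

```python
import math

# GWP sketch: probability of beating a notional all-average team at a
# neutral site (no home-field term). The weights are invented; the
# team stats are DAL's net passing numbers from the table below.

def gwp(coeffs, team, league_avg):
    logit = sum(w * (team[s] - league_avg[s]) for s, w in coeffs.items())
    return 1.0 / (1.0 + math.exp(-logit))

coeffs = {"o_pass": 0.5, "d_pass": -0.5}       # illustrative only
league_avg = {"o_pass": 6.5, "d_pass": 6.6}
dal = {"o_pass": 8.3, "d_pass": 5.8}
print(round(gwp(coeffs, dal, league_avg), 2))
```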

But it’s not complete. Lastly, I take each team’s average opponent GWP and use it to adjust the numbers, so that the final GWP accounts for strength of schedule to date.

Generic offensive team efficiency (OGWP) can be estimated by setting each team’s defensive variables to the league average, and re-computing their probability of beating a completely average team. Generic defensive efficiency (DGWP) can be estimated in a similar way.
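A sketch of the same idea, again with an invented two-stat model: swap in league-average values for one side of the ball, then recompute the probability against the all-average team:

```python
import math

# OGWP/DGWP sketch: neutralize one side of the ball by setting its
# stats to the league average, then recompute the win probability
# against the all-average team. Weights and stats are illustrative.

def generic_wp(coeffs, team, avg):
    logit = sum(w * (team[s] - avg[s]) for s, w in coeffs.items())
    return 1.0 / (1.0 + math.exp(-logit))

coeffs = {"o_pass": 0.5, "d_pass": -0.5}       # illustrative only
avg = {"o_pass": 6.5, "d_pass": 6.6}
team = {"o_pass": 8.3, "d_pass": 5.8}

offense_only = dict(team, d_pass=avg["d_pass"])  # average-out the defense
defense_only = dict(team, o_pass=avg["o_pass"])  # average-out the offense
print(round(generic_wp(coeffs, offense_only, avg), 2))  # OGWP estimate
print(round(generic_wp(coeffs, defense_only, avg), 2))  # DGWP estimate
```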

An explanation of the principles behind the model and an example of how it is calculated can be found here.

Each season, I end up answering the same challenges to the results of the model. So I’ll preemptively address the most common ones.

1. “Your dumb model fails to conform to my intuitive beliefs about how good each team is. And besides, it does not conform to what I’ve been told to think by [major media personality].”

Answer: What can I say? Who is right? All the talking heads that told you for weeks the 13-3 Falcons were the best team in the NFC, or the numbers here that told you 10-6 Green Bay was actually the much better team? Your intuitive estimates of team strength are far less accurate than you imagine. The thing is, you’ll forget how wrong you were by the end of the season, and re-wire your memory to trick yourself into believing you ‘knew it all along.’ We all do. Want to have a laugh? Go back and look at the expert predictions in the early weeks of 2010, or 2009, or whichever year you can find. This model will have some laughers too, just not nearly as many. Good statistical models have no preconceived biases and are not wowed by spectacular but lucky game-winning plays. They don’t follow the crowd. They don’t believe in streaks, destiny, grudges, or momentum. They don’t chase recent wins.

2. “Your model doesn’t take into account determination, good coaching, effort, and character.” Answer: Yes it does. To the degree those things show up on the field on Sundays, they are captured.

3. “How can team X have a higher ranking than team Y if team X’s offense and defense are both ranked behind team Y’s?”

Answer: I can understand this question. It’s unusual, but in some cases the OGWP and DGWP don’t make logical sense when you mentally combine them and compare them to a team’s overall GWP. This is mostly because of penalty rates. The NFL tracks team penalties but does not divide them into offensive and defensive categories, so they count neither toward OGWP nor DGWP. They are, however, included in overall GWP. If you see a team with an unusually high or low GWP compared to their O or D rankings, check out their penalty rate. It’s probably well above or below average. Also, the final results depend on how teams are bunched together. Sometimes a #3 DGWP team is a mile ahead of the #4 team, and sometimes it's just a hair better than the #4 through #9 teams.

4. “How can you possibly have [perennial doormat] Team X ranked ahead of [current media darling] Team Y?”

Answer: Look at the efficiency stats in the second table below. That’s just about all you need to know. Yes, I understand no one outside the state of Texas has the Cowboys ranked as the #2 team in the league this week. But are they aware that (despite their injuries) DAL has an 8.3 net YPA? Are they aware that they have allowed only a 5.8 net YPA on defense? Both are near the very top of the league. True, there are a couple of other teams with numbers as good as these, but how tough have their schedules been?

Example: Why is NO #1 and GB #9 when GB beat NO in week 1?

Answer: Check their numbers. Check their opponent strength. (GB has a relatively poor defensive pass efficiency and has played a somewhat lighter schedule.)

5. “Your model said Team X had a 90% chance of beating Team Y, but they lost! Ha!”

Answer: Yes, that happens…about 10% of the time. And in fact, I’m glad it does. If it didn’t, the model would be under-confident.

6. “Your model doesn’t account for the fact that [undrafted rookie 4th string quarterback] is starting in place of [superstar who just got injured].”

Answer: That’s true. Use the model as a starting point, and adjust on your own. Or, even better, we can insert a reasonable guess as to the rookie’s expected net YPA and interception rate, and recompute. What’s amazing to me is that the model is completely unaware of injuries, and yet still manages to slightly outperform “the market.”

7. "It's easy to pick straight up winners. You don't pick against the spread." Answer: Oh, it's easy? Really? You might want to double-check that. See #1 above regarding "knew it all along."

8. "Website Z has Team X ranked 10th but you have them ranked 3rd! And their rankings conform to my intuitive expectations!"

Answer: The stats at Website Z are crap. Stop reading it. Go wash your eyes out with rubbing alcohol, and come back here after your sight returns and reread what I wrote about predictive vs. explanatory stats.


Ok, I could go on, but that’s it for now. Here are the first rankings of the 2011 season. Click on the table headers to sort. See the second table below for raw team efficiency stats.




RANK  TEAM  GWP   Opp GWP  O RANK  D RANK
1     NO    0.66  0.55     6       6
2     DAL   0.66  0.55     2       7
3     BAL   0.65  0.53     4       2
4     NYJ   0.62  0.56     11      3
5     NE    0.62  0.50     1       30
6     WAS   0.60  0.55     15      5
7     TEN   0.60  0.51     5       8
8     HOU   0.60  0.52     9       16
9     GB    0.59  0.52     3       19
10    DET   0.58  0.40     10      11
11    BUF   0.57  0.51     13      10
12    OAK   0.56  0.54     7       15
13    NYG   0.55  0.51     12      24
14    PIT   0.54  0.46     14      20
15    PHI   0.49  0.44     17      25
16    MIA   0.47  0.54     19      21
17    ARI   0.45  0.45     16      27
18    CHI   0.45  0.53     28      4
19    CIN   0.45  0.42     20      26
20    JAC   0.44  0.55     30      1
21    SD    0.44  0.47     21      28
22    STL   0.44  0.56     23      13
23    IND   0.43  0.52     24      23
24    DEN   0.43  0.54     22      18
25    CAR   0.43  0.50     8       32
26    MIN   0.43  0.48     25      12
27    TB    0.42  0.45     18      31
28    CLE   0.41  0.45     27      17
29    SF    0.41  0.47     26      14
30    KC    0.36  0.53     32      9
31    ATL   0.33  0.45     29      29
32    SEA   0.31  0.47     31      22



TEAM  OPASS  ORUN%  OINT%  OFUM%  DPASS  DRUN%  DINT%  PENRATE
ARI   7.3    46     3.1    1.5    6.9    55     3.6    0.46
ATL   5.6    33     3.3    2.8    7.4    55     4.0    0.44
BAL   6.8    39     1.8    1.4    5.9    63     4.3    0.39
BUF   7.3    49     3.6    0.0    7.0    61     5.3    0.34
CAR   7.7    36     3.4    0.6    8.7    61     1.3    0.53
CHI   5.9    31     2.6    0.8    6.4    61     1.6    0.45
CIN   5.7    39     2.0    0.7    5.4    56     1.1    0.35
CLE   5.3    45     1.8    0.7    5.4    56     2.1    0.52
DAL   8.3    30     3.5    2.0    5.8    67     2.9    0.47
DEN   5.3    35     2.7    2.8    6.8    65     0.0    0.43
DET   7.8    36     1.7    0.0    5.0    68     3.8    0.43
GB    8.2    41     1.0    1.3    7.6    63     3.8    0.37
HOU   8.0    43     3.3    0.6    6.1    55     2.9    0.43
IND   4.3    46     0.9    3.3    7.7    62     3.2    0.27
JAC   5.0    37     7.1    0.6    6.2    67     3.3    0.42
KC    4.5    27     6.0    3.8    7.3    54     3.8    0.38
MIA   6.5    41     2.8    1.3    7.8    50     1.7    0.43
MIN   4.8    44     1.2    0.7    6.7    56     2.4    0.60
NE    9.7    45     3.8    0.0    8.4    53     3.9    0.55
NO    7.5    41     1.5    1.2    6.6    59     0.8    0.20
NYG   7.1    41     2.4    0.0    6.5    55     2.7    0.36
NYJ   7.0    34     3.6    1.3    6.0    57     5.7    0.32
OAK   6.8    47     1.2    1.3    6.0    55     2.2    0.69
PHI   6.5    52     4.0    1.8    6.1    55     2.3    0.38
PIT   7.5    35     3.7    3.8    4.7    59     0.0    0.38
SD    7.3    43     4.8    1.2    7.3    61     2.5    0.29
SF    5.1    29     1.4    0.7    6.1    68     4.5    0.56
SEA   3.8    31     2.1    1.4    6.9    63     2.2    0.48
STL   5.0    40     0.9    2.9    6.4    57     0.9    0.58
TB    5.9    47     3.7    0.7    7.0    56     1.8    0.41
TEN   7.8    27     1.8    1.4    5.1    58     4.2    0.57
WAS   6.3    43     2.6    1.3    6.8    52     3.1    0.20
Avg   6.5    39     2.8    2.4    6.6    59     2.7    0.43



30 Responses to “Team Rankings Week 4”

  1. probablepicks says:

    "What’s amazing to me is that the model is completely unaware of injuries, and yet still manages to slightly outperform the market."

    Are you using your new model's back tested results as the basis for this claim? Or the "live" predictions of your last model? And what's the %?

    In "the market," the favorites win 66.6% of the time, according to my data.

  2. James says:

    My favorite post of the year is finally here! Seeing the Cowboys at the top is the icing on the cake.

  3. Brian says:

    I like the switch to success rate instead of yards per carry for the rushing variable. But can't the same theory be applied to passing as well? A 2 yard pass to pick up a first down is a positive play in WPA terms, but it will bring down the yards per passing attempt variable. Why make the switch for rushing but not for passing?

  4. Anonymous says:

    I love checking these rankings because it filters out so much random noise and gives a much better idea of how well a team's defense or offense is performing.

    One question though, I would think adjusting for opponent strengths is going to be wacky so early in the season. Over three games certainly a lot of teams have had lucky or unlucky streaks. For example, is it likely that JAC will stay the #1 defense and PIT the #20? And doesn't that change the offensive rankings of the teams they have played?

    Awesome analysis overall though, I love it.

  5. Anonymous says:

    Is it possible to use the GWP values for predictions, or would I have to run all the stats through that formula? Do you have a spreadsheet that you use?

  6. Brian Burke says:

    Hey, remember 2 weeks ago when the Steelers D was too old? They were washed up?

    Already they've got 4.7 net YPA allowed on defense, best in the league by far. And they haven't even had an int yet. That will regress. Bet on it.

  7. James says:

    Hey Brian, any thoughts on why there isn't an early dominant team? The GWP for the top 2 teams is 0.66 this year, but it was 0.72, 0.71; 0.80, 0.71; 0.78, 0.72 the past 3 years (using Week 4 data). Something to do with the inclusion of rushing success rate? A one-year fluke? Or is it just because the Patriots' by far league-best offense is paired with a near league-worst defense?

    Also, is a low DRUN% good? I can't tell looking at the rankings.

  8. Anonymous says:

    I think it would be a good idea to show your model's accuracy over the past few years. There will always be people who come in and question your results, but showing the model's track record would reduce some of this.

  9. Anonymous says:

    I know it's been suggested here before, but someone should submit Brian's ratings to thepredictiontracker.com to see how it stacks up against other computer rating systems.

  10. Michael Beuoy says:

    Brian - Didn't you forget to mention the following term:

    if team==ATL then logit = logit - 0.50

  11. Jack says:

    why would the steelers pass defense regress? They played the two teams who are literally last in O PASS, and a Ravens team that is just ok in the passing game. I imagine that will go up a good amount after the Houston game, even with a pick or 2. Something tells me you're just not the real Brian Burke and just a Steelers fan mouthing off as usual

  12. Patrick says:

    Brian,
    Can you post the new coefficients?

  13. ubrab says:

    Brian,

    I'm surprised by the Cowboys O ranking - Watching only their last game, they were globally relying on big plays and had a lot of fumble luck which brings my question of how you handle fumbles (do you separate fumbled snaps from "normal" ones ?) - What else am I missing ?

    On another topic, someone in the comments used to take your game predictions and generate over/unders - where has he gone? Or, how can I reproduce that?

  14. Brian Burke says:

    I'll try to answer these in reverse order.

    -I'm surprised by DAL too. Fumbles are all non-ST fumbles (not fumbles lost) per run+pass. Not sure how to reproduce over/unders. I think bobbled snaps are considered aborted plays rather than fumbles. Not sure about this though.

    -I'll be happy to post the coefficients. Don't have them here right now. If I forget bug me next week in the comments.

    -Yes, PIT int rate will regress upward. Accept it. They will not finish the season with zero ints.

    -Very funny, Mike. The secret adjustments for teams I don't like work slightly differently. You know your logit regression though!

    -PredictionTracker is a great site. I love that someone actually tracks all this stuff. However, because I don't predict against the spread, there's no place for me on PT.

    -I'm out of the business of publishing the track record of the model. I leave that to independent sources. Overall, I'm very pleased with the results. However, be aware that there are ways of making things look much better or worse than reality. How do you treat a tie? What if the model says .50/.50? Is that a loss for the model, or is the game thrown out? If the game is down to the wire, isn't that a + for the model? Do you compare the model vs the closing Vegas line or the opening line? Do you compare it week vs week or overall?

    -I also noticed the teams are packed closer together this year. It's partly due to the method I chose to regress the team stats early in the season. The other part is due to the way teams have played so far this year. Once there is more data, the stats will be regressed less, and depending on how teams play, the top teams might be back up to .70 GWP.

  15. Dan Whitney says:

    Brian - Thanks again for all your work here and for really describing how you derive your models.
    First, I saw that D-int was not included in your original model but is now; I was wondering if there was a reason behind that.
    Second, I was wondering if you had any kind of analysis on how much of a difference the offensive and defensive lines make for each team. I see that you have the WPA for each team's lines, and it seems like they have at least some correlation with winning. I was wondering if you had any equation that models the WPA of the linemen vs winning percentage, or anything along those lines.
    Third, when looking at QB ratings I noticed that the EPA on your site was much higher than that used for ESPN's QBR rating, presumably because they incorporate what percentage of a play's success is based upon the QB compared to other players. This means that your model values players with a lot of talent around them, like Philip Rivers, and ESPN's model values those with less talent, like Matt Hasselbeck. I was wondering what your thoughts are on that.
    Thanks, Dan

  16. Anonymous says:

    PredictionTracker does not track games against the spread. It tracks how well computer systems pick winners straight up. For reference it shows how often the Vegas favorite wins. Very few systems perform better than Vegas each year, and none has demonstrated a record of doing so consistently year after year. We should be skeptical until proven otherwise that your system is any different.

  17. Anonymous says:

    "Your intuitive estimates of team strength are far less accurate than you imagine. The thing is, you’ll forget how wrong you were by the end of the season, and re-wire your memory to trick yourself into believing you ‘knew it all along.’ We all do."

    Not all of us...I predicted Carolina to win the Super Bowl last year...I remember it very well.

    The year before I was feeling high and mighty because I predicted Minnesota would walk all over Dallas because the NFL East was crappy.

    A good honest memory helps to keep one humble.

  18. Anonymous says:

    Brian - Would it be possible to post what the old method (using YPC versus SR) would produce for win probabilities? I'm very curious to see what kind of a difference there would be. Thanks so much for having this amazing site!

  19. Tom says:

    ubrab-
    I have posted in the comments section of the probabilities link an equation that can calculate the spread given the probability of a team winning, however I will post this week's here:

    Dallas over Detroit by 6.8
    Chicago over Carolina by 4.5
    Buffalo over Cincinnati by 0.3
    Tennessee over Cleveland by 2.4
    Kansas City over Minnesota by 1.4
    Washington over St. Louis by 1.7
    New Orleans over Jacksonville by 3.8
    Houston over Pittsburgh by 5.6
    Philadelphia over San Fran. by 6.4
    Arizona over Giants by 0.7
    Seattle over Atlanta by 3.1
    San Diego over Miami by 2.7
    Oakland over New England by 2
    Green Bay over Denver by 9.3
    Baltimore over Jets by 4.9
    Tampa Bay over Indianapolis by 3.1

  20. Gerrit D. says:

    ORUN% = 39%
    DRUN% = 59%
    39% + 59% = 98%, which does not equal 100%
    Does this mean that 2% of run plays are neither successful nor unsuccessful?
    Or does this have something to with the way the averages are computed (each team is an equal 1/32 regardless of how often they run)?

  21. Anonymous says:

    I've long thought that per-play metrics should use quantile statistics (e.g. median) rather than the mean. How hard would it be to check if median yards per carry correlate better than mean yards per carry to winning?

  22. Brian Burke says:

    I also thought median was the key. But the problem with median YPC is that nearly every team has the same median YPC...3.

  23. Brian Burke says:

    Gerrit - Very perceptive! The difference is due to how I originally coded SR. I rounded all plays to 2 digits. The 2% (or so) are plays in which the EPA was exactly 0.00. To count as a 'success' I said EPA had to be > 0, not >= 0.

  24. Anonymous says:

    Have you ever considered restricting the data to certain subsets that might be more predictive?

    For example restricting the data to "normal" football. In blowouts teams may get into a run-run-pass on 3rd and long pattern that might skew both their run SR and their pass efficiency. Likewise on defense they may play a "prevent" that gives up a lot of underneath yards, hurting their defensive pass efficiency.

    Another idea might be to restrict the data to only include plays between the 20's. I don't know the numbers but I'm guessing that inside the red zone both pass efficiency and run SR decrease on offense and likewise increase on defense. Since teams often end up with different average starting field positions due to either non-predictive events (like turnovers or special teams) or predictive events (like the play of their own defense) some teams may spend a disproportionate amount of time playing on a shortened field and see their efficiencies skewed as a result.

  25. Brian Burke says:

    That's a good idea. I have done a little playing around with that but could do a lot more. I looked at performance in qtrs 1-3 only as a predictor, and it was slightly less effective than using all 4 qtrs.

    You don't want to get too tricky with restricting data. There's a trade off with sample size obviously. Plus, if you start under-weighting or excluding some data, you effectively over-weight other data. The last thing you'd want to do is over-weight clutch situations.

  26. Anonymous says:

    To score your model:

    Let p1,...,pn be the probabilities you placed on the outcome that in fact occurred.

    Your score is the average of the logs of p1...pn. This will be a negative number.

    To translate this into something that's easier to interpret (i.e. a likelihood), exponentiate this score to get back to a number between 0 and 1 (sort of a percentage accuracy).
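    A sketch of that scoring rule in code, with made-up probabilities:

```python
import math

# The scoring rule described above: average the logs of the
# probabilities assigned to the outcomes that actually occurred,
# then exponentiate to get a geometric-mean accuracy in (0, 1).

def log_score(probs_of_actual_outcomes):
    avg_log = sum(math.log(p) for p in probs_of_actual_outcomes) / len(probs_of_actual_outcomes)
    return math.exp(avg_log)

# e.g. three games where the model gave the eventual winner
# probabilities of 0.7, 0.6, and 0.9:
print(round(log_score([0.7, 0.6, 0.9]), 3))  # -> 0.723
```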

  27. Anonymous says:

    Incidentally, by completely ignoring YPC aren't you discounting the value of home-run hitter RBs? I'm sure Oakland was happy that McFadden broke off a 60-yard run last Sunday. His YPC is higher than 14 teams' YPA (with sacks in the denominator), so handing the ball off to him is practically the same as passing (so far).

  28. Brian Burke says:

    Sorry for the confusion. I know how to score a logit model. The difficulty is that only about 0.1% of football fans would understand.

    And yes. That's exactly what I'm doing: discounting the effect of home-run hitters.

  29. James M says:

    Sorry for the previous "test" message, but I have been struggling to post on this forum for several days. I finally figured out it is my computer not accepting cookies from this site.

    Anyway my comment....


    I use Pointspread = 9*ln(odds) for college football and Pointspread = 7*ln(odds) for the NFL, which reflects the higher scores in college. (This also works in basketball.)



    I use this as the basis for my ranking system, which I won't plug on this site, but which is based solely on scores and venue rather than game stats.

    So I would convert Oakland 56% NE 44% to 7*ln(56/44)= 1.68 points rather than Tom's 2.

    Tom out of interest which source did you use to generate the 8.5?

    One word of warning: the graph of ln(odds) vs pointspread does get a bit kinked around 3 points, particularly in the NFL, reflecting the frequent occurrence of this score, but I can't find a simple way of ironing this out.

    Interestingly totals have very little effect on this relationship, whereas in theory a 3 point favourite in a game predicted to be high scoring should be less likely to win than a 3 point favourite in a low scoring game.

    Prediction tracker tracks both straight up record and against the closing Vegas line and he does include some systems which don't have pointspreads although it is unclear how he generates pointspreads for them.
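    The conversion above, as a one-line function (NFL multiplier shown):

```python
import math

# Pointspread = multiplier * ln(odds): 7 for the NFL, 9 for college.
def spread(p_win, multiplier=7):
    return multiplier * math.log(p_win / (1 - p_win))

print(round(spread(0.56), 2))  # Oakland at 56% -> about 1.69 points
```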

  30. Anonymous says:

    how do you calculate predictions? never understood how.
