Game probabilities for week 5 are listed below. The probabilities are based on an efficiency win model explained here and here. The model considers offensive and defensive efficiency stats including running, passing, sacks, turnover rates, and penalty rates. Team stats are adjusted for previous opponent strength.
V Prob | VISITOR | HOME | H Prob |
0.85 | ARI | STL | 0.15 |
0.13 | ATL | TEN | 0.87 |
0.62 | CAR | NO | 0.38 |
0.04 | CLE | NE | 0.96 |
0.22 | DET | WAS | 0.78 |
0.73 | JAX | KC | 0.27 |
0.33 | MIA | HOU | 0.67 |
0.10 | NYJ | NYG | 0.90 |
0.69 | SEA | PIT | 0.31 |
0.19 | TB | IND | 0.81 |
0.49 | BAL | SF | 0.51 |
0.03 | SD | DEN | 0.97 |
0.09 | CHI | GB | 0.91 |
0.90 | DAL | BUF | 0.10 |
Your system really, really, really loves the Colts.
Actually, looking at other predictions, I think it's just too bullish on teams that have performed well so far, and too pessimistic on teams that have performed badly.
I really don't understand why your system loves the Broncos so much. They lost to two teams that your system likes (Colts & Jaguars) and barely beat two teams that your system hates (Bills & Raiders). If you beat the Raiders by a field goal in overtime at home you shouldn't be considered a top 5 team (I should clarify that I'm a Raiders fan and I'm very happy with their improvement but they are still rebuilding and I'll be happy with a 6-10 or 7-9).
The season is still young, so as a couple more weeks go by, outlier performances won't have the same impact on team ratings that they do now. Alan makes a great point that the model is overconfident right now, and I'm pretty sure the young season is why.
But I think the bigger risk of error is not in overreliance on actual performance but in overreliance on preseason notions of how good a team "should" be. Last week is probably a good example: only 4 of 14 "consensus favorites" won.
Denver sticks out like a sore thumb to me too. Here's really why they are rated so highly here: The average pass efficiency (yds/drop back, incl. sack yds) is 6.3 yds. Denver gets 7.5 yds on offense and only gives up 4.86 yds on defense--including their game vs the Colts. That puts them 4th in the league in both categories. They've also had a slightly tougher than average schedule so far.
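Here is a minimal sketch of the pass-efficiency stat quoted above. It assumes the usual definition of net yards per dropback: gross passing yards minus sack yardage, divided by pass attempts plus sacks. The model's exact formula isn't given here, and the totals in the example are hypothetical, not Denver's.

```python
def net_yards_per_dropback(gross_pass_yards, sack_yards_lost, attempts, sacks):
    """Net passing yards per dropback, treating each sack as a dropback.

    gross_pass_yards: passing yards before sack losses are subtracted
    sack_yards_lost: total yards lost on sacks, as a positive number
    """
    dropbacks = attempts + sacks
    if dropbacks == 0:
        raise ValueError("no dropbacks")
    return (gross_pass_yards - sack_yards_lost) / dropbacks


# Hypothetical game: 250 gross passing yards, 2 sacks for 20 yards, 30 attempts.
print(net_yards_per_dropback(250, 20, 30, 2))  # 7.1875
```

Many box scores already report team passing yards net of sacks. In that case, pass 0 for sack_yards_lost and keep sacks in the dropback count.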
Hmmmm, Denver's pass defense is good but that is probably skewed by the extreme suckitude of the Raiders passing game that day (with McCown playing on a sprained ankle & a broken finger on his throwing hand). But Denver's run defense has been pretty mediocre. Everyone has put up over 120 yards rushing on them so they haven't had to test the passing game as much.
I don't believe that Denver will end up with 12 wins.
BTW I've been a regular reader on FO and I really appreciate a different perspective on the use of statistics in football. Thanks for doing all of this work.
You're welcome. Yes, but 75% of the Raiders' suckitude is accounted for in the equations.
I agree, 12 wins does seem high for DEN, but I'd still feel OK making them the favorite in the AFC West right now. We'll see.
Their run defense is their weakest link--5 yds/rush. But their rushing offense is also 5 yds/rush, so it's a wash--actually better than a wash because (for some reason) the coefficient for run offense is stronger than for run defense.
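The following toy logit contribution shows why equal run offense and run defense can come out "better than a wash." The coefficients and league average are hypothetical placeholders. They were chosen only so that run offense carries more weight than run defense, as described above, and they are not the model's fitted values.

```python
import math

LEAGUE_YDS_PER_RUSH = 4.2  # hypothetical league average
B_RUN_OFF = 0.30           # hypothetical weight on yds/rush gained
B_RUN_DEF = 0.20           # hypothetical weight on yds/rush allowed


def logistic(log_odds):
    """Convert log-odds into a win probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


def run_game_log_odds(off_ypr, def_ypr):
    """Run game's contribution to a team's log-odds of winning.

    Yards gained above average help (positive sign); yards allowed
    above average hurt (negative sign).
    """
    off_edge = off_ypr - LEAGUE_YDS_PER_RUSH
    def_edge = def_ypr - LEAGUE_YDS_PER_RUSH
    return B_RUN_OFF * off_edge - B_RUN_DEF * def_edge


# Denver-like run game: 5 yds/rush gained and 5 yds/rush allowed.
contrib = run_game_log_odds(5.0, 5.0)
print(round(contrib, 2))            # 0.08: slightly positive, not zero
print(round(logistic(contrib), 3))  # ~0.52 if everything else were even
```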
"Stopping the run" is terribly overrated. Just ask the 2006 Colts and Saints.
I have a very hard time reconciling a 96% win probability against your earlier assertion that game results are 52.5% luck. I'm interested in your thoughts on this. Would it be fair to say that "past data suggests a 96% chance that NE will outplay CLE on Sunday, which would guarantee a win if luck played no part"?
Tarr-It's not that every game is 52% luck, it's that in 52% of games, luck is the deciding factor and not the relative skill of either team.
A mismatch-type game would not likely be one of those 52% of "luck" games. A 96% vs. 4% mismatch game would result in an upset 4% of the time.
Also, keep in mind that the "better" team can win by luck too. So, we'd expect favorites to win about 77% of the time on average. The trick is knowing who the "true" favorite is rather than the "perceived" favorite.
In other words, a perfect prediction system could only be expected to be correct about 77% of the time.
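The arithmetic behind that ceiling fits in a one-line model. Assume luck-decided games are coin flips and every other game goes to the truly better team. Then the better team wins (1 - L) + L/2 = 1 - L/2 of games, where L is the share of games decided by luck.

```python
def favorite_win_rate(luck_share):
    """Share of games the truly better team wins, assuming a fraction
    `luck_share` of games are coin flips and the rest go to the better team."""
    skill_decided = 1.0 - luck_share
    return skill_decided + luck_share / 2.0


print(favorite_win_rate(0.50))             # 0.75
print(round(favorite_win_rate(0.525), 4))  # 0.7375
```

With L = 0.5, this gives the roughly 75% ceiling quoted elsewhere in this thread. The exact figure shifts by a few points depending on the luck share plugged in.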
Using the same model as last week to assign your probabilities a Vegas-style line, here are your best bets (your pick vs. their opponent, Vegas line relative to your team, your "line" relative to your team).
DEN vs. SD, 0, -10.2
SEA vs. PIT, +6, -3.5
confidence gap
CLE vs. NE, +16.5, +9.5
CAR vs. NO, +3.5, -2.1
GB vs. CHI, -3.5, -9.0
NYG vs. NYJ, -3.5, -8.8
The rest have line differences of less than 4 points.
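The ranking can be reproduced from the numbers in the list above by taking the difference between the Vegas line and the model "line" for the team picked. This is a minimal sketch. It assumes both lines are stated relative to the picked team, with negative meaning favored, as in the list, and it uses the 4-point cutoff mentioned above.

```python
# (picked team, Vegas line, model "line"), both relative to the picked team.
# Negative means the picked team is favored.
best_bets = [
    ("DEN", 0.0, -10.2),
    ("SEA", 6.0, -3.5),
    ("CLE", 16.5, 9.5),
    ("CAR", 3.5, -2.1),
    ("GB", -3.5, -9.0),
    ("NYG", -3.5, -8.8),
]


def line_gaps(bets, threshold=4.0):
    """Return (team, gap) pairs where the model rates the team more than
    `threshold` points better than Vegas does, largest gap first."""
    gaps = [(team, round(vegas - model, 1)) for team, vegas, model in bets]
    big = [g for g in gaps if g[1] > threshold]
    return sorted(big, key=lambda g: g[1], reverse=True)


for team, gap in line_gaps(best_bets):
    print(f"{team}: {gap} points")
```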
I agree that your system puts a lot of faith in current season performance, but that's likely the glory of it.
Best Bets were 2-4, with only the NYG and CAR pulling through. The Browns, though, were within 1/2 point of the Vegas spread I found.
Sky-In general, be very cautious about applying statistical models to point spreads. The scoring system in football is highly irregular--intervals of 3 and 7 mostly. The distribution is not continuous or normal in any way. Plus, there's always the trash scoring when a team with a big lead plays soft to keep the ball inbounds and the clock moving.
The only thing I'm really embarrassed by is the thrashing the Chargers gave the Broncos. I had the Broncos a sure thing at 97 to 3! It's very rare for a team to be destroyed like that on its own home field. But then again, about 1 in 33 games like that would be expected to end up that way. It's a weasel excuse, I know.
Overall this Sunday, the Vegas lines were 10/12, and the efficiency model was 10/13 (SD at DEN was a pick'em).
But that's exactly my point, BB. You said that the 97% game only had a 3% chance of being an upset, but I think this fails to account for:
1) The randomness of one individual football game, and
2) The level of confidence you should have in your model after 4 weeks of data.
I think that, if you could properly account for this, you would have had no more than one or two games this week with a predicted win probability over 80%, as opposed to the eight you had.
Let me put that another, less heuristic way. Let's say you formulated an error metric for each game, which was:
((A win?) - (predicted A win %))^2
... summed over all games for that week. In this metric, (A win?) is either 1 or 0.
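This metric is the summed Brier score for probability forecasts. Below is a minimal sketch, applied to the one outcome discussed here that is known for certain: DEN was given 0.97 and lost to SD.

```python
def squared_error_score(predictions, outcomes):
    """Sum over games of (outcome - predicted win probability)^2.

    predictions: predicted probability that team A wins each game
    outcomes: 1 if team A won that game, 0 if it lost
    """
    if len(predictions) != len(outcomes):
        raise ValueError("need one outcome per prediction")
    return sum((won - p) ** 2 for p, won in zip(predictions, outcomes))


# SD at DEN: the model gave DEN 0.97, and DEN lost.
print(round(squared_error_score([0.97], [0]), 4))  # 0.9409
# Had DEN won, the same forecast would have cost almost nothing.
print(round(squared_error_score([0.97], [1]), 4))  # 0.0009
```

Dividing the sum by the number of games turns it into a mean, which makes weeks with different numbers of games comparable.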
My contention is that your current model is "overconfident" about its favorites, and would lead to a much higher MMSE than you would get with less confident predictions.
A good start for checking this would be to take your predicted probabilities for each team, cut them in half, and add 25%. So the theoretical maximum would be 75% confidence. My guess is that this would outperform the current model in terms of my MMSE metric. I am not implying that this would be the best fix, but if I am right, this would strongly suggest that your model is overstating the projected probabilities of the favorites.
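Tarr's proposed check is a linear shrink toward 50%: p' = 0.5p + 0.25, which maps every probability into [0.25, 0.75]. Here is a minimal sketch of the transform and its effect on the SD at DEN forecast under the squared-error metric.

```python
def shrink(p):
    """Halve the probability and add 0.25, capping confidence at 75%."""
    return 0.5 * p + 0.25


def game_loss(p, won):
    """Squared error for one game; won is 1 or 0."""
    return (won - p) ** 2


p_model = 0.97
p_shrunk = shrink(p_model)  # about 0.735

for label, p in [("model", p_model), ("shrunk", p_shrunk)]:
    print(label, round(p, 3),
          "loss if upset:", round(game_loss(p, 0), 4),
          "loss if DEN wins:", round(game_loss(p, 1), 4))
```

The shrunk forecast pays much less for the upset but more on every game the favorite wins. Whether it beats the raw model over a season depends on how often the favorites actually win.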
Tarr-I don't disagree with your observation about overconfidence. And my hunch would be that the 75% ceiling would indeed beat the pure model in MMSE. But I'm not sure that forcing a theoretical maximum on the prediction confidence is the best solution.
Keep in mind, there are games between particular teams that we can be more than 75% confident about--Colts vs. Dolphins maybe. Also remember that the 75% theoretical maximum includes a lot of 55/45 games or even 52/48 games.
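The same squared-error metric can test the point about genuine mismatches. Suppose the favorite truly wins with probability q and the forecast is f. The expected per-game error is q(1 - f)^2 + (1 - q)f^2 = (f - q)^2 + q(1 - q), which is smallest when f = q. A fixed ceiling therefore helps only when the raw forecasts overstate the favorites, and on a real mismatch it costs accuracy. Here is a minimal sketch with a hypothetical q = 0.90.

```python
def expected_squared_error(forecast, true_prob):
    """Expected (outcome - forecast)^2 for one game when the favorite
    really wins with probability `true_prob`."""
    return true_prob * (1 - forecast) ** 2 + (1 - true_prob) * forecast ** 2


def shrink(p):
    """The 75%-ceiling transform: 0.5 * p + 0.25."""
    return 0.5 * p + 0.25


q = 0.90  # hypothetical genuine mismatch
print(round(expected_squared_error(q, q), 3))          # honest forecast: 0.09
print(round(expected_squared_error(shrink(q), q), 3))  # capped at 0.70: 0.13
```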
The current model basically says this:
1. SD's and DEN's to-date efficiency stats represent their true full-season talent, skill, and performance.
2. If a team with season-long stats such as SD's plays at a team with season-long stats such as DEN's, the home team with a stat advantage that enormous would win 97 times out of 100 games.
I think #1 is the problem, but I think #2 is true. So instead of tinkering with #2 to compensate, I'd rather fix #1.
I think the overconfidence in the model at this point in the season comes from only 4 data points for each statistic so far. Outlier/unrepresentative performances have a large effect on each team's aggregate efficiency stats right now. In other words, Denver may have played their best 4 games in terms of team efficiencies already, and from here out their stats will regress to the mean. Until their efficiency stats stabilize, the model would overweight Denver's odds of winning. Vice versa for San Diego.
[Incidentally, this would be true of all statistical models this early in the season. But I think a pure efficiency model might be less susceptible. For example, including 3rd down conversion rates or red zone rates would exaggerate the effect of unrepresentative performances to an even greater degree.]
Ultimately, the overconfidence comes from the unstable input variables, which don't yet represent each team's true efficiency averages, and not from the logit model's output formula. So I'd rather attack the problem from the input side than from the output side (for example, by assigning a confidence ceiling).
I've already been working on a simple method of Bayesian compensation for early-season unstable team stats. It's similar to what IMDB or other similar sites do with their movie-rating system. IMDB doesn't want one voter to give 5 stars to his favorite Pauly Shore movie, resulting in a perfect 5.00 rating for "Encino Man" until someone else takes time to rate it a zero. So they assign every movie a certain number of baseline votes, so one early voter doesn't move the aggregate too far. Eventually, as the real votes accumulate, the baseline phantom votes are pushed out of the calculation. Throughout the process, the aggregate rating is stable.
I'm considering doing something similar here. I'd average in a number of "phantom" games at league-average stats for every team. This would mitigate the exaggerated confidence levels early in the season. (DEN's stats wouldn't look so good, and SD's wouldn't look so bad.)
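Here is a minimal sketch of the phantom-game idea. It uses the Denver pass-efficiency figures quoted earlier in the thread: 7.5 yds/dropback on offense, 4.86 allowed on defense, a league average of 6.3, and four games played. The number of phantom games, K = 4, is a hypothetical choice, and weighting by games rather than by dropbacks is a simplification.

```python
def phantom_game_average(team_avg, games_played, league_avg, phantom_games):
    """Average a team's per-game stat with `phantom_games` games played
    at exactly league average, IMDB-style."""
    total = team_avg * games_played + league_avg * phantom_games
    return total / (games_played + phantom_games)


LEAGUE_PASS_EFF = 6.3  # league-average yds/dropback quoted in the thread
K = 4                  # hypothetical number of phantom games

den_off = phantom_game_average(7.5, 4, LEAGUE_PASS_EFF, K)
den_def = phantom_game_average(4.86, 4, LEAGUE_PASS_EFF, K)
print(round(den_off, 2), round(den_def, 2))  # 6.9 5.58

# Over a full 16-game season, the phantom games pull far less toward average.
print(round(phantom_game_average(7.5, 16, LEAGUE_PASS_EFF, K), 2))  # 7.26
```

With K phantom games and n real games, the league average's share of the blend is K/(n + K). That share fades automatically as real games accumulate.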
The question is, how many pure-average phantom games should I use? Patrick's comments in the "Model Coefficient" posts pointed out something important: I need to look at how quickly each team's stats stabilize. In other words, how many games of data are required before they are within an acceptable margin of their final season average? (...which is still not perfectly representative of a team's true inherent ability.)
Or maybe I should include the previous year's stats in the baseline?
Tarr-I read your post too quickly. I see what you mean now--just use the 75% ceiling as a check on overconfidence. My apologies.
BB - no problem. I agree that #1 is the big problem. I like the approach you are taking to address it.
I don't have a strong sense of how many phantom data points to use, or whether to use past year's stats. My first instinct would be to look to what FO has done. They use their projections (which are largely last year's stats, with a few adjustments for personnel changes), with gradually decreasing weights, for something like 6 or 7 weeks before they fall to zero. I'm pretty sure they arrived at this by running sims on past years and seeing when using data from the year before no longer improved the predictive power of the model.
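Here is a minimal sketch of that kind of schedule. It assumes a simple linear fade: the preseason projection starts with full weight and drops to zero after seven games. The linear shape, the seven-game cutoff, and the example numbers are all hypothetical, since FO's actual weights aren't given here.

```python
def blended_rating(projection, current, games_played, fade_games=7):
    """Mix a preseason projection with the current-season average.

    The projection's weight falls linearly from 1 (no games played)
    to 0 (after `fade_games` games) and stays at 0 afterward.
    """
    w = max(0.0, 1.0 - games_played / fade_games)
    return w * projection + (1.0 - w) * current


# Hypothetical team: projected 6.5 yds/dropback, averaging 7.5 so far.
for games in (0, 4, 7, 10):
    print(games, round(blended_rating(6.5, 7.5, games), 2))
```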
Brian, first, great work on the site.
I agree that it would have been nice to know that Denver did NOT, in fact, have a 97% chance of winning that game. Really, anything below the 80% mark would have been acceptable, but 97%, sheesh... there's got to be some flaw in the model for that to happen (or maybe not a flaw, but rather *something* missing from the inputs that would have sent up a red flag). Any thoughts, in hindsight, on what stat or ratio might have predicted the Chargers utterly dominating Denver in Denver?
Also, it would help me (I have to do it manually now, and I'm sure others do the same) if you did a more detailed evaluation and analysis of your weekly predictions.
Moreover, it would be interesting to see (like footballpredictionnetwork does) two re-runs of your logit model: (1) on the most recent week's matchups using ONLY that game's data, to see how your model would have predicted the game given the actual data from the game; and (2) on all previous weeks' matchups using the UPDATED inputs through this past week's games, to see how the *latest-and-greatest* model and data would have predicted the outcomes. Hope that's clear. Again, love the site.
Brian, it was your work that showed NFL games are determined by luck a certain percentage of the time, right? If so, shouldn't that be built into the predictions, making 97% impossible?
Of course, a 97% favorite will lose sometimes -- we'd expect about seven of them per season, one almost every other week.
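That kind of count generalizes. If the model's probabilities are taken at face value, the expected number of losing favorites in a week is the sum of the underdogs' probabilities. Here is a minimal sketch using the week 5 table at the top of the post, with SEA at PIT as in the best-bets list.

```python
# Week 5 win probabilities from the table: (visitor, home).
week5 = {
    "ARI at STL": (0.85, 0.15),
    "ATL at TEN": (0.13, 0.87),
    "CAR at NO": (0.62, 0.38),
    "CLE at NE": (0.04, 0.96),
    "DET at WAS": (0.22, 0.78),
    "JAX at KC": (0.73, 0.27),
    "MIA at HOU": (0.33, 0.67),
    "NYJ at NYG": (0.10, 0.90),
    "SEA at PIT": (0.69, 0.31),
    "TB at IND": (0.19, 0.81),
    "BAL at SF": (0.49, 0.51),
    "SD at DEN": (0.03, 0.97),
    "CHI at GB": (0.09, 0.91),
    "DAL at BUF": (0.90, 0.10),
}

favorite_probs = [max(v, h) for v, h in week5.values()]
expected_upsets = sum(1.0 - p for p in favorite_probs)
over_80 = sum(1 for p in favorite_probs if p > 0.80)

print(f"{len(favorite_probs)} games, {expected_upsets:.2f} expected upsets")
print(f"{over_80} favorites above 80%")
```

By its own numbers, then, the model expected roughly 11 of the 14 favorites to win that week.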
Andy-Hi and thanks. See my comment above in response to the DEN SD issue. It's not so much a flaw in the model as in the assumptions that are required. In fact, I'd bet that a team with SD's to-date stats would indeed lose 97/100 games played at the home field of a team with DEN's to-date stats. The weakness is in assuming each team's to-date stats are representative of their true central tendency. My comment above also addresses how I plan to compensate for early season instability.
Also, I like your suggestions. It's mostly a matter of time, though. I might pick a game or two a week to dissect, but I can't do them all. Perhaps tonight I can do an autopsy of the SD at DEN game that will help explain how the model works.
Keep the faith. The model is still predicting games at about a 75% rate and is doing a couple games better than consensus Vegas favorites so far. There will be great weeks and bad weeks. There will be botched games too. But overall, I expect the model will outperform the consensus.
Sky-The luck study I did showed that luck is the deciding factor in about half of all NFL games. (DAL at BUF last night for example). Sometimes the better team wins by luck, so the maximum theoretical prediction rate is about 75%.
However, that does not mean that some games can't be predicted at over 75% confidence (MIA at IND or ATL at NE, for example).
95/5 type games are predicted incorrectly 5% of the time. 60/40 type games are predicted incorrectly 40% of the time. Since there are more 60/40 type games than 95/5 type games, it averages out to about a 25% incorrect rate. Again, this does not mean that we can't be more than 75% confident in any *one game*.
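The averaging argument can be made concrete. The expected error rate is the share-weighted average of each game type's underdog probability. The mix of game types below is hypothetical. It was chosen only to show how a blend of lopsided and close games can land near 25%, since the real distribution isn't given here.

```python
# Hypothetical mix of game types: (share of games, favorite's win probability).
game_mix = [
    (0.40, 0.60),
    (0.30, 0.75),
    (0.30, 0.95),
]


def expected_error_rate(mix):
    """Expected share of games where the favorite loses."""
    return sum(share * (1 - p_fav) for share, p_fav in mix)


print(round(expected_error_rate(game_mix), 3))  # 0.25
```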
It's true that a team with only a 3% chance of winning will win about once out of 33 games, so the fact that SD won should not surprise us too much. The surprise is that they dominated DEN completely.
Another way to limit overconfidence would be to average early-season efficiency stats with the league average. Might be easier than using a preseason projection, which, as you've mentioned before, is often itself flawed.
I think the problem with the Chargers-Denver game was the Chargers' 15.0 net yards per pass attempt.
That was amazing. I don't think your model, or any model, could have predicted that.
Add to that the Chargers getting 5.8 yards per run, a very good figure, plus a fumble return for a TD, and Denver just had no chance.
This may be one of those really explosive outlier games.
I am certain the Chargers will not have a performance like that again.
Getting over 10 yards per pass is very hard, even for one game.