Beating the Season Over-Under Follow-Up

Before the 2007 season began, I proposed a system that appeared to be able to systematically beat Las Vegas over-under lines on team wins. The results for the 2007 season are in: the system would have gone 8 correct, 3 incorrect, with 1 push (73% of decided bets correct).

The system itself consists of two rules:

1. For teams predicted to win 9.5 games or more, bet on the under.
2. For teams predicted to win 6.5 games or less, bet on the over.

In other words, bet on mediocrity.

Historically, the system would have been 70% correct over the previous two years, and slightly over 58% correct over the 10-year period from 1996 to 2005. Betting the 'over' on teams predicted to win 6 or fewer games, instead of 6.5 or fewer, would have raised the overall rate to 61%.
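The two rules are simple enough to express as code. This is a minimal sketch for illustration only: the function names `pick`, `grade`, and `hit_rate` are hypothetical, and it assumes a push can happen only when actual wins exactly equal a whole-number line. Pushes are excluded from the hit rate, which is how an 8-3-1 record works out to 73%.

```python
def pick(line, high=9.5, low=6.5):
    """Return 'under', 'over', or None (no bet) for a season win-total line."""
    if line >= high:
        return "under"
    if line <= low:
        return "over"
    return None


def grade(bet, line, wins):
    """Grade a bet as 'W', 'L', or 'P' (push) given the team's actual wins."""
    if bet is None:
        return None
    if wins == line:
        return "P"  # a push is only possible on a whole-number line
    went_over = wins > line
    return "W" if (bet == "over") == went_over else "L"


def hit_rate(results):
    """Share of decided bets that won; pushes are excluded."""
    won = results.count("W")
    lost = results.count("L")
    return won / (won + lost)


# Example: a 6.5-win line on a team that finished with 7 wins.
bet = pick(6.5)
print(bet, grade(bet, 6.5, 7))  # over W

# The 2007 record: 8 correct, 3 incorrect, 1 push.
print(round(hit_rate(["W"] * 8 + ["L"] * 3 + ["P"]), 3))  # 0.727
```

Passing `low=6.0` to `pick` gives the "6 or fewer" variant mentioned above.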

Here is how the system fared this year. Over-under lines were taken from on 6/30/07.

Team  Line  Wins  Result
BUF   6.5   7     W

One thing I learned while watching the results develop over the season is that the system is more successful the earlier in the year the over-under lines are taken. Early in the year, long before training camps open, information is scarcest, and over-under lines are most likely to be set based on the previous season's results. This is when uncertainty should be greatest, and perhaps when overconfidence is correspondingly greatest. While looking for records of the over-under lines, I noticed that the earlier the line, the more confident it was, i.e., the further from 8 wins the teams' predicted totals were.

As player movements, retirements, injuries, team schedules, and pre-season games come into focus, over-under lines move to reflect the new information. Uncertainty is reduced, and the overconfidence would be mitigated. It might therefore be better to place bets earlier rather than later in the pre-season, capitalizing on maximum uncertainty.

Why the System Works

The system is based on three principles:

1. The NFL season is extremely difficult to predict.
2. Regression to the mean is very strong in the NFL.
3. People are overconfident in their ability to predict team wins.

Humans, including NFL fans and gamblers, are susceptible to cognitive biases, and I believe these biases affect prediction markets. Cognitive (or heuristic) biases are systematic errors in judgment made in certain situations. These biases probably exist because they benefited early humans in their efforts to adapt and survive. For example, the "overconfidence effect" may bias people toward action rather than passivity in the face of a challenge.

But in a prediction situation, overconfidence is counterproductive. People who believe their prediction abilities are better than they truly are would take excessive risks. In one survey 80% of respondents claimed they were in the top 30% in driving skills. Similarly, it's likely that most gamblers also believe that they have some special ability or intuition to predict outcomes--otherwise they wouldn't be gambling.

Hindsight bias, also known as the "I knew it all along" effect, may also fool people into forgetting how poor their predictions really are. Ask yourself how well you thought New England would do this year. Be honest. According to the betting lines, half of all people thought they'd win 11 games or fewer. Remember how Randy Moss was supposed to be a "cancer"? How about the Bears? Half of bettors believed Chicago would win at least 11 games this year. But as the Bears floundered, many people forgot just how good they were expected to be on the strength of their defense alone. Most fans, including TV commentators, forget how uncertain things were before the first snap of the season. They say "it was obvious the Patriots would be unbeatable. They have Randy Moss!" or "obviously the Bears fell flat, Rex Grossman is terrible." I know that's my first instinct.

People also tend to remember correct predictions and forget incorrect ones--both by themselves and others. Today I noticed the cover page of an obscure football "prospectus" book I bought in August. It trumpeted, "Last year, we correctly predicted player X would have a breakout year! We said team Y would return to the playoffs!" That a book of over 300 pages of NFL predictions made four or five correct guesses is not an accomplishment. But it's the correct ones that are remembered.

Many people also fail to grasp the concepts of randomness and luck. They discount regression to the mean because they expect extreme performances to continue. In the NFL, a team that finishes 14-2 was probably a "fundamentally" 12-4 team that got lucky in a couple of games. Fans and bettors may expect much of that 14-2 performance to carry over into the next season, thinking that 11 or 12 wins is a safe bet. But in reality the team was a "12-win" team the previous season, so its chance of repeating such a successful year is lower than expected. The effect is compounded when the same phenomenon affects a team's opponents. For example, part of the reason the Ravens and Bengals fell short of their over-under expectations is that division rival Cleveland rebounded from an extremely poor performance.
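To put a number on the regression effect, here is a short binomial sketch. It assumes, for illustration only, that games are independent and that a "fundamentally" 12-4 team has the same 0.75 chance of winning every game; the function name is hypothetical.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p): chance of winning k or more of n games."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(k, n + 1))


# A "fundamentally" 12-4 team wins each game with probability 12/16 = 0.75.
p_true = 12 / 16
print(round(prob_at_least(14, 16, p_true), 3))  # about 0.197: 14-2 or better
print(16 * p_true)  # expected wins next season: 12.0
```

Under those assumptions, such a team reaches 14-2 or better only about one season in five, and its expectation for the following year is still 12 wins.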

This field of heuristics and decision-making fascinates me. I'm sure there is a lot of money to be made, not just in betting markets but in equity markets. The nice thing about sports, though, is that they are relatively easy to analyze statistically. And no, I'm sorry to report that I did not put my money where my mouth is.

'Dome at Cold'

This week features two of the three "dome team at cold weather" games of the 2007 season. Earlier, CIN defeated STL in week 14. Today CHI hosts NO and GB hosts DET.

Dome teams are at a severe disadvantage when playing in cold weather. Over the past five regular seasons, they've won only about 14% of the time in those situations. Accounting for relative team strength, they would be expected to win only 12% of the time.

GB is already a heavy favorite over DET, and the weather factor would only enhance GB's expected chance of winning. But the game prediction model features closer odds for the NO at CHI game. NO is favored with a 0.57 probability of winning.

However, once the weather is factored in, CHI becomes the favorite. Replacing the standard home field advantage coefficient with the "dome at cold" coefficient, we get a 0.62 win probability for CHI.
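Swapping one coefficient in a logistic model shifts a game's log-odds by a fixed amount, so the two CHI probabilities above imply the size of that shift. The sketch below assumes the game model is a logistic regression with an additive home-field term (its actual coefficients are not listed here); `with_dome_at_cold` is a hypothetical helper.

```python
from math import exp, log


def logit(p):
    """Log-odds of probability p."""
    return log(p / (1 - p))


def inv_logit(x):
    """Probability from log-odds x."""
    return 1 / (1 + exp(-x))


# CHI's win probability with the standard home-field coefficient (1 - 0.57)
# and with the "dome at cold" coefficient swapped in.
p_standard = 1 - 0.57
p_dome_cold = 0.62

# Swapping one additive coefficient adds a constant to the log-odds.
shift = logit(p_dome_cold) - logit(p_standard)
print(round(shift, 3))  # about 0.771


def with_dome_at_cold(p_home, shift=shift):
    """Re-price a home team's win probability under the same coefficient swap."""
    return inv_logit(logit(p_home) + shift)
```

Because the shift is additive in log-odds, the same swap moves a heavy favorite's probability by less than it moves a toss-up's.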

With the weather factor, GB becomes a heavier favorite at 0.92.

Some people are already looking ahead to the likely AFC championship match-up between IND and NE. Although IND played NE very well, and nearly won the game earlier in the year without five starters, their next game will be very different. It will be in Foxboro, Mass. in late January. It was no fluke that the one year IND was able to get past NE to get to the Super Bowl was the year they hosted the game.

AFC Wildcard, Resting Starters, and 16-0 Teams

A few weeks ago, I wrote that the 2nd AFC wildcard playoff spot would come down to the last week of the season between TEN and CLE. Their relative strengths of schedule clearly favored CLE, but the difference in opponent strength was primarily due to their final opponents of the regular season: CLE faces a weak SF, while TEN faces an elite IND.

But with IND locked into the second seed, IND may not be so elite in week 17. I can't begrudge a team for resting its best players prior to the playoffs. It's legal and in their self-interest. But it is a shame that who gets into the playoffs can be determined by such a quirk.

If TEN played IND in any other week, they wouldn't have much of a chance to win. But against Jim Sorgi, they would probably be favored. If TEN loses to a full-strength IND, Cleveland wouldn't even need to beat SF; CLE would keep the tiebreaker thanks to a better conference record.

On a similar note, I estimated that NE had about a 52% chance of finishing the regular season undefeated. Sure enough, here are the unbeaten Patriots facing the Giants tonight for a chance at NFL immortality. It's also a shame that the Pats may not face an opponent at full strength, since the Giants have clinched the 5th seed.

These unfortunate circumstances are really just due to scheduling quirks. Had NE's or TEN's schedule been arranged differently, their prospects for victory this weekend would be very different.

How rare is a 16-0 team?

Here is a quick back-of-the-envelope analysis. Let's say that every year there are two legitimate 13-3 teams. In other words, there are two teams in the NFL that have a fundamental .813 winning percentage against an average strength of schedule, without luck of any kind. In a 16-game season, a team with an underlying .813 win probability would have a 3.6% chance of winning all 16 games (0.813 ^ 16 = 0.036).

Because we assumed there are 2 such teams each year, the chance of neither team going undefeated is (1 - 0.036)^2 = 0.929, or 92.9%. The chance of one or both teams going undefeated is therefore 1 - 0.929 = 7.1%. With the current distribution of talent and a 16-game schedule, we should expect an undefeated team about once every 14 years.
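The arithmetic can be checked in a few lines. This sketch uses the exact 13/16 rather than the rounded .813 and keeps the same simplifying assumption as the text: each game is won independently with the team's true win probability. Function names are illustrative.

```python
def p_undefeated(p, games=16):
    """Chance a team with true per-game win probability p wins every game."""
    return p**games


def p_any_undefeated(p, n_teams, games=16):
    """Chance that at least one of n_teams such teams goes undefeated."""
    return 1 - (1 - p_undefeated(p, games)) ** n_teams


p = 13 / 16  # a "legitimate" 13-3 team
print(round(p_undefeated(p), 4))          # about 0.0361
print(round(p_any_undefeated(p, 2), 4))   # about 0.0708
print(round(1 / p_any_undefeated(p, 2)))  # roughly one season in 14
```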

I'm not saying the 2007 Patriots are really a 13-3 team that's been really lucky. Assuming they beat the Giants, my hunch is (and their stats say) they are really a 15-1 team that dodged a bullet or two (namely the Ravens and Eagles).

Week 16 Efficiency Rankings

NFL team efficiency rankings are listed below in terms of generic win probability (GWP). The GWP is the probability a team would beat the league-average team at a neutral site. Each team's opponents' average GWP is also listed, which can be considered its to-date strength of schedule. The GWP is adjusted to reflect the strength of past opponents. Offensive ranking (O Rank) is based on each team's offensive GWP, i.e., the team's GWP assuming it had a league-average defense. D Rank is the reverse. Rankings are based on a logistic regression model applied to data through week 16. A full explanation of the methodology can be found here.

Rank  Team  Last Wk  GWP  Opp GWP  O Rank  D Rank

Game Predictions Week 17

Game probabilities for week 17 NFL games are listed below. The probabilities are based on an efficiency win model explained here and here. The model considers offensive and defensive efficiency stats including running, passing, sacks, turnover rates, and penalty rates. Team stats are adjusted for previous opponent strength.

These probabilities may not reflect that several teams are locked into their playoff seeds and possibly resting many of their starters. However, I will post these without regard to this consideration and allow everyone to apply their own correction. Teams that are particularly likely to not play at full strength are noted (*). My own sense is that the only games in which the favorite would now be the underdog are the TEN at IND and DAL at WAS games.


Week 15 QB Ratings Adjusted for Pass Defense

In this post I'll recap how my QB rating is determined, and I'll add a new dimension--adjustments for opposing pass defenses.

QB Rating Recap

The QB rating I often post differs from most others, including the NFL passer rating. The stat I use to compare QB performance is +WP16, or wins added per 16 games. +WP16 estimates the number of wins above average a quarterback's performance would mean to his team over a full regular season.

The +WP16 stat estimates the wins added based on a regression that analyzes what makes teams win. The QB stats included are passing yards per attempt, rushing yards per attempt, sack yards per attempt, fumble rates, and interception rates. +WP16 isolates QB performance by assuming an average rushing game and an average defense.

First downs and touchdowns are not included despite their apparent importance because they are intermediate outcomes between actual QB performance and the outcome of interest--team wins.

Perhaps the most distinctive aspect of +WP16 is that it does not include passing yards gained by receivers after the catch (YAC). Instead it is based on air yards, the yards the ball travels through the air forward of the line of scrimmage. The percentage of passing yards made up of YAC varies widely from QB to QB: Brodie Croyle gets 68% of "his" yards from YAC, while Ben Roethlisberger gets only 34% of his yards from YAC.

There is some debate over whether or how much YAC should be credited to QBs. In my mind, the answer is much closer to none than to all.

There is statistical evidence that YAC belongs to the receiver. Year-to-year correlations of YAC are far stronger for receivers than for QBs. Accuracy rates and QBs' abilities to read the defense and "see the field" (measured by interception rates) do not contribute to YAC. It appears that YAC is determined by receiver ability far more than QB ability.

The types of passes a QB throws also help determine YAC. Screen passes and check-downs to RBs, which are not difficult passes, tend to produce more YAC relative to air yards. Deep out passes, which are more difficult and require more QB skill, tend to generate more air yards and often no YAC at all. The other type of pass that can produce a lot of YAC is one where a WR beats the defense deep simply by outrunning it. Although it certainly helps if a QB can get the ball to a receiver who has gotten behind his coverage, those plays are rare and are really made possible by the WR, at least in my mind.

For the reasons above, I believe the truer measure of a QB's performance is one that does not include YAC. Here is the ranking of QBs based on performance so far this season:

Rank  Name       QBRat  Att  Yds   Int  Rush  Rush Yds  Sk Yds  Fum  AY/A  %YAC  +WP16
3     Manning P  95.2   464  3634  14   19    -4        122     5    4.9   37    1.29
24    Manning E  72.6   482  2974  17   24    56        186     7    3.5   43    -1.11
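The AY/A column can be reproduced from the passing columns, assuming AY/A is simply passing yards minus YAC, divided by pass attempts (sack yards not subtracted). With the rounded %YAC values, this sketch matches both Manning rows in the table; the function name is illustrative.

```python
def air_yards_per_attempt(pass_yards, attempts, yac_share):
    """Air yards per attempt: passing yards minus YAC, divided by attempts.

    yac_share is the fraction of passing yards gained after the catch
    (the %YAC column divided by 100).
    """
    air_yards = pass_yards * (1 - yac_share)
    return air_yards / attempts


print(round(air_yards_per_attempt(3634, 464, 0.37), 1))  # Peyton Manning: 4.9
print(round(air_yards_per_attempt(2974, 482, 0.43), 1))  # Eli Manning: 3.5
```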

The biggest surprise at the top of the list is David Garrard. He's ranked so high primarily because of his phenomenally low turnover rate, having thrown only three interceptions all year. His air yards per attempt (AY/A) is impressive too. He ranks among the elite QBs in the league with 4.8 yds/att. In total, Garrard ranks second behind Brady with +2.21 wins added.

Adjusting for Opponents' Pass Defense

When comparing team performance, opponent strength matters a great deal, because NFL schedules are not balanced. During the regular season a team plays only 13 of the 31 other teams in the league, 3 of them (its division rivals) twice. If those 3 division rivals are among the toughest teams, and the 18 teams it never plays include some of the weakest, then the team will appear much worse than it would with an average schedule.

Opponent adjustments are most important early in the season. As the season goes on and each team faces a wider variety of opponents, strength of schedule tends to even out, but not completely.

It stands to reason that QB performance ratings would also be affected by opponent strength, specifically opponent pass defense. To check for this effect, each team's pass defense strength was computed from average pass yards per attempt allowed, sack yards per pass play, and interception rate. Each stat was weighted according to its importance to winning, based on the same regression model that determines the weights of the QB rating. This produced a unit-less pass defense score.

Then each QB's average opponent pass defense strength was determined. QBs were counted as having played a team if they had more than 10 attempts in a game.
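Here is a minimal sketch of those two steps. The weights are hypothetical placeholders, since the real ones come from the win model's regression, and standardizing each stat before weighting is an assumption about how a unit-less score could be built. Team names and the game log are made up.

```python
from statistics import mean, pstdev

# Hypothetical weights standing in for the win-model regression weights.
# Signs: yards allowed per attempt hurt a defense; sack yards per pass play
# and interceptions help it.
WEIGHTS = {"ypa_allowed": -1.0, "sack_yds_per_play": 0.5, "int_rate": 0.6}


def pass_defense_scores(team_stats):
    """Unit-less score per team: weighted sum of each stat's z-score."""
    scores = {team: 0.0 for team in team_stats}
    for stat, weight in WEIGHTS.items():
        values = [s[stat] for s in team_stats.values()]
        mu, sd = mean(values), pstdev(values)
        for team, s in team_stats.items():
            scores[team] += weight * (s[stat] - mu) / sd
    return scores


def avg_opponent_strength(qb_games, scores, min_attempts=10):
    """Average score of the defenses a QB threw more than min_attempts passes against."""
    faced = [scores[opp] for opp, attempts in qb_games if attempts > min_attempts]
    return mean(faced)


# Hypothetical teams and one QB's game log of (opponent, pass attempts).
stats = {
    "AAA": {"ypa_allowed": 6.0, "sack_yds_per_play": 0.50, "int_rate": 0.035},
    "BBB": {"ypa_allowed": 7.0, "sack_yds_per_play": 0.40, "int_rate": 0.025},
    "CCC": {"ypa_allowed": 6.5, "sack_yds_per_play": 0.45, "int_rate": 0.030},
}
scores = pass_defense_scores(stats)
games = [("AAA", 30), ("BBB", 8), ("CCC", 25)]  # the 8-attempt game is not counted
avg = avg_opponent_strength(games, scores)
print(avg)
```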

To see how opposing pass defenses affect QB performance, the correlation coefficient between each QB's rating and his opponents' average pass defense strength was calculated. The correlation was 0.098, which is weak, but that is what we'd expect. Strictly speaking, a one standard deviation (SD) increase in opposition strength is associated with a 0.098 SD reduction in QB performance. Put another way, opponent pass defense explains only about 1% of the variation in QBs' season-long performance (0.098 ^ 2 is about 0.01).

The last step in the process was to adjust the QB rating (+WP16) accordingly. Each QB's rating was adjusted by 0.098 SDs for each SD of opponent strength. The stronger the average opponents' pass defense, the higher the adjustment for the QB.
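The adjustment can be sketched as a standardized shift: convert each QB's opponent strength to a z-score, then move his rating by 0.098 rating-SDs per opponent SD. This is an illustration under that reading of the text, with hypothetical QBs; because only z-scores enter, it works the same whether opponent strength is a raw score or a GWP.

```python
from statistics import mean, pstdev


def adjust_ratings(ratings, opp_strength, r=0.098):
    """Shift each QB's rating by r rating-SDs per SD of opponent pass-defense strength.

    Stronger-than-average opposition raises the rating; weaker lowers it.
    """
    sd_rating = pstdev(list(ratings.values()))
    opp = list(opp_strength.values())
    mu_opp, sd_opp = mean(opp), pstdev(opp)
    return {
        qb: rating + r * sd_rating * (opp_strength[qb] - mu_opp) / sd_opp
        for qb, rating in ratings.items()
    }


# Hypothetical QBs: ratings in +WP16, opponent pass defense expressed as GWP.
ratings = {"QB A": 1.0, "QB B": -1.0, "QB C": 0.0}
opp_gwp = {"QB A": 0.55, "QB B": 0.45, "QB C": 0.50}
adjusted = adjust_ratings(ratings, opp_gwp)
print(adjusted)
```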

The table below lists each QB's unadjusted rating, his average opponent's pass defense strength, and the adjusted rating. To make the numbers a little easier to understand, pass defense strength is converted into GWP (generic win probability), or the probability a pass defense would win a game against a league-average opponent given an average run defense and offense.

Rank  Name       +WP16  Opp Pass D Str  Adj +WP16
3     Manning P  1.29   0.53            1.43
27    Manning E  -1.11  0.50            -1.12

At this point in the season (going into week 16), we wouldn't expect there to be much change in the ratings by accounting for opponent strength. Most teams have faced some good pass defenses and some weak ones. And indeed there is relatively little change in the rankings.

There are some notable exceptions, however. The QB who faced the weakest pass defenses was Steve McNair who played against CIN, ARI, CLE, and SF, all of which had below average pass defenses. McNair still managed to turn in a miserable performance in 2007 due to age, injury, or probably both.

The other notable exception is David Garrard. Garrard faced the toughest pass defenses of any QB in 2007. If not for an injury that kept him out of three games, against TB, NO, and TEN in weeks 8, 9, and 10, he would have faced an even tougher slate. Despite his uphill battle, Garrard is playing extremely well this season. Before adjusting for opponent strength, he ranked second among all QBs (not even counting his week 17 demolition of OAK).

In the year of Tom Brady and the Patriots, it may be David Garrard who is the best QB this season. After factoring in opponent strength, his +WP16 rating jumps from +2.2 to +2.5, edging Brady for the top spot. That's remarkable considering Garrard wasn't even his team's starter in training camp. Jaguars coach Jack Del Rio looks like a genius for cutting Leftwich and sticking with Garrard.

Playoff Race Predictions Wk 15

Don't forget to check out for an update on playoff race probabilities.

Luckiest Teams through Week 15

Based on opponent-adjusted generic win probability (GWP), the number of expected wins can be estimated for each team. Teams that have won more games than expected can be considered lucky, while teams with fewer wins than expected can be considered unlucky.

The list of NFL teams sorted from luckiest (positive numbers) to unluckiest is posted below. We would expect most teams to be within +/- 1.0 wins. So teams outside that margin can be deemed significantly lucky or unlucky.
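A minimal sketch of the luck calculation, assuming expected wins are simply opponent-adjusted GWP times games played (the text does not spell out the exact formula). The teams and numbers are hypothetical; teams more than 1.0 win away from expectation are flagged.

```python
def luck_table(teams, threshold=1.0):
    """Return (team, expected wins, luck, significant?) rows, luckiest first."""
    rows = []
    for name, (gwp, wins, games) in teams.items():
        expected = gwp * games  # assumption: expected wins = adjusted GWP x games played
        luck = wins - expected
        rows.append((name, expected, luck, abs(luck) > threshold))
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical teams after 14 games: (opponent-adjusted GWP, actual wins, games).
teams = {"Team A": (0.55, 10, 14), "Team B": (0.60, 8, 14), "Team C": (0.40, 4, 14)}
rows = luck_table(teams)
for name, expected, luck, significant in rows:
    print(f"{name}: expected {expected:.1f}, luck {luck:+.1f}{' *' if significant else ''}")
```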

Team  GWP  Wins  Exp W  Luck

Game Predictions Week 16

Game probabilities for week 16 NFL games are listed below. The probabilities are based on an efficiency win model explained here and here. The model considers offensive and defensive efficiency stats including running, passing, sacks, turnover rates, and penalty rates. Team stats are adjusted for previous opponent strength.


Week 15 Efficiency Rankings

NFL team efficiency rankings are listed below in terms of generic win probability (GWP). The GWP is the probability a team would beat the league-average team at a neutral site. Each team's opponents' average GWP is also listed, which can be considered its to-date strength of schedule. The GWP is adjusted to reflect the strength of past opponents. Offensive ranking (O Rank) is based on each team's offensive GWP, i.e., the team's GWP assuming it had a league-average defense. D Rank is the reverse. Rankings are based on a logistic regression model applied to data through week 15. A full explanation of the methodology can be found here.

Rank  Team  Last Wk  GWP  Opp GWP  O Rank  D Rank

The Best Defensive Player in the NFL is...Neil Rackers?

My previous look at place kickers considered field goal kicking. Accounting for attempt distances and home field environment (dome, warm, etc.), I estimated that one standard deviation in FG kicking accuracy results in about 2.3 more successful FGs per season (worth 6.7 points). In this article, I'll look at kickoffs and how they affect the opposing team's ability to score.

Starting field position is obviously an important factor in scoring. The closer an offense is to scoring position, the fewer consecutive first downs are required to get there. A kicker who can kick deep can help his team by giving his defense more opportunities to force a punt or a turnover before his opponent is able to move into scoring position. Further, if the opposing offense is stopped, his offense will receive possession of the ball that much closer to scoring position on its own.

Kickoff depth is easily measured, but the resulting field position also depends on kick return and kick coverage performance. To isolate the kicker's performance from the rest of the kickoff unit, I ran a quick linear regression that estimates the expected starting field position based on the depth of the kick. Average kick distances and return yards for each kicker (with more than 20 kickoffs in a season) in the 2004-2006 seasons are plotted below. (Kicks that result in touchbacks are excluded at this point because there is no return. They will be factored back in later.)

We see that the deeper the kick, the longer the return. The longer a kick travels the more time and space the returner has to run before being met by the coverage team. Not every yard of extra kick distance translates into field position. About half of each marginal yard of kick distance is given back up in the return. If one kicker kicks 60 yd kicks, and another kicker kicks 70 yd kicks, the second kicker will only benefit by 5 yds of resulting field position after the average return.

Now we can calculate the expected resulting field position for each kicker in the NFL. Touchbacks are now factored back in. The expected starting field position of each kicker over the previous three seasons ranges from the 23.7 yd line to the 31.4 yd line. The average is the 27.4 yd line with a standard deviation of 1.49 yds.
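Factoring touchbacks back in amounts to a weighted average of two outcomes. The sketch below assumes the rules of the 2004-2006 seasons (kickoffs from the kicking team's 30, touchbacks placed at the 20) and a linear return model; the return intercept and slope in the example are hypothetical, not the fitted values.

```python
TOUCHBACK_LINE = 20  # touchbacks start at the receiving team's 20
KICKOFF_SPOT = 30    # 2004-2006 rules: kickoffs from the kicking team's 30


def returned_start(kick_distance, ret_intercept, ret_slope):
    """Starting yard line after a returned kick, using a linear return model."""
    landing_line = (100 - KICKOFF_SPOT) - kick_distance  # receiving team's yard line
    predicted_return = ret_intercept + ret_slope * kick_distance
    return landing_line + predicted_return


def expected_field_position(tb_pct, avg_returned_start):
    """Blend touchbacks (ball at the 20) with returned kicks."""
    return tb_pct * TOUCHBACK_LINE + (1 - tb_pct) * avg_returned_start


# Hypothetical kicker: 65-yd average returned kick, 30% touchbacks.
start = returned_start(65, ret_intercept=-10.5, ret_slope=0.5)
print(start)                                  # 27.0
print(expected_field_position(0.30, start))  # about 24.9
```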

Kicker                Seasons  KOs   Avg Dist  TB Pct  Exp Fld Pos
Neil Rackers          3        206   67.9      30.6    23.7
Michael Koenen        2        147   64.2      18.4    24.8
Paul Ernster          1        75    67.9      25.3    25.0
Olindo Mare           3        184   65.7      27.6    25.0
Josh Scobee           3        225   65.4      22.7    25.6
Kris Brown            3        201   64.4      14.9    25.8
Stephen Gostkowski    1        81    65.5      14.8    25.9
Todd Sauerbrun        2        100   60.0      11.3    26.2
Joe Nedney            2        132   60.2      8.6     26.3
Sebastian Janikowski  3        193   63.6      15.4    26.3
Jason Hanson          3        204   64.1      15.2    26.4
Micah Knorr           1        61    64.4      24.6    26.4
Jeff Wilkins          3        233   63.4      8.4     26.5
Wade Richey           1        53    63.1      13.2    26.5
Rob Bironas           2        141   63.2      14.9    26.5
Jay Feely             3        249   63.9      15.1    26.5
Mitch Berger          2        89    65.1      5.2     26.8
Phil Dawson           3        191   61.4      10.0    26.9
Aaron Elling          2        70    63.5      2.9     26.9
John Carney           2        95    61.6      4.7     27.0
Shaun Suisham         1        26    64.1      0.0     27.1
Billy Cundiff         3        122   63.1      6.2     27.1
Jose Cortez           1        38    63.9      10.5    27.3
Paul Edinger          2        128   59.1      0.0     27.4
Jason Baker           1        24    63.0      4.2     27.4
Adam Vinatieri        3        247   63.5      10.9    27.4
David Akers           3        204   64.3      12.0    27.5
Rian Lindell          3        221   59.7      4.6     27.5
Matt Bryant           2        119   62.5      7.6     27.6
Craig Hentrich        1        73    57.6      4.1     27.6
Josh Brown            3        246   63.1      8.9     27.6
John Kasay            3        194   62.1      5.4     28.1
Jeff Reed             3        244   61.6      6.5     28.2
Dave Rayner           2        154   63.3      8.3     28.2
Nate Kaeding          3        270   62.6      5.6     28.3
Lawrence Tynes        3        260   61.2      5.4     28.4
Robbie Gould          2        149   63.3      6.8     28.5
Matt Stover           3        128   61.3      6.7     28.6
Shayne Graham         3        257   61.5      6.7     28.6
Ryan Longwell         3        218   59.6      3.7     28.7
Toby Gowin            1        76    63.0      9.2     28.8
Todd Peterson         1        61    56.4      1.6     29.0
Mike Nugent           2        137   60.0      2.2     29.1
John Hall             2        81    60.7      2.7     29.3
Martin Gramatica      2        46    60.9      4.4     29.3
Jay Taylor            1        21    57.0      0.0     29.4
Steve Christie        1        71    57.7      2.8     29.7
Mike Vanderjagt       2        78    59.0      3.3     30.0
Nick Novak            2        44    58.8      2.1     31.4

The Importance of Field Position

I think the best way to examine the football significance of kickoff field position is to look at two contrasting cases. Let's compare the league's best kicker, Neil Rackers, with one of its recent "worst," Mike Vanderjagt, who was cut by the Cowboys last year. Rackers' average expected field position was the 23.7 yd line, and Vanderjagt's was the 30.0 yd line, a difference of 6.3 yds.

How important are those 6 yards? The field is 100 yds long, so are they 6% important? It's difficult to quantify, but here is one way to think about field position: in almost all cases, to move into scoring position for either a TD or a FG, an offense needs a consecutive series of first downs. The closer an offense is to scoring position, the fewer first downs it needs. In other words, let's think of a typical drive as a sequence of first downs instead of a sequence of plays.

In baseball, the game rests on the outcomes of at bats, not necessarily every ball and strike. Think of a series of downs as the equivalent of an at bat, and individual plays as pitches. We're not as interested in each pitch as we are in the outcome of the at bat. The analogy continues because to score in baseball, several consecutive successes (hits or walks) are usually required. In football, several first downs are usually needed to score. (And in both sports there is always the slim possibility of a home run, long pass, or breakaway run.)

In the NFL, a successful first down yields 15 yards on average. So those 6 yards mean that in 6 out of 15 cases, an opposing offense needs one more first down than it otherwise would to score. That additional first down gives a defense one more opportunity to force a kick or a turnover, and it could turn what would have been a TD drive into a FG drive, or a FG drive into a scoreless one.

In any given series of 4 downs in the NFL, an offense succeeds in gaining another first down (or touchdown) 65% of the time, so defenses are able to interrupt the consecutive series of first downs 35% of the time. Therefore 6 yds of deeper field position per kickoff reduces the probability that the offense will score by:

(6 / 15) * 0.35 = 0.14 --> 6 yds of field position adds a 0.14 probability of interrupting a scoring drive.
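The same calculation as a small function, with the 15-yard and 35% figures from the text as defaults; the function name is illustrative.

```python
def interruption_probability(extra_yards, yards_per_first_down=15, stop_rate=0.35):
    """Added chance that deeper kickoff field position stops a would-be scoring drive.

    extra_yards / yards_per_first_down is the share of drives that need one
    more first down; stop_rate is the chance a defense stops any one series.
    """
    return extra_yards / yards_per_first_down * stop_rate


print(round(interruption_probability(6), 2))  # 0.14
```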

(It was pointed out to me that series in the red zone are often more difficult because the field is compressed, so not every series has an equal probability of success. That's true; however, the red zone series happen on a TD drive whether the drive started on the 20 or the 40 yd line. The "extra series" that a deep kickoff sometimes requires does not add a red zone series. It falls at the front end of a drive, not in the red zone.)

Over the previous five years, NFL teams scored an average of 23.4 FGs and 34.6 offensive TDs per season, resulting in 312 pts per season. A 14% reduction equates to a 43.7-point difference between Neil Rackers and Mike Vanderjagt.

312 * 0.14 = 43.7 pts per season

I know my favorite team sure could use an extra 43.7 points this season. Depending on how they are distributed, that would usually mean an extra win or two. However, only 37% of drives begin immediately following a kickoff, which would mean the difference isn't really 43.7 pts per season, but:

0.37 * 43.7 = 16.2 pts per season

But, in a way, all drives originate from a kickoff. The starting field position of every drive after a kickoff is biased by the kickoff result. For example, a poor kick that gives the opponent a starting field position at the 40 yd line will leave all subsequent drives closer to the kicking team's end zone than they otherwise would be, until the next kickoff. For that reason, I think the real answer is a lot closer to 43.7 points per season than 16.2 points per season.

Additionally, an extra series on a drive means there are more plays that could result in a turnover. About 10% of all series result in either a fumble or interception. That further strengthens the effect of field position and the importance of deep kickoffs.

We began the analysis by comparing Rackers and Vanderjagt, who are extreme examples, about 4 standard deviations apart. If we repeat the above analysis for just a single standard deviation in expected kickoff field position (1.49 yds), the effect is 10.9 pts per season per SD.
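The chain of estimates from kickoff yards to season points fits in one function. The inputs are the averages quoted above (23.4 FGs and 34.6 offensive TDs per season, 15 yards per first down, a 35% stop rate, and the 37% kickoff share); the names are hypothetical. Using the unrounded season total of 23.4 * 3 + 34.6 * 7 = 312.4 points reproduces the per-SD figure.

```python
FG_PER_SEASON = 23.4
TD_PER_SEASON = 34.6
POINTS_PER_SEASON = FG_PER_SEASON * 3 + TD_PER_SEASON * 7  # about 312 points
KICKOFF_DRIVE_SHARE = 0.37  # share of drives that start right after a kickoff


def points_saved(extra_yards, yards_per_first_down=15, stop_rate=0.35,
                 points=POINTS_PER_SEASON):
    """Season points denied to opponents by extra kickoff field position."""
    p_interrupt = extra_yards / yards_per_first_down * stop_rate
    return points * p_interrupt


upper = points_saved(6)              # treats every drive as affected
lower = KICKOFF_DRIVE_SHARE * upper  # only drives that start with a kickoff
per_sd = points_saved(1.49)          # one SD of expected field position
print(f"{upper:.1f} {lower:.1f} {per_sd:.1f}")  # 43.7 16.2 10.9
```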

My previous look at FG kicking estimated that every additional SD in accuracy is worth 6.7 more pts per season. Compared to 10.9 pts per kickoff SD, it appears that kickoff performance is the more important aspect of place kicking. Interestingly, when the Cowboys cut Vanderjagt in 2006, the news articles didn't bother to mention his short kickoffs, only that he didn't like kicking off.

An Alternate Analysis of Field Position

Previous studies have examined the point value of various field positions. Perhaps the best has been David Romer's study regarding 4th down situations. To evaluate when coaches should "go for it" on 4th down, he evaluated the point value of a 1st and 10 from each position on the field. The figure below is from his study.

Between the 15 yard lines, the value change is linear--0.04 points per yard. A 6 yard difference, therefore, equates to 0.24 pts per drive. With an average of 182.6 drives per season, that's a difference of 43.8 pts per season--almost exactly the same result yielded by the "one additional first down" analysis.
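A quick sketch comparing the two approaches, using the 0.04 points-per-yard slope and 182.6 drives per season quoted above for the Romer-based estimate, and the 312-point season for the first-down estimate. Function names are illustrative.

```python
POINTS_PER_YARD = 0.04    # slope of the value curve between the 15s
DRIVES_PER_SEASON = 182.6


def romer_points(extra_yards):
    """Season points from a linear point value per yard of field position."""
    return extra_yards * POINTS_PER_YARD * DRIVES_PER_SEASON


def first_down_points(extra_yards, points=312, yards_per_first_down=15, stop_rate=0.35):
    """Season points from the 'one additional first down' analysis."""
    return extra_yards / yards_per_first_down * stop_rate * points


print(round(romer_points(6), 1))       # 43.8
print(round(first_down_points(6), 1))  # 43.7
```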