Super Bowl XLIII Prediction

Win probabilities for Super Bowl XLIII are listed below. More info after the jump.

Super Bowl XLIII
Pwin              Pwin
0.69  PIT at ARI  0.31

The probabilities are based on an efficiency win model explained here and here with some modifications. The model considers offensive and defensive efficiency stats including running, passing, sacks, turnover rates, and penalty rates. Team stats are adjusted for previous opponent strength.

The probabilities based on full regular season statistics would be 0.74 to 0.26 in favor of Pittsburgh. The efficiency stats of each opponent are listed in the table below.

I was going to do a write-up of each facet of the match-up, but I think the table says a thousand words. The one thing I will point out, however, is Pittsburgh's pass defense. It gives up only 4.3 yards per drop back, and almost 4% of all passes are intercepted. Since the 2002 season, the next best pass defense was the '02 Buccaneers, who gave up 4.5 yds per attempt. Baltimore's '03 defense was third, giving up 4.8 yds per attempt. That's three standard deviations better than the mean. In comparison, Arizona's passing offense is 1.4 standard deviations better than the mean. And that's why the Steelers are a strong favorite.

[Table: team efficiency stats alongside the NFL averages]

Weekly Roundup 1/29/09

The guys at Cold Hard Football Facts think they've found evidence vindicating the 'frequent running causes winning' fallacy. But I don't think they did at all. Yes, teams that run more often in the Super Bowl (and all other games) also tend to be the teams that win the game. But we all know that it's the lead that allows all the rushing. Take the 2000 Ravens-Giants Super Bowl. The Ravens ran 33 times compared to 15 for the Giants. But 40% of Baltimore's runs were in the 4th quarter after they already had a 24-7 lead. Oddly, the article addresses the correlation-causation fallacy, but then just says "it's up for debate."

Football Outsiders looks at all the silly prop bets available for the Super Bowl. My favorite is the one about Matt Millen picking the winner in the pre-game show. I have to think that there is so much negativity surrounding that guy that there is an arbitrage opportunity there. (By the way, I noticed FO has been banned from Google. They must have been caught gaming the search algorithms.)

Speaking of silly betting, here is PFR's Super Bowl squares post. And if you're playing SB squares, you'll probably want to keep an eye on the win probability site. The probability of a current drive ending in a TD or FG is available in real-time.

PFR also responds to an article from the Community site asking whether all 10-point leads are equal. Basically, the question is does a 30-20 lead have the same win probability as a 13-3 lead? The answer is no, they're not exactly the same. The 13-3 lead is slightly safer. But it really depends on home field advantage and relative team strength more than whether it's a 30-20 or 13-3 type lead. Pretty interesting, and this kind of stuff has direct applications for the win probability engine.

I wonder who the Derek Jeters of football are?

The Patriots are already 6 to 1 favorites to win the Super Bowl next year.

The Numbers Guy looks at why QBs almost always get named the Super Bowl MVP. A bunch of football stat-heads, including myself, toss in their 2 cents. The author wanted to know if there was a statistical way to isolate the contribution of a single player in a game. My idea was actually "n-player cooperative stability equilibria," but thankfully it was translated into "a market-based approach."

An article about the overtime rules likes the "field position auction" idea. Phil Birnbaum agrees. I think ideas like those are clever and effective solutions, but the NFL is unlikely to make changes that veer too far from tradition.

The problem with overtime isn't really the coin flip, it's the incredible range and accuracy of modern kickers. The entire sport has been slowly warped into fieldgoal-ball. Overtime is just where the problem becomes most obvious. I'd suggest solving the issue by 1) narrowing the goal posts, and 2) moving the kickoff line back to the 40 for overtime.

Shameless Plug for Win Probability Site

Amaze your friends. Wow your family. Dazzle your co-workers. Confuse your brother-in-law.

Don't forget to fire up the Super Bowl XLIII in-game win probabilities this Sunday. Them: "Oh man, they'll never come back from that lead!" You: "Actually, they have a 15% chance of coming back." Them: "Shut up, you smart ass. Why do you have to suck the fun out of everything?"

Actually, that could be a pretty good catch line for my site. "Advanced NFL Stats: Sucking all the fun out of football since 2007!"

And as a bonus, for every click on the win probability site Sunday, a portion of the proceeds will go to the Get a Steeler Fan His G.E.D. Fund.

Super Bowl Winner Stats

Last year I looked at how often teams won playoff games based on their season-long performance in various categories. I basically looked at how predictive various stats are in forecasting playoff wins. For example, how often does the team with the better passing efficiency win? I learned some interesting things such as how important run defense appears to become in the playoffs.

This time around I looked at just Super Bowls. How often does the team with the better season-long performance in each category win the big game? The sample size is very small, but the Super Bowl is a unique game in many ways, so we might learn something.

I only looked at Super Bowls since Super Bowl XV, the 1980 game between the Raiders and Eagles. In 1978, the passing rules changed and significantly altered the sport, but the league did not adjust for another couple years. Yes, this shrinks the sample to just 28 games, but I’m not going for statistical conclusions, just an initial look to see if anything stands out. No fancy regressions this time, just straightforward percentages.

The results are in the table below. You can read it as saying “the team with the better [efficiency stat] won the Super Bowl [x%] of the time.”

Stat      Win %
O Pass    50
O Run     57
O Int     61
D Pass    61
D Run     46
D Int     54

I’m a little surprised that the team with the better offensive passing efficiency only won 50% of the time. I’d think that would be a fairly solid advantage. Defensive passing looks like it might be the more important category.

Also, defensive run efficiency doesn’t appear to hold the same importance that it has in the playoffs lately. The better run-stopping team only won 46% of the time.

But again, we can’t really draw any conclusions. There are only 28 games in the sample, so a single game swings the percentage by about 3%.

In case you’re curious, the Steelers have the advantage in offensive running and all the defensive categories. The Cardinals have the better offensive passing efficiency and the lower interception rate.

How the Model Works--A Detailed Example Part 2

This is a continuation of an article that details exactly how my predictions and rankings are derived. You can read part 1 here. To recap, I'm using the Super Bowl match-up between the Steelers and Cardinals as an example. So far, we've used a logistic regression model based on team efficiency stats to estimate the probability each team will win.

We haven't accounted for strength of schedule yet. For example, the Steelers may have the NFL's best run defense, yielding only 3.3 yds per rush. But is that because they're good or because their opponents happened to have poor running games?

To adjust for opponent strength, we'll first need to calculate each team’s generic win probability (GWP), or the probability of winning a game against a notional league-average opponent at a neutral site. This would give us a good estimate of a team’s expected winning percentage based on their stats.

Since we already know each team’s logit components, all we need to know is the NFL-average logit. If we take the average efficiency stats and apply the model coefficients we get Logit (Avg) = -2.52.

Therefore, for the Cardinals, a game against a notional average opponent would look like:

Logit = Logit (ARI) – Logit (Avg)
= 0.07
The odds ratio is e^0.07 = 1.07. Arizona’s GWP is 0.52—just barely above average. If we do the same thing for Pittsburgh, we get a GWP of 0.73. And it’s easy enough to do for all 32 teams. In fact, that’s what we need to do for our next step in the process, which is to adjust for average opponent strength.
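The logit-to-GWP conversion is easy to script. Here is a minimal sketch in Python, using the logit values quoted above (the function name is mine, not part of the model):

```python
import math

def gwp(team_logit, avg_logit=-2.52):
    """Generic win probability: the team vs. a notional league-average
    opponent at a neutral site."""
    logit = team_logit - avg_logit      # e.g. -2.45 - (-2.52) = 0.07 for ARI
    odds = math.exp(logit)              # logit -> odds ratio
    return odds / (1 + odds)            # odds ratio -> probability

print(round(gwp(-2.45), 2))  # Arizona: 0.52
print(round(gwp(-1.51), 2))  # Pittsburgh: 0.73
```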

The GWPs I calculated for Arizona and Pittsburgh were based on raw efficiency stats, unadjusted for opponent strength. That’s ok if we assume they had roughly the same strength of schedule. But often teams don’t, especially in the earlier weeks of the season.

To adjust for opponent strength, I could adjust each team efficiency stat according to the average opponents’ corresponding stat. In other words, I could adjust the Cardinals’ passing efficiency according to their opponents’ average defensive efficiency. I’d have to do that for all the stats in the model, which would be insanely complex. But I have a simpler method that produces the same results.

For each team, I average its to-date opponents’ GWP to measure strength of schedule. This season Arizona’s average opponent GWP was 0.51—essentially average. I can compute the average logit of Arizona’s opponents by reversing the process I’ve used so far.

The odds ratio for the Cardinals’ average opponent is 0.51/(1 − 0.51) = 1.04. The log of the odds ratio, or logit, is log(1.04) = 0.04. I can add that adjustment into the logit equation we used to get their original GWP.

Logit = Logit(ARI) – Logit(Avg) + 0.04
= 0.11

This makes the odds ratio e^0.11 = 1.12. Their GWP now becomes 0.53. If you think about it intuitively, this makes sense. Their unadjusted GWP was 0.52. They (apparently) had a slightly tougher schedule than average. So their true, underlying team strength should be slightly higher than we originally estimated.

I said ‘apparently’ because now that we’ve adjusted each team’s GWP, that makes each team’s average opponent GWP different. So we have to repeat the process of averaging each team’s opponent GWP and redoing the logistic adjustment. I iterate this (usually 4 or 5 times) until the adjusted GWPs converge. In other words, they stop changing because each successive adjustment gets smaller as it zeroes in on the true value.
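Here is a rough Python sketch of that iteration, using a hypothetical three-team league. The team names, logit components, and schedule are invented purely for illustration:

```python
import math

def logit(p):
    """Log of the odds ratio for probability p."""
    return math.log(p / (1 - p))

def prob(l):
    """Probability from a logit."""
    return 1 / (1 + math.exp(-l))

def adjusted_gwp(raw_logits, schedule, avg_logit, iters=5):
    """Iterate the opponent-strength adjustment until the GWPs settle.
    raw_logits: {team: logit component}; schedule: {team: list of opponents}."""
    gwp = {t: prob(l - avg_logit) for t, l in raw_logits.items()}
    for _ in range(iters):
        gwp = {t: prob(raw_logits[t] - avg_logit +
                       logit(sum(gwp[o] for o in schedule[t]) / len(schedule[t])))
               for t in raw_logits}
    return gwp

# Toy league: A is strongest on raw stats, C weakest.
raw = {"A": -2.0, "B": -2.5, "C": -3.0}
sched = {"A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B"]}
print(adjusted_gwp(raw, sched, avg_logit=-2.5))
```

Note how A, which faced the two weakest opponents, gets adjusted downward while C gets a bump, exactly as the intuition above suggests.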

Ultimately, Arizona’s opponent GWP is 0.50 and Pittsburgh’s is 0.53. After a full season of 16 games, strength of schedule tends to even out. But earlier in the season one team might have faced a schedule averaging 0.65 while another may have faced one averaging 0.35.

My hunch is that it’s this opponent adjustment technique that gives this model its accuracy. It’s easy enough to look at a team’s record or stats to intuitively assess how good it is, but it’s far more difficult to get a good grasp of how inflated or deflated its reputation may be due to the aggregate strength or weakness of its opponents.

Now that we’ve determined opponent adjustments, we can apply them to the game probability calculations. The full logit now becomes:

Logit = const + home field + (Team A logit + Team A Opp logit) –
(Team B logit + Team B Opp logit)

Pittsburgh’s opponent logit is log(0.53/(1 − 0.53)) ≈ 0.10 and Arizona’s is log(0.50/(1 − 0.50)) ≈ 0.01 (using the unrounded opponent GWPs). The game logit including opponent adjustments is now:

Logit = -0.36 + 0.72/2 + (-2.45 + 0.01) - (-1.51 + 0.10)
= -1.03

The odds ratio is therefore e^-1.03 = 0.36, which makes the probability of Arizona winning 0.36/(1+0.36) = 0.26. This estimate, based on opponent adjustments, is slightly lower than the unadjusted 0.28. This makes sense because Arizona’s strength of schedule was basically average, and Pittsburgh’s was slightly tougher than average.
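A short Python sketch of the opponent-adjusted game probability, using the rounded values quoted above. (Because the inputs are rounded, the result can wander a point or so from the unrounded model's output.)

```python
import math

def game_prob(const, hfa, logit_a, opp_logit_a, logit_b, opp_logit_b):
    """P(team A wins), with each side's opponent-strength logit added in."""
    l = const + hfa + (logit_a + opp_logit_a) - (logit_b + opp_logit_b)
    odds = math.exp(l)
    return odds / (1 + odds)

# Super Bowl is a neutral site, so only half the home field term applies.
p_ari = game_prob(-0.36, 0.72 / 2, -2.45, 0.01, -1.51, 0.10)
print(round(p_ari, 2))  # Arizona's adjusted chances, ~0.26
```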

So there you have it, a complete estimate of Super Bowl XLIII probabilities and a step-by-step method of how I do it.

There are all kinds of variations to play around with. You can choose which weeks of stats to use, to overweight, or to ignore. You can calculate a team’s offensive GWP by holding its own defensive stats average in the calculations, and only adjusting for opponent defensive stats. The resulting OGWP tells us how a team would do on just the strength of its offense alone. It’s the generic win probability assuming the team had a league-average defense. DGWP is vice versa.

One variation I employ is to counter early-season overconfidence by adding a number of dummy weeks of league-average data to each team's stats. This regresses each team's stats to the league mean, which reduces the tendency for team stats to be extreme due to small sample size. For example, it takes about 6 weeks for a team's offensive run efficiency to stabilize near its ultimate season-long average. So at week 3, I'll add 3 games worth of purely average performance into each team's running efficiency stat. No team will sustain either 7.5 yds per rush or 2.2 yds per rush.
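One way to implement that padding, as a sketch. The 6-game stabilization figure comes from the text; the 4.2 yds/rush league average in the example is my assumption for illustration:

```python
def regress_to_mean(team_avg, games_played, league_avg, stable_games=6):
    """Blend a team's per-play stat with 'dummy' league-average games so the
    sample behaves like roughly stable_games worth of data."""
    pad = max(0, stable_games - games_played)   # e.g. week 3 -> 3 dummy games
    return (team_avg * games_played + league_avg * pad) / (games_played + pad)

# Week 3: a team rushing at 7.5 yds/carry gets pulled well back toward the mean
print(regress_to_mean(7.5, 3, 4.2))  # (7.5*3 + 4.2*3) / 6 = 5.85
```

Once a team has played six or more games, no padding is added and its raw stat passes through unchanged.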

This entire process might seem ridiculously convoluted, but it’s actually pretty simple. You get the coefficients from the regression. You next calculate each team’s logit with simple arithmetic. Game probabilities and “GWP” are just a logarithm away. Opponent adjustments require a little more effort, but in the end, you just add them into the logit equation.

Voila--a completely objective, highly accurate NFL game prediction and team ranking system.

Weekly Roundup

I was blown away last week when within hours of posting the win probability calculator, reader Zach wrote up an analysis of when to go for a 2-point conversion. Very cool.

Jim Schwartz is the new head coach of the Lions. Besides being a fellow native of Baltimore, I like him because he's known to have a solid grasp of statistics. Like Bill Belichick, Schwartz has an economics degree. The New York Times has a good write up on him from last fall.

The new issue of the Journal of Quantitative Analysis in Sports is out. There's an article on ranking teams and predicting games, including in the NFL. I've only skimmed it. There are a couple of other articles that look interesting too. There is an article on determining the evenness of sports competitions in rugby, essentially doing the same thing--ranking and forecasting. There is also an article on using neural networks to predict NBA games. (I've experimented with neural network software. I can't say I completely understand it, but I was able to get close to the same prediction accuracy from my usual regression model.)

Sometimes the articles in JQAS are crackpot nonsense. So be warned--just because something has a fancy academic title, comes wrapped in a pretty .pdf, and is loaded with references, doesn't guarantee it has any value. These particular articles don't immediately jump out as kooky, thankfully.

Math and stats pay. Check out the top 3 jobs. Funny, I don't see Navy carrier pilot on the list. When I used to fly, I often wondered how much you'd have to pay someone to do that in an open and competitive market. Take away the "serving your country" aspect, and how much money would someone with those skills make? Throw in the danger and the fact that they have to live at sea for extended periods, and you might have to pay them like these guys.

The PFR blog has the usual installments of best-ever, worst-ever trivia. This time, it's best-ever Super Bowl losers (part 2). I'd like to see worst-ever Super Bowl winners too. [Edit: Here it is.] What kills me is that the two biggest championship upsets in American sports history feature an upstart second-fiddle team from New York beating an overwhelming favorite from Baltimore. The Mets upset the O's in '69, and the winter before, the Jets shocked Baltimore in Super Bowl III. I wasn't even born yet, and it still hurts. One thing forgotten about the Super Bowl back then is that it was more of an actual bowl game--a post-season exhibition. Baltimore had already won the NFL Championship. Back then, as I understand it, the Super Bowl was a cross between a meaningless Pro Bowl-type game and the modern championship as we now know it. Not totally meaningless, but not yet considered the championship either. The Jets certainly changed that.

PFR also has a new Super Bowl history page.

Smart Football teaches us about zone blitzes.

Dave Berri has his final rankings of the year, plus he looks at the Lions.

Over at the community site, Denis O'Regan compares scoring frequency in soccer and football using Poisson distributions. Also, Oberon Faelord (real name?) reminds us that not all 10-point leads are the same.

Since the Steelers beat my Ravens last Sunday to reach the Super Bowl, I'm allowed one outburst of sour grapes. When I was in the Navy, I noticed every part of the country seemed to have a sizable stable of Steeler fans. I remember going to watch a Steelers-Browns playoff game at a sports bar in Pensacola and couldn't believe how many fans of each team were there. And here in Northern Virginia, they're everywhere. Now I understand why. I think a lot of it is just bandwagon types from the 70s, but the economic dispersion of the rust belt is also obviously part of the reason.

How the Model Works--A Detailed Example Part 1

One of the most common requests I get is to write up a complete sample game probability calculation. In this article, I'll explain how the model works and do a full detailed example using the upcoming Super Bowl between the Steelers and Cardinals.

When I originally constructed this model, the goal wasn’t to predict game outcomes but to identify how important the various phases of the game were compared to the others. In order to do that, I had to choose stats that were independent of the others, or at least as independent as possible.

There were several options, such as points scored and allowed, total yards, or first downs. But if I’m trying to measure the true strength of a team’s offensive passing game, passing touchdowns may not tell us much. A team may have a great defense that gives them good field position on most drives, or it might have a spectacular running back that can carry the offense into the red zone frequently. So points or touchdowns won’t work.

The other obvious option is total yards. But losing teams can accumulate lots of total passing yards late in a game's “trash time.” Or a team can generate lots of pass yards simply because they pass more often. That really doesn’t tell us how good a team is at passing. Total rushing yards presents a similar problem. A team with a great passing game can build a huge lead through three quarters, and then run out the clock in the 4th quarter accumulating a lot of rushing yards.

First downs made or allowed tells us a lot about how good an offense or defense is, but it doesn’t tell us anything about the relative contributions of the running and passing game of a team.

So, the best choice is going to be efficiency stats. Net yards per pass attempt and yards per rush tell us how good a team truly is in those facets of the game. They are also largely independent of one another—not completely, but about as independent as possible.

Turnovers are also obviously critical. But total turnovers can be misleading just like total yards. Teams that pass infrequently may have few interceptions, but it may only be because they simply have fewer opportunities. So I also use interceptions per attempt, and fumbles per play.

So the model starts with team efficiency stats. But I don’t use all of them. For example, I throw out defensive fumble rate because although it helps explain past wins or losses, it doesn’t predict future games. A team’s defensive fumble rate is wildly inconsistent throughout a season, which suggests it’s very random or mostly due to an opponent’s ability to protect the ball. Forced fumbles and defensive interceptions show the same tendency. In the end, the model is based on:

  • Offensive net passing yds per att
  • Offensive rushing yds per att
  • Offensive interceptions per att
  • Offensive fumbles per play
  • Defensive net passing yds per att
  • Defensive rushing yds per att
  • Team penalty yds per play
  • Home field advantage

The model is a regression model, specifically a multivariate non-linear (logistic) regression. I know that sounds very technical, but the general idea behind regression is pretty intuitive. If you plotted a graph of a group of students’ SAT scores vs. their GPA, you’d see a rough diagonal line.

We can draw a line that estimates the relationship between SAT scores and GPA, and that line can be mathematically described with a slope and intercept. Here, we could say GPA = 1.5 + 2 * (test score).

Regression is what puts that line where it is. It draws a line that minimizes the error between the estimated GPA and the actual GPA of each case.

We can do the same thing with net passing efficiency and season wins. We can estimate season wins as Wins = -6.5 + 2.4*(off pass eff). Take the Cardinals this year. Their 7.1 net passing yds per attempt produces an estimate of 10.7 wins. They actually won 9, so it’s not a perfect system. We need to add more information, and that’s what multivariate regression can do.

Multivariate regression works the same way but is based on more than one predictor variable. Using both offensive and defensive pass efficiency as predictors, we get:

Wins = 9.6 + 2.3*(off pass eff) – 2.6*(def pass eff)

For the Cardinals, whose defensive pass efficiency was 6.5 yds per att in 2008, we get an estimate of 9.4 wins.

Adding the rest of the efficiency stats to the regression, we can improve the estimates even further. Unfortunately, linear regression, like we just used, can sometimes give us bad results. A team with the best stats imaginable would still only win 16 games in a season, but a linear regression might tell us they should win 21. Additionally, linear regression can estimate things like the total season wins, but it can’t estimate the chances of one team beating another. That’s where non-linear regression comes in.

Non-linear regression, like the logistic regression I use, is best used for dichotomous outcomes such as win or lose. A logistic regression model can estimate the probabilities of one outcome or the other based on input variables. It does this by using a logarithmic transformation, which is a fancy way to say taking the log of everything before doing all the computations. After computing the model and its output just as you would with linear regression, you “undo” the logarithm by taking the natural exponent of the result. Technically, logistic regression produces the “log of the odds ratio.” The odds ratio is the familiar “3 to 1” odds used at the race track, which can be translated into a probability of 0.75 (to 0.25).

Logistic regression would be useful if, instead of predicting GPA, you wanted to predict a student’s probability of graduation. Graduation is a yes-or-no dichotomous outcome, and winning an NFL game is no different. We can use the efficiency stats that we already know contribute to winning to estimate the chances one team beats another.

As an example, let’s compute the probability each opponent will win the upcoming Super Bowl based on offensive rushing efficiency alone. Based on the regular season game outcomes from 2002-2007, the regression output tells us that the intercept is zero and the coefficient of rushing efficiency is 0.25. The model can be written:

Log(odds ratio) = 0 + 0.25*(ARI off run eff) – 0.25*(PIT off run eff)
= 0.25*(3.46) – 0.25*(3.67)
= -0.052

The odds ratio would be e^-0.052 = 0.95. In other words, based on offensive running alone, the odds Arizona wins would be 0.95 to 1. In probability terms, this is 0.49, giving Pittsburgh the slightest edge. Another way of saying this is, holding all other factors equal, Pittsburgh’s advantage in rushing efficiency gives them just a 51% chance of winning.

[Note: You can translate odds ratios into probabilities by using prob = odds/(1+odds).]
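That note, in code form—a tiny Python helper applied to the rushing-efficiency example above:

```python
import math

def prob_from_logit(lo):
    odds = math.exp(lo)        # log of the odds ratio -> odds ratio
    return odds / (1 + odds)   # odds -> probability, i.e. odds/(1+odds)

# Rushing-efficiency-only Super Bowl example from the text:
lo = 0.25 * 3.46 - 0.25 * 3.67   # = -0.0525
print(round(prob_from_logit(lo), 2))  # Arizona's chances: 0.49
```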

Now we can do the same thing, but with the full list of predictor variables. The independent “input” variables are the efficiency stats for each team, and the dependent variable is the dichotomous outcome of each game—either 1 for a win or 0 for a loss. My handy regression software tells us that the model coefficients come out as:

Home Field    0.72
O Pass        0.46
O Run         0.25
O Int       -19.4
O Fum       -19.4
D Pass       -0.62
D Run        -0.25
Pen Rate     -1.53

The “logit,” or the change in the log of the odds ratio, can be written as:

Logit = const + home field + Team A logit - Team B logit


Logit = -0.36 + 0.72 + 0.46*(team A off pass eff) + 0.25*(team A off run eff) +...
- 0.46*(team B off pass eff) – 0.25*(team B off run eff) - …

We have the constant, the home field advantage adjustment, and the sum of the products of each team’s coefficients and stats. The equation will eventually tell us Team A’s odds of winning, so we add its component logit and we subtract Team B’s. If Team A is the home team, we add the home field adjustment (0.72 * 1). If not, we can leave it out (0.72 * 0).

Now let’s look at Arizona and Pittsburgh in terms of their probability of winning Super Bowl XLIII. I’ll compute both teams’ logit component, combine them in the overall logit equation, then convert it to probabilities. To keep things simple, I’m going to only use team stats from the regular season for this example.

Arizona’s logit component would be:

Logit(ARI) = 0.46*7.1 + 0.25*3.5 – 19.4*0.024 – 19.4*0.028 – 0.62*6.5 – 0.25*4.0 – 1.53*0.39
= -2.45

Pittsburgh’s logit component would be:

Logit(PIT) = 0.46*6.0 + 0.25*3.7 – 19.4*0.030 – 19.4*0.026 – 0.62*4.3 – 0.25*3.3 – 1.53*0.41
= -1.51

Because the Super Bowl is at a neutral site, I’ll only add half of the home field adjustment when I combine the full equation.

Logit = -0.36 + 0.72/2 - 2.45 + 1.51
= -0.94

Therefore the odds ratio is e^-0.94 = 0.39. That makes the probability of Arizona beating Pittsburgh at a neutral site equal to 0.39/(1+0.39) = 0.28. Pittsburgh’s corresponding probability would be 0.72.
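The whole calculation fits in a few lines of Python. This sketch uses the coefficients and season stats quoted above; keeping full precision in the intermediate logits gives 0.27 rather than 0.28, a difference that is purely rounding:

```python
import math

# Coefficients from the regression table above
COEF = {"o_pass": 0.46, "o_run": 0.25, "o_int": -19.4, "o_fum": -19.4,
        "d_pass": -0.62, "d_run": -0.25, "pen": -1.53}

def team_logit(stats):
    """Sum of coefficient * stat for one team."""
    return sum(COEF[k] * v for k, v in stats.items())

ari = {"o_pass": 7.1, "o_run": 3.5, "o_int": 0.024, "o_fum": 0.028,
       "d_pass": 6.5, "d_run": 4.0, "pen": 0.39}
pit = {"o_pass": 6.0, "o_run": 3.7, "o_int": 0.030, "o_fum": 0.026,
       "d_pass": 4.3, "d_run": 3.3, "pen": 0.41}

const, hfa = -0.36, 0.72
# Neutral site: only half the home field adjustment
game_logit = const + hfa / 2 + team_logit(ari) - team_logit(pit)
odds = math.exp(game_logit)
p = odds / (1 + odds)
print(round(p, 2))  # ARI win probability ~0.27 at full precision
```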

(Notice how the constant and the home field adjustment cancel out to zero for a neutral site.)

In part 2 of this article, I'll explain how I factor in opponent adjustments and how I calculate a team's generic win probability (GWP)--the probability a team would win against a league-average opponent at a neutral site.

Play of the Year

Both conference championships were remarkable games. And despite very different styles of play, both games followed the same plot line. One team seemingly dominated the entire game, only to see the underdog within striking distance of the upset with only a few minutes to play. Arizona fought off an unprecedented 2nd half comeback while Pittsburgh won in dramatic fashion late in the 4th quarter.

The Eagles trailed by 18 points until scoring their first touchdown late in the 3rd quarter, making the score 24-13. Shortly before the score, they had a near-zero probability of winning. Just to tie, they would need 11 points while hoping to keep Arizona off the scoreboard for the rest of the game. Two more TDs later, and Philly miraculously had the lead 25-24. For a brief second, they actually broke above 0.50 WP before Arizona was able to score the last TD of the game, securing their first trip to the Super Bowl.

The AFC championship game's graph looks very similar. A see-saw battle in the first quarter gives way to a clear advantage for the home team. A comeback effort peaks midway through the 4th quarter but falls short.

Down by 2 points with about 7 minutes remaining, Baltimore forced a punt and gained possession on their own 40. But a boneheaded personal foul during the punt put the ball back at the 14. Flacco's deep throw to Todd Heap gave the Ravens a 1st and 10 at their 32. With just 30 yards to go to get into Matt Stover's field goal range and a ticket to the Super Bowl, the Ravens had somehow battled back to a 0.45 WP. You might think that's high given the Steelers' phenomenal defense, and you might be right. But don't tell that to Tennessee fans.

Three plays later Steelers safety Troy Polamalu intercepted a Flacco pass and returned it 40 yards for a touchdown, making the score 23-14 and putting the game out of reach. In terms of pure leverage in getting to the Super Bowl, no other single play comes close. This was unquestionably the play of the year (so far).

Congratulations Steelers and Cardinals fans.

Win Probability Calculator

I've built a tool for calculating the Win Probability for a given state of a game, and it's now available on-line. I originally built it for myself to streamline analyses of kick/go-for-it type decisions. But I thought if I made the interface user friendly enough other people might find it useful or interesting too.

You just enter the score difference, time remaining, field position, and down and to go distance. The applet returns the win probability for the team with possession along with some other handy stats.

Here is an example of how I'd use it. Say a team is up by 3 points with 2 minutes remaining in the 4th quarter. They're pinned down at their own 2 yard line and are facing 4th and 9. Should they take the intentional safety and give the other team the ball (on average) at its own 44? Or should they punt, saving the 2 points but handing the other team the ball (on average) at your 42, just outside of field goal range?

I'd look at it from the opponent's point of view since they'll have the ball. I'd enter -3 for score difference, 2:00 in the 4th for time remaining, opponent's 42 for field position, and 1st and 10. The resulting WP is 0.37 for the punt.

For the safety, I'd enter -1 for score difference, 2:00 remaining in the 4th, own 44 for field position, and 1st down and 10. The resulting WP is 0.38 for the safety.

Since we'd want the WP for the opponent to be as low as possible, the 0.37 for the punt is the better option, but just barely. The options aren't that far apart. So if the punter gets a bad snap or thinks there's a good chance the punt will be blocked, he should just fall on the ball or run out of the end zone. Far better to take the safety in that situation than risk a blocked kick and an easy touchdown.

Another way to look at it uses the probabilities of scoring from each position. Let's make one assumption: with 2:00 remaining, there is plenty of time for the opponent to score from midfield, and if he does, there's not enough time left for you to answer.

If you choose to give up the intentional safety, your opponent has a 0.23 probability of scoring a TD, and a 0.16 probability of scoring a FG. Since the opponent only needs a FG, chances are he'd stop short of the TD once the FG is relatively assured, so the total chance of scoring is 0.39. Any score will beat you, so the 0.39 WP is very close to the 0.38 we got using the WP calculator directly.

The punt analysis tells a different story. From your 42 yard line, the opponent has a 0.32 probability of scoring a TD, which will beat you, and a 0.22 probability of scoring a FG, which will tie the game. The net WP for your opponent is therefore 0.32 + (0.5 * 0.22) = 0.43. This is a little off from the 0.37 WP we calculated directly. What causes the difference?

The direct WP calculations are based on actual game situations and results--that is, what did coaches really do, and what were the real outcomes? But the scoring probabilities are general and not specific to the particular game circumstances. The difference in the estimated WP suggests that coaches are too timid when behind by 3 at the end of the game. Once in FG position, they'll become very conservative and play for the tie rather than risk a turnover going for the TD and the win. That sounds prudent at first.

But a tie only gives you a 50/50 shot at winning, and turnovers--which would always cause a loss--occur far less than 50% of the time. A turnover of any type only occurs less than 12% of the time inside FG range (the 35). Coaches should press for the win as long as time and downs permit.

Ultimately, the safety might be the better option (0.39 vs 0.43) if coaches could actually be expected to play to win. But because coaches can usually be counted on to play for the tie, the punt is the slightly better option.
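For reference, the two expected-value calculations can be sketched in a few lines, using only the drive probabilities quoted above:

```python
# Expected win probability for the opponent under each choice, using the
# league-average drive scoring probabilities quoted above.

# Intentional safety: the lead shrinks to 1 point, so any opponent
# score, TD or FG, wins the game for them outright.
p_td_safety, p_fg_safety = 0.23, 0.16
wp_opp_safety = p_td_safety + p_fg_safety          # 0.39

# Punt: the 3-point lead survives, so a TD beats you but a FG only
# ties, which is worth half a win in expectation.
p_td_punt, p_fg_punt = 0.32, 0.22
wp_opp_punt = p_td_punt + 0.5 * p_fg_punt          # 0.43
```

In this generic model the safety edges out the punt; the paragraphs above explain why the game-specific numbers tell a slightly different story.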

Weekly Roundup

Sabermetric Research points to this King Kaufman column, in which Kaufman takes sports writers to task for not appreciating advanced statistics. Sports writers and even some coaches are often dismissive of stats, and some even wear their ignorance as a badge of honor. Baseball is going through a quiet transformation based on advanced statistics. Stats are the cutting edge of the sport, and writers would do well to get on board. The way I see it, statistics is just a tool for learning from large sets of facts. Everyone relies on statistics one way or another. You can choose to do it well or do it poorly. Judging from the interest in this site and others, there is a sizable audience hungry for something other than the same old tired storylines we get from columnists and analysts.

Game theory could help fix the overtime problems in the NFL. This article talks about how to fairly divide something between two people. One of the best solutions is the "I cut, you choose" method. If two people are splitting a piece of cake, one person cuts it in half, and the other picks which half he wants. The person cutting has an interest in making the division as fair as possible. Overtime could work the same way: the coin flip winner picks the yard line for the kickoff, and the other team gets to choose whether to receive or kick. I don't think most traditionalists would like this idea, but neither team could complain about the outcome.
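The cut-and-choose mechanism can be sketched in code. The function and the toy win-probability model below are illustrative assumptions of mine, not part of any actual proposal:

```python
# "I cut, you choose" applied to overtime: the coin-flip winner names
# the kickoff spot, the loser picks whether to receive or kick.

def divide_and_choose(wp_receiving, yard_line):
    """wp_receiving maps a starting yard line (yards to the opponent's
    goal) to the receiving team's win probability."""
    p = wp_receiving(yard_line)
    chooser = max(p, 1 - p)      # the chooser takes the better side
    return 1 - chooser, chooser  # (cutter's WP, chooser's WP)

# Toy model: the deeper the receiving team starts, the worse its chances.
toy_wp = lambda yards_to_goal: 1.0 - 0.006 * yards_to_goal

# The cutter's incentive is to name the spot where receiving is worth
# almost exactly 50/50, so neither choice hands the chooser an edge.
cutter, chooser = divide_and_choose(toy_wp, 83)
```

With the toy model, naming the 83 "yard line" makes receiving worth about 0.502, so the chooser's edge is negligible--exactly the self-policing fairness the cake-cutting analogy promises.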

ZEUS chimes in on the Titans' decision to tie the Ravens with a field goal instead of going for the first down on 4th and inches. Here was my take. But let me skip to the bottom line for everybody: it is almost always better to go for the first down, even up to 4th and 7 on a team's own side of the field in most cases. About the only times an NFL team should kick are on 4th and very long, or if time is expiring and the kick will win or tie the game.

ZEUS also thinks Coughlin was right to go for it against the Eagles. One thing about ZEUS, though--From what I can tell, the software is a simulation-based model. This means that it takes the current state of a game including score, time, field position, etc. and randomly simulates a game from that point forward. It does this 'millions' of times to estimate average win probabilities (that it calls game winning chance--GWC).

To me, this approach is fraught with problems. You'd have to model so many things so precisely to get a reliable result. The distributions of all possible play results for all the possible combinations of circumstances simply could not be modeled with any reliability. You'd need to make a lot of assumptions, and the results are going to be very sensitive to those assumptions. It wouldn't be much different than playing out a game on Madden in auto-computer mode a million times. It just depends on the fidelity of the simulation. On the other hand, the advantage of this approach is that you can tweak the distributions to reflect specific team strengths and weaknesses.

Individual team abilities are legitimate considerations in kick-or-go-for-it decisions, but my take is that these factors are often overstated. Take the Ravens-Titans game. As I pointed out earlier this week, Baltimore had scored on only 2 of 10 possessions up until Fisher's decision to kick. That might suggest the Titans defense could count on stopping the Ravens offense. But the NFL average is 3 scores every 9 drives, not significantly different from 2 out of 10, and Baltimore went on to score to make it 3 out of 11. How much does under- or over-performance within a game predict performance later in a game? PFR took up that question and finds only about 25% of a team's under-performance in the first 3 quarters carries through to the 4th quarter.

Home field advantage (HFA) has been a focus of sports science for decades. We can quantify the strength of its effect pretty easily, but what are the causes? Is it travel fatigue, time zone change, weather, crowd noise, the shape of the field or cut of the grass, or referee bias? I think we now have pretty solid evidence that a large part of HFA comes from environmental familiarity.

The possible effect of general unfamiliarity was summed up well by a commenter: "From a psychological standpoint, performance could be subtly influenced due to players being in a somewhat unfamiliar environment, due to the small but cumulative effects of orienting to the new environment. This could be many things—the locker room, where the sun comes in over the stadium, the overall 'feel.' All of these small distractions could influence performance—a performance that involves instantaneous decision making and physical reaction times. Research has shown that orienting to even environments that are somewhat unfamiliar influences memory, judgment, decision making, etc."

PFR took a look at HFA when opponents are familiar with each other. I thought this might be what explains why HFA diminishes throughout a game. My suggestion was that because visitors are more familiar with the environment when playing divisional opponents than when playing other opponents, we should see a reduced effect. And sure enough, that's exactly what we see. Division rivals not only have a reduced overall HFA, the quarter-by-quarter decline in HFA is shallower. To me, this is evidence that a good deal of HFA in the NFL is due to overall environmental familiarity. I think this is very interesting, and it comes mostly from loose collaboration among people who've never met. Twenty years ago, before the internet, research like this wouldn't be possible. We're not curing cancer, but it is interesting and useful. Further comments here.

Smart Football dissects the deep crossing route.

The Numbers Guy takes a look at rare NFL scores. The Chargers-Steelers 11-10 score is not the only unique score this year.

I really liked Jim Glass's comments about the distinction between the "best team" and "the champion."

Contributor jjbtnw looks at 3rd down and 6 situations. Should teams run more often?

Dean Jens takes a stab at modeling punting and field goal kicking.

Pacifist Viking has an excellent article about the classic correlation/causation fallacy. PV debunks a lot of analysis by Cold Hard Football Facts. I do enjoy CHFF, though, just not for the analysis. The stats aren't always the soundest, but the writing is much better than at most other sites, and I like the historical perspective they add. I wish I could write like that.

An article talks about which stats matter to coaches.

Sometime soon I'll have a new Win Probability tool available. You can enter a game state and calculate the WP. I originally made it as a tool for myself when analyzing things such as 4th down decisions, but thought other people might find it interesting too. So I spiffed up the interface and will have it up and running soon.

Super Bowl Probabilities and Potential Match-Ups

Here are the probabilities of winning Super Bowl XLIII going into the conference championships. Pittsburgh is the most favored, followed by Baltimore and Arizona.

I've also listed the probabilities of the potential Super Bowl match-ups below.

[Table: probability of winning Super Bowl XLIII, by team]

And here are the potential match-ups:

0.39 ARI vs BAL 0.61
0.32 ARI vs PIT 0.68
0.39 PHI vs BAL 0.61
0.32 PHI vs PIT 0.68

Conference Championship Predictions

This weekend AFC defensive powerhouses Baltimore and Pittsburgh match up, and NFC underdogs Philadelphia and Arizona square off.

The game probabilities are based on team performance for all games since week 9, with the exception of week 17 when some teams played at less than full strength. This includes playoff games to date.

0.40 PHI at ARI 0.60
0.33 BAL at PIT 0.67

Probabilities based on the complete regular season would be:

BAL at PIT, 0.33 to 0.67
PHI at ARI, 0.67 to 0.33

Why is Arizona favored over Philadelphia when weeks 1-8 are thrown out? The Eagles racked up gaudy stats early in the season despite not having the wins to show for it. Eventually, their luck started to even out and they squeaked into the playoffs. But throwing out their best statistical weeks really hurts them. Plus, Arizona's last two wins against quality opponents were convincing, improving both their stats and their average opponent strength significantly.

Drive Results

I had intended to post this a few weeks ago, but moved on to other things and forgot all about it. Continuing my data-dump series of posts of things that may only interest me, here are three graphs that illustrate the results of drives based on field position. Note that these are league-wide baselines, averaging all drives from the 2000 through 2007 seasons. Only drives that ended due to the expiration of time were excluded.

Each graph is based on first down field position. For example, for all 1st downs at a team’s own 35 yard line, offenses go on to score touchdowns 20% of the time. It doesn’t matter how the team got to the 35 yard line with a first down. They could have started the drive there or converted a first down from their own 20.

Field positions are defined as distance to the opponent’s end zone. A team’s own 20 is the “80 yard line.”

These graphs are part of my real-time win probability site. As the field position, down, and distance change during a game, my site continually updates the likelihood the offense will score either a touchdown or field goal.

The first graph depicts how often NFL drives result in scores.

The second graph depicts how often drives result in turnovers.

The third graph combines the two previous graphs and adds punts. It also groups the data into 5-yard increments so the lines are less noisy and easier to read.

Year of the Run Defense?

Before the playoffs started last year, I did a post on which facets of team strength are most decisive in the playoffs. I looked at passing offense and defense, running offense and defense, turnovers and penalties. For each game, I tabulated how often the team with the better season-long performance in each stat won the game. For example, in the playoffs, the team with the better season-long offensive interception rate won the game 58% of the time.
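The tabulation can be sketched in a few lines. The sample data and the lower-is-better convention for defensive run efficiency are illustrative assumptions, not the actual dataset:

```python
# For each game, check whether the team with the better season-long
# stat won, skipping pushes where the two teams' stats are equal.

def better_stat_win_rate(games, higher_is_better=True):
    """games: (winner_stat, loser_stat) pairs of season-long values."""
    wins = total = 0
    for winner_stat, loser_stat in games:
        if winner_stat == loser_stat:
            continue                       # push: skip ties
        total += 1
        if (winner_stat > loser_stat) == higher_is_better:
            wins += 1                      # the better-stat team won
    return wins / total

# Defensive run efficiency: fewer yards per carry allowed is better,
# so pass higher_is_better=False. Hypothetical sample:
sample = [(3.8, 4.3), (4.5, 4.0), (3.9, 4.4)]
rate = better_stat_win_rate(sample, higher_is_better=False)   # 2 of 3
```

Run over all playoff games for a given stat, this is exactly the kind of count that produced the 58% figure for interception rate.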

I also looked at playoff-caliber match-ups in the regular season. Playoff-caliber was defined as teams that would finish with 10 or more wins. I wanted to see if there was something special about "playoff football" beyond the fact that there are (usually) only good teams on the field.

The most intriguing result was the sudden importance of run defense. Teams with the better defensive run efficiency won regular season playoff-caliber games 48% of the time. But in the playoffs, teams with the better run defense won 67% of the time.

This year, the four remaining teams in the playoffs include the #2, #4, and #5 run defenses in the league. The team with the better defensive run efficiency has won every playoff game but one so far this year. That's 7 for 8.

But I'm not sure this means anything. My analysis only used 5 years of data--55 playoff games. This year obviously supports the notion that run defense somehow takes on special importance in the playoffs. But 2007 showed the opposite. Only 2 of 9 games were won by the team with the superior run defense. Two more games were pushes.

Still, that's 64% overall for the 2002 through 2008 seasons. You'd expect a team stronger in any category to win more often, but 64% is the highest of all the core abilities including offensive passing, offensive running, defensive passing, and even turnovers. It's particularly remarkable because run defense seems relatively insignificant in the regular season.

I'm not sure what causes the effect. It could be just random variation and small sample size. But the effect might be real and it could be due to weather or even conservative gameplans. (We could call that the Schottenheimer Effect.)

"Pulling An Orlovsky"

With 7:39 left in the 4th quarter and a 3-point lead, the Baltimore Ravens faced a 3rd and 10 from their own 1 yard line. Rookie quarterback Joe Flacco dropped back to pass, then dropped back a little more. He very nearly stepped out of the back of the end zone for a safety, much like Lions QB Dan Orlovsky did against the Vikings this year. In that game, the safety was the difference as the Vikings won 12-10. In the post-game press conference, Flacco even remarked that he almost "pulled a Dan Orlovsky." He was only inches from doing the same thing. How costly would that have been?

At that point in the game, a safety would have reduced the Ravens' lead to a single point, making the score 10-9. Plus, it would have given the ball right back to the Titans, on average at the Tennessee 44 yard line. A Titans field goal now would have given them a late 4th quarter lead, instead of merely tying the game. According to my win probability model, the Titans would have a 0.56 probability of winning following a safety.

In reality, Baltimore was able to punt the ball, giving the Titans field position on the Ravens' 42 yard line. This gave the Titans a 0.39 win probability. The difference between the almost-safety and the actual punt is 0.17. That's a lot for a few inches and a single play. Although it wouldn't have been fatal, it would have swung the advantage to Tennessee.

Fumble of the Year

The play of the game had to be the forced fumble by Baltimore safety Jim Leonhard on Titans tight end Alge Crumpler near the Ravens' goal line. Prior to the snap the Titans had a 0.55 WP, but after the fumble they had only a 0.25 WP--a swing of 0.30. In fact, the swing was a little larger because had Crumpler held onto the ball and been tackled, the Titans would have been sitting at first and goal from about the 5.

Go for TD or Kick the FG?

Another interesting wrinkle in the game came with 4:39 remaining in the 4th quarter. Facing 4th and inches on the Ravens' 9, the Titans elected to kick the field goal instead of going for the first down. PFR beats me to the punch in analyzing this decision, but I'll add my contribution here.

My WP model is useful, but it's generic. It does not consider things like weather, or the particular flow of an individual game. For example, there are no adjustments for how well one team has been able to move the ball, or that one team has a particularly stout run defense. But those sorts of considerations need a baseline around which to operate, and my model can provide that.

With 4:39 remaining and a 3-point deficit, a successful conversion would have given the Titans a 1st and goal on the Ravens' 9 and a 0.57 WP. A failed conversion attempt would have resulted in a 0.25 WP for the Titans. The conversion itself was highly likely to succeed: "and 1" conversions succeed 70% of the time, so an "and inches" attempt would be expected to do at least that well. Accounting for the strength of the Ravens' run defense, a conservative estimate might be 75%. The net WP of the decision to go for the 1st down is therefore:

(0.57 * 0.75) + (0.25 * 0.25) = 0.49

A field goal from the 10 is successful 92% of the time. Tying the score at that point gives the ball back to Baltimore at their own 27. This would result in 0.42 WP for the Titans. A miss leaves the ball and the lead with the Ravens, resulting in a 0.22 WP for Tennessee. The net WP for the decision to kick the FG is:

(0.42 * 0.92) + (0.22 * 0.08) = 0.40
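Side by side, the two expected-value computations look like this (a minimal sketch using only the numbers above):

```python
# Go for it on 4th-and-inches:
wp_convert, wp_fail = 0.57, 0.25
p_convert = 0.75                  # conservative estimate vs. a stout run D
wp_go = wp_convert * p_convert + wp_fail * (1 - p_convert)

# Kick the field goal:
wp_made, wp_missed = 0.42, 0.22
p_fg = 0.92                       # FG success rate from that distance
wp_kick = wp_made * p_fg + wp_missed * (1 - p_fg)

# wp_go comes to about 0.49 vs. about 0.40 for wp_kick.
```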

Going for the first down is the clearly better call. However, this is a league-average baseline. Other considerations can modify this result. But in my opinion, coaches and analysts tend to overestimate the importance of these considerations.

For example, up to that point of the game, Baltimore only scored on 2 of its 10 possessions. So you could say, had Tennessee failed to convert the first down, they could count on getting the ball back with the score still tied. Or this might indicate Tennessee would have a significant advantage in overtime.

But as a league average, offenses score on 1 out of every 3 drives. Was Baltimore's 2 out of 10 that much different than 3 out of 9? Not at all. In fact, they went on to score on their next possession (making it 3 out of 11) to win the game.
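A quick binomial check makes the point concrete. This is my own back-of-the-envelope test, not part of the original analysis: if the true per-drive scoring rate is the league-average 1/3, how often would an offense score on 2 or fewer of 10 drives?

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Chance of 2 or fewer scores in 10 drives at the league-average rate:
p_tail = binom_cdf(2, 10, 1 / 3)
```

The answer is roughly 30%, so Baltimore's slow night was well within normal variation.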

The Delay of Game That Wasn't

The biggest play in Baltimore's game-winning field goal drive was a Flacco 23-yard pass completion to TE Todd Heap. But the snap took place a second after the play clock hit zero, and the refs did not see it in time to call it. Did this make the difference in the game?

I think the right way to look at this is to consider the game situation prior to the pass. Had the delay of game been called, the pass to Heap would never have happened. Baltimore would have been facing a 3rd and 7 instead of a 3rd and 2, and the chance of converting a first down drops from 60% to 40%. This would have dropped Baltimore from a 0.66 to a 0.62 WP. That's a meaningful difference for a single play, but hardly game-changing. The Ravens still would have had the upper hand.

Edit: The Safety/Goal Line 3-And-Out That Wasn't

I just saw this pop up over at Pro Football Talk. It appears that the Titans were actually stuffed for a safety on their drive that started at their own 1. Plus, they may have been erroneously given an extra down on the next drive. The extra down turned out to be the play that topped the highlight reel in which Ray Lewis knocked the helmet off of Ahmard Hall. It gave the Titans a first down and some breathing room on their drive that ultimately ended as a Ravens interception on the Baltimore 10.

Weekly Roundup

The two big topics in football stats this week were the BCS and the NFL overtime rules. I've already had my say on OT rules, so let's start with the BCS.

Baseball analyst Bill James made a minor splash with an article urging a boycott of the BCS system by quantitative analysts (Hat tip--PFR). His fourth point is very interesting and goes against conventional wisdom. The BCS is not the result of big conference and big school greed. James says that we would have a Division I playoff system now except for the fact that the large number of small and uncompetitive schools would vote to share the playoff revenue too broadly.

Maybe so, but there is an underlying problem with college football. It's an unstable system. In systems engineering terms, the best example of a stable system is a thermostat. If it gets too hot, the thermostat kicks in to make it cooler, and vice versa. The NFL is a stable system. A team with a top record is penalized with later draft picks and a tougher schedule. A team with too many top players will lose some in free agency. But in college, the effect is reversed. College football is like an anti-thermostat. Imagine a room in which the thermostat turns up the heat the hotter the room gets. College football works the same way.

A successful football program will get money, attention, television time. This will lead to better recruits and even more wins. In turn, there's even more money, big name coaches, and better recruits. And every top recruit on one team's roster is a recruit unavailable to competitors. That's why every year we see the same handful of schools competing for championships. Oooh, I can't wait to find out who the 2009 champion is going to be. Will it be USC, Florida, LSU, Texas, Oklahoma, or Ohio State? The suspense is killing me!

While James' point may be true, it's nearly impossible for most schools to become competitive. Maybe the only way to stabilize the system--to create some semblance of competitive balance--is to spread the wealth.

Also on the college front, the Numbers Guy points out that special teams success does not necessarily correlate with winning.

"ZEUS" thinks the Colts should have taken an intentional safety at the end of regulation in their losing effort at San Diego. ZEUS is software built by a couple of PhD-types that aids sideline decision-making, such as when to kick or go for it on 4th down, or when to decline a penalty. It's very similar to the win probability system here (except that they're trying to sell it to teams for over $100,000. Good luck with that. Psst Mr. coach, you can have mine for half the price.)

Reader Ed Anthony emailed me to suggest this on Monday, but I dismissed the idea too quickly. I was going to do an analysis to prove that ZEUS was wrong. It seemed obvious to me. Taking the safety turns a SD field goal into a game-winning kick instead of a game-tying one. A safety would have made a FG slightly less probable because it would have given the Chargers worse field position, but not nearly enough to risk the loss instead of the tie. What I left out of the analysis is that it also makes a touchdown less probable. And even though a TD would have been fatal in either case, making the TD less probable makes taking the safety a slightly smarter move. It doesn't matter that the TD didn't occur; it would have been the better decision at the time. Good instincts, Ed!

Doug Drinen at PFR looks at whether specific types of match-ups disproportionately affect game outcomes. Suppose there are two equal teams overall, but there is one particular facet where one team is much stronger than the other. Is it decisive? It's a complex question.

A couple years ago, I looked at the same issue but in a different way. I added interaction variables to my game prediction regression model. I used all the same efficiency variables I usually do, but added additional factors such as [offensive run efficiency * opponent defensive run efficiency]. I was testing whether any particular match-up of team qualities had a non-linear effect above and beyond just a linear additive effect.

To put it simply, teams just don't put their abilities up on a table and let them play out independently. Team strengths and weaknesses interact with those of their opponents. I was testing those interactions to see if they were significant. Some were, but the effect was very slight. The model was no more accurate and was far more complex than my original, so I abandoned its use in 2006. Now that I have a lot more data, it might be worth a revisit.
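Constructing an interaction variable is simple in code. The field names here are illustrative stand-ins, not the model's actual variables:

```python
# Alongside the usual linear efficiency terms, include a product term
# that pairs a team's offensive run efficiency with its opponent's
# defensive run efficiency.

def build_features(game):
    """One row of a game-prediction design matrix, with an interaction."""
    off_run = game['off_run_eff']          # yards per carry gained
    opp_def_run = game['opp_def_run_eff']  # yards per carry allowed
    return {
        'off_run_eff': off_run,
        'opp_def_run_eff': opp_def_run,
        # the non-linear match-up term being tested:
        'off_run_x_opp_def_run': off_run * opp_def_run,
    }

row = build_features({'off_run_eff': 4.5, 'opp_def_run_eff': 4.0})
```

If the regression coefficient on the interaction column isn't significantly different from zero, the match-up adds nothing beyond the linear terms--which is roughly what I found.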

JKL, also at PFR, looks at older QB performance in the latter part of the season. This is a response to what FO looked at last week. The PFR analysis is far more comprehensive and does find a late-season effect. Also check out JKL's clever idea on revamping overtime.

Sometimes, this weekly roundup post turns into a cross-link-fest with PFR, Sabermetric Research, and Smart Football (which features some great college analysis this week). So there's plenty of room for fresh blood. I haven't mentioned it in a while, but the Advanced NFL Stats Community site is up and going strong. There's a new post once every couple days, and the site gets a few hundred visits every day. All contributors are welcome, so if you have an idea you'd like to share, or even just an opinion on stats in the NFL, please join in. There is data available for anyone who wants to kick it around.

Dennis O'Regan has two posts, one on how Baltimore's defense travels (I'm guessing it travels just fine!) and another on starter vs. backup QBs. Derek Singer looks at what kind of teams win championships. Josh Fryman has some observations on how regular season records may not be predictive of playoff success. Bob Burns wonders how a prediction system that doesn't account for wins can actually predict winners. Doug and Patrick Walters share their technical-financial-based system for predicting team fortunes.

Lots of activity this week. Enjoy the best NFL weekend of the year, and don't forget to check out the Win Probability site during the games. There's a special offer--this weekend only--get $100,000 off your first 5 visits!

Bad Overtime Logic and Good Kickers

One thing about the web is that you can tell what topics fans are genuinely interested in. This week there were many hits on the OT coin flip from people searching Google on the subject, and my article on the topic from September was bounced around fan message boards all over the country.

What Is Fair?

In my original article I point to the fact that in about 1 in 3 OT games, the coin flip winner scores before their opponent has a chance to go on offense. In total the coin flip winner wins 60% of all overtimes. At FO this week, they wrote “…only 60%...” as if that’s just a small edge.

I couldn’t disagree more. You could think, "50% is the optimally fair rate, and 60 is only 10 'more percent' than 50. Hey, 10% isn’t very much, so the coin flip is close enough to being fair. What’s the big deal?" But this is a flawed way of looking at it. Percent and percentage points are not the same thing.

Would you say that 3:2 odds represents a significant advantage? I sure would. That means that one team’s chances of winning are half again larger than the other’s. Well, that’s exactly what a 60% win-rate is—a 60/40 or 3:2 advantage. That can’t be ignored.

If you still don’t feel that a 3:2 advantage is unacceptable, what would be the odds at which you would say something needs to be fixed? 2:1? That would be a 67/33 split…and we’re “only 7%” away from that.

"They Had a Chance to Make a Stop"

One argument I frequently hear in defense of the status quo is "the defense had a chance to make a stop." True, even though one out of three OT games ends without one team ever touching the ball, the losing defense did have an opportunity to force a punt or turnover. On average they have a 2 in 3 chance of stopping the coin flip-winner from scoring. The problem with this argument is that the coin flip-winning defense would have a 100% chance of making a stop. The opposing offense will never take the field.

Even if the defense does manage to stop the coin flip-winning team from scoring, the advantage persists and cascades throughout the OT period. However the flow of the game shakes out, the best the coin flip-loser can do is break even in terms of possessions, and the worst the coin flip-winner can do is break even. In other words, if both teams are stopped from scoring on their first drives, the problem starts all over again.
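This cascade can be captured with a toy sudden-death model (my own simplification, not a claim from the data): possessions alternate, each drive scores with the league-average probability of about 1/3, and the process repeats until someone scores. Summing the geometric series gives the first team's win chance.

```python
def receiving_team_wp(p):
    """Win probability for the team with first possession when each
    drive scores independently with probability p (sudden death)."""
    # P(win) = p + (1-p)^2 * P(win)   =>   P(win) = p / (1 - (1-p)^2)
    return p / (1 - (1 - p) ** 2)

# With a 1-in-3 scoring rate per drive, the toy model lands almost
# exactly on the observed 60% edge for the coin-flip winner.
wp = receiving_team_wp(1 / 3)
```

It's striking that such a bare-bones model reproduces the observed 60% figure; the advantage really is baked into the alternating-possession structure.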

Field Goals

Another point I made in my article was that a big part of the reason for the advantage was the movement of the kickoff spot from the 35 back to the 30 yard line. This had the unintended consequence of increasing the advantage of the receiving team in OT. Touchbacks became far less common and starting field position improved for offenses. But this is only part of the story.

The reason the NFL moved the kickoff line back was that kickers had improved so much over the years, both in distance and accuracy. In 1974, the league FG% was 60.6%. This year, it was 84.5%. And that even masks how much kickers have truly improved. In 1974, 36% of all FG attempts were from 40 yards or beyond. In 2008, the figure was 41%.

These days, teams aren’t looking to get inside the 25 for a field goal attempt, they’re just hoping to get inside the 40. Getting a quick score in overtime has become a far easier proposition.

Field goals have gradually warped NFL football. In 1974, there were 3.0 FG attempts each game compared to 3.9 in 2008, a 30% increase. Kickers have become so accurate and kick such long distances that the sport has changed before our eyes. Overtime might be where the effect is magnified and most apparent, but the entire game is different.

Is it time to narrow the field goal posts?

Super Bowl Probabilities

Carolina is still the slight favorite, mostly thanks to their easier task this weekend. On the AFC side, Pittsburgh has the edge. Arizona fans, don't hold your breath...but that's exactly what I would have told Giants fans last year.

[Table: probability of winning the conference championship and the Super Bowl, by team]

One thing that hits me when I look at this table is that the "best" team probably won't win the Super Bowl. You might not agree the best team is Carolina. That's fine. I'm not sure if they are or not. It might be Pittsburgh or the Giants or any of the other remaining teams. But whoever they are, they're probably not going to be the champs.

This isn't a new revelation to me at all, but it's particularly clear now perhaps because there isn't a definitive favorite this season. The task of winning 3 or 4 playoff/Super Bowl games, even for a dominant team, is so improbable that no one team would realistically have a greater than 50% chance. A team would need an average 80% chance of winning each of three games against playoff-caliber opponents just to have a 50/50 shot at taking home the Lombardi Trophy (0.80^3 ≈ 0.51).
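The arithmetic behind that claim, sketched out:

```python
# Three straight wins at an 80% per-game clip is barely a coin flip.
per_game = 0.80
p_title = per_game ** 3        # = 0.512

# Working backwards: the per-game win probability needed for an even
# shot at the title over three games is the cube root of 0.5.
needed = 0.5 ** (1 / 3)        # about 0.794
```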