
Posts filed under win probability
Chip's Challenging Decisions

Sneak Peek at WP 2.0
As a quick refresher, the WP model tells us the chance that a team will win a game in progress as a function of the game state--score, time, down, distance, etc. Although it's certainly interesting to have a good idea of how likely your favorite team is to win, the model's usefulness goes far beyond that.
WP is the ultimate measure of utility in football. As Herm once reminded us all, "You play to win the game! Hello!", and WP measures how close or far you are from that single-minded goal. Its elegance lies in its perfectly linear proportions. Having a 40% chance at winning is exactly twice as good as having a 20% chance at winning, and an 80% chance is twice as good as 40%. You get the idea.
That feature allows analysts to use the model as a decision support tool. Simply put, any decision can be assessed on the following basis: Do the thing that gives you the best chance of winning. That's hardly controversial. The tough part is figuring out what the relevant chances of winning are for the decision-maker's various options, and that's what the WP model does. Thankfully, once the model is created, only fifth grade arithmetic is required for some very practical applications of interest to team decision-makers and to fans alike.
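To make the "fifth grade arithmetic" concrete, here is a minimal sketch of the comparison. The probabilities and WP values are hypothetical placeholders, not output from the WP model, and the function name `expected_wp` is mine.

```python
def expected_wp(outcomes):
    """Weighted average WP over (probability, resulting_wp) pairs."""
    total_p = sum(p for p, _ in outcomes)
    if abs(total_p - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * wp for p, wp in outcomes)

# Hypothetical numbers for illustration only; not taken from the WP model.
options = {
    "go for it": [(0.60, 0.55), (0.40, 0.30)],  # convert / fail
    "punt": [(1.00, 0.42)],
}

values = {name: expected_wp(o) for name, o in options.items()}
best = max(values, key=values.get)  # the option with the highest WP
print(values, best)
```

Because WP is linear, the weighted average is all that is needed: whichever option has the higher expected WP is the better call.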
Lacrosse Analytics
For those not familiar with lacrosse, imagine hockey played on a football field but, you know, with cleats instead of skates. And instead of a flat puck and flat sticks, there's a round ball and the sticks have a small netted pocket to carry said ball. And instead of 3 periods, which must be some sort of weird French-Canadian socialist metric system thing, there's an even 4 quarters of play in lacrosse, just like God intended. But pretty much everything else is the same as hockey--face offs, goaltending, penalties & power plays. Lacrosse players tend to have more teeth though.
Because players carry the ball in their sticks rather than push it around on ice, possession tends to be more permanent than hockey. Lacrosse belongs to a class of sports I think of as "flow" sports. Soccer, hockey, lacrosse, field hockey, and to some degree basketball qualify. They are characterized by unbroken and continuous play, a ball loosely possessed by one team, and netted goals at either end of the field (or court). There are many variants of the basic team/ball/goal sport--for those of us old enough to remember the Goodwill Games of the 1980s, we have the dystopic sport of motoball burned into our brains. And for those of us (un)fortunate enough to attend the US Naval Academy (or the NY State penitentiary system) there's field ball. The interesting thing about these sports is that they can all be modeled the same way.
So with lacrosse season underway, I thought I'd take a detour from football work and make my contribution to lacrosse analytics. I built a parametric win probability model for lacrosse based on score, time, and possession. Here's how often a team can expect to win based on neutral possession--when there's a loose ball or immediately upon a faceoff following a previous score:
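For readers who want to experiment, here is a sketch of one generic way to model a flow sport. It is not the parametric model described above: it ignores possession, treats each team's remaining goals as an independent Poisson count, and uses an assumed scoring rate (`goals_per_min`) that I picked only for illustration.

```python
import math

def poisson_pmf(k, mean):
    """Probability of exactly k goals when the expected count is `mean`."""
    return math.exp(-mean) * mean**k / math.factorial(k)

def flow_sport_wp(lead, minutes_left, goals_per_min=0.17, max_goals=40):
    """P(win) for the team ahead by `lead`; a tie at the end counts as 0.5."""
    mean = goals_per_min * minutes_left  # assumed equal for both teams
    pmf = [poisson_pmf(k, mean) for k in range(max_goals + 1)]
    wp = 0.0
    for own, p_own in enumerate(pmf):
        for opp, p_opp in enumerate(pmf):
            margin = lead + own - opp
            if margin > 0:
                wp += p_own * p_opp
            elif margin == 0:
                wp += 0.5 * p_own * p_opp
    return wp

print(flow_sport_wp(lead=2, minutes_left=30))
print(flow_sport_wp(lead=2, minutes_left=10))
```

The same function works for any sport in the class by changing the scoring rate, which is the sense in which these sports "can all be modeled the same way."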
What I'm Working On
It's been almost 6 years since I introduced the win probability model. It's been useful, to say the least. But it's also been a prisoner of the decisions I made back in 2008, long before I realized just how much it could help analyze the game. Imagine a building that serves its purpose adequately, but came to be as the result of many unplanned additions and modifications. That's essentially the current WP model, an ungainly algorithm with layers upon layers of features added on top of the original fit. It works, but it's more complicated than it needs to be, which makes upkeep a big problem.
Despite last season's improvements, it's long past time for an overhaul. Adding the new overtime rules, team strength adjustments, and coin flip considerations were big steps forward, but ultimately they were just more additions to the house.
The problem is that I'm invested in an architecture that wasn't planned to be used as a decision analysis tool. It must have been in 2007 when I heard some TV announcer say that Brian Billick was 500-1 (or whatever) when the Ravens had a lead of 14 points or more. I immediately thought, isn't that due more to Chris McAlister than Brian Billick? And, by the way, what is the chance a team will win given a certain lead and time remaining? When can I relax when my home team is up by 10 points? 13 points? 17 points?
That was the only purpose behind the original model. It didn't need a lot of precision or features. But soon I realized that if it were improved sufficiently, it could be much more. So I added field position. And then I added better statistical smoothing. And then I added down and distance. Then I added more and more features, but they were always modifications and overlays to the underlying model, all the while being tied to decisions I made years ago when I just wanted to satisfy my curiosity.
So I'm creating an all new model. Here's what it will include:
Seahawks Stumble, Should Have Allowed TD
Win Probability Model/Calculator Upgrades - Team Strength Adjustment & More
I just implemented several new features and significant upgrades to the Win Probability Calculator tool as well as the model behind it.
1. The biggest new feature is the capability to adjust the WP estimates based on relative team strength. This is accomplished by entering either a pregame WP estimate from the efficiency model, another source, or the game's point spread. The model has had this ability for a long time, but I didn't want to implement it until I had a sound way of doing so.
The prior pregame estimate of WP is revised as the game goes on with the baseline in-game WP estimate. The two probabilities are reconciled using the logit method. The trick is to understand how the pregame difference in team strength decays over the course of the game. At a certain point, it doesn't matter how much a team was favored if it's trailing by two or more scores late in the game. Team strength differential decays proportionally to the log of time as the game progresses according to a particular curve.
Pregame WP or spreads should be with respect to the current offense. For example, if the game's spread was -3 but the visitor has the ball in the scenario you are investigating, enter 3 for the spread. In what is a cool enough feature all by itself, a spread you enter is automatically converted into a pregame WP estimate.
For the record, the WPA stats for teams and players will continue to use the baseline unadjusted WP numbers. If we used the adjusted WP numbers, every team and player would have a zero WPA assuming our pregame estimates were accurate. Put simply, using the adjusted WPA stats would defeat their very purpose and only be a measure of how good our pregame forecasts were.
2. The next most significant update is the ability to account for receiving the kickoff in the 2nd half. This can have an effect of up to a 0.04 WP swing in the first half of close games. The input asks whether the team with possession kicked off to start the first half or not. This consideration doesn't apply following the 2nd half kickoff, so for 2nd half scenarios you can just leave the input at the default Don't Know.
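The team strength adjustment in item 1 can be sketched roughly as follows. This is one plausible reading of "the logit method" (adding log-odds), not the calculator's actual code. The decay curve in `strength_weight`, its constant `k`, and the 13.5-point standard deviation used to convert a spread are all assumptions on my part.

```python
import math
from statistics import NormalDist

def logit(p):
    p = min(max(p, 1e-6), 1.0 - 1e-6)  # keep away from 0 and 1
    return math.log(p / (1.0 - p))

def inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))

def spread_to_pregame_wp(spread, sd=13.5):
    """Spread is relative to the offense; negative means the offense is favored.
    Assumes final margins are roughly normal with standard deviation `sd`."""
    return NormalDist().cdf(-spread / sd)

def strength_weight(frac_remaining, k=9.0):
    """Placeholder decay curve: 1 at kickoff, 0 at the final gun, log-shaped."""
    return math.log1p(k * frac_remaining) / math.log1p(k)

def adjusted_wp(baseline_wp, pregame_wp, frac_remaining):
    """Shift the baseline in-game WP by a decaying share of the pregame log-odds."""
    weight = strength_weight(frac_remaining)
    return inv_logit(logit(baseline_wp) + weight * logit(pregame_wp))

pregame = spread_to_pregame_wp(-3)          # offense favored by 3
print(adjusted_wp(0.20, pregame, 0.25))     # trailing late: small adjustment
```

With this form, the adjusted WP equals the pregame estimate at kickoff (when the baseline is 0.50) and converges to the baseline as time runs out.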
Slate: Punt From the Opponent's 26?
...But there’s an extra wrinkle. Strangely, Dallas would have preferred to keep Detroit within 3 points rather than extend its lead to 6. When desperate teams like the Lions with no timeouts remaining get into the outer rim of field goal range, they send in the field goal unit for a long-range attempt. This is an irrational decision, one I discovered the very first time I began looking at win probability numbers. Rather than try to win the game, teams in this situation settle for a tie—or rather, an attempted tie. Even if the field goal attempt is good, it only buys a 50–50 shot at the win in overtime...
Patriots Primarily Punt on Fourth Down
Bill Belichick is known for being one of the greatest football minds in NFL history. He's also known for being one of the "riskiest" play-callers -- riskiest in quotes because he actually plays the odds, unlike most conservative football minds. With the Patriots down 28-13 in the AFC Championship game, avid Patriots fan Bill Simmons put it best: "Ravens playing to win, Pats playing not to lose."
Belichick faced eight fourth downs in the game against the Ravens, seven of which raised legitimate questions about the best course of action: go for it, punt, or kick the field goal. Whereas we would normally expect Belichick to be aggressive, he seemed more reserved in his decision-making. There are a ton of factors that could explain his passive play-calling. For example, it was extremely windy, making field goals more difficult, and maybe Belichick did not have faith that Joe Flacco could sustain a 90-yard drive due to the Ravens' boom-or-bust offense (the Ravens ended up with three scoring drives of 10+ plays, including a 90-yarder and an 87-yarder).
Let's look at each fourth down decision, starting with the very first. Remember that early in the game, especially if the score is fairly close, we should look at expected points, but as score and time become bigger factors, we will switch to win probability. Also, note that these are league baselines. The fact that the Patriots offense is No. 1 in the league and far above league average suggests a higher success rate on 4th-down conversion attempts.
When to Intentionally Allow a TD When Tied
The game was tied at 24. The Broncos began a drive with 3:27 left to play. After a big Elway pass and several Terrell Davis runs, Denver put Green Bay in the Field Goal Choke Hold. Eventually, Denver fought its way to a 1st and goal from the Green Bay 8. A holding call on Shannon Sharpe moved Denver back to 1st and goal from the 18. Another Davis run set up 2nd and goal from the 1 with just 1:47 to play. Rather than allow Denver to run down the clock any further, head coach Mike Holmgren elected to allow the TD on the next play to give his offense a better chance to respond with a TD of their own.
In the wake of my previous five-part analysis of intentionally allowing a TD, I learned what the Internet jargon tl;dr stands for. I promise to make this one shorter. Previously, I looked at situations in which an offense that's trailing by 1 or 2 points could run out the clock before kicking a field goal to win. In many cases, depending on the time, score, field position, and number of timeouts remaining, it makes sense for the defense to allow a TD rather than try to force a stop and a FG attempt.
This time I'll examine similar situations where the score is tied. The considerations are a little different than when the defense has a 1 or 2 point lead. A tie score means that the defense can't be relatively assured of a win in the event of a miss. And given a successful FG to break a tie, a FG in response only re-ties the game.
New Overtime 4th Down Decisions When Down 3 Points
Your opponent kicked a FG on the first possession of overtime, and now your team needs a TD to win or a FG to continue the game. Your offense has driven down to the opponent's 10-yard line, but the drive has stalled. It's 4th down and 3. Should you go for the risky conversion and ultimately a TD for the win, or should you attempt a FG knowing you'd be at a disadvantage giving the ball to the opponent in sudden death?
The new NFL OT rules are unique in a lot of ways, and by unique I mean convoluted and contrived. There are basically three possible game states:
1. The first drive in which no score leads to Sudden Death, a TD wins, or a FG spawns the second state...
2. A possible second possession in which the offense is down by 3 points. It must score a TD to win or a FG to continue into SD.
3. Lastly, traditional SD itself.
The three game states are successively easier to model. The first possession must consider all the possibilities of the following two states. The second state must only consider itself and the possibility of SD. The second possession is also slightly easier to model because there is no punt option. An offense trailing by 3 points simply must score or lose.
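For the 4th-and-3 question posed above, the second state reduces to a two-way comparison because a failed conversion ends the game. The sketch below uses made-up inputs purely to show the structure; none of these numbers come from the model.

```python
# All inputs are hypothetical placeholders.
p_convert = 0.55          # chance of converting 4th and 3
wp_after_convert = 0.70   # WP with a fresh 1st down near the goal line, still down 3
p_fg_good = 0.97          # chance the short FG is good
wp_kicking_in_sd = 0.42   # WP after tying the game and kicking off in sudden death

wp_go = p_convert * wp_after_convert   # a failed conversion loses, adding 0
wp_fg = p_fg_good * wp_kicking_in_sd   # a missed FG also loses

# Conversion rate at which going for it equals kicking.
break_even = wp_fg / wp_after_convert

print(round(wp_go, 4), round(wp_fg, 4), round(break_even, 3))
```

Whatever the real inputs turn out to be, the break-even form is handy: go for it whenever your estimate of the conversion probability exceeds `break_even`.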
When to Intentionally Allow a Touchdown, Part 5
Putting It All Together
Taking a step back, the goal is to compare two strategies for the defense. The first is to play conventionally and force a stop and a FG attempt, hoping it will either fail or that there is enough time to match it with a counter-score. The second is to intentionally allow a TD immediately and use the time remaining to respond with a counter-TD.
So far, we have estimates for the key inputs:
-When the team on defense would get the ball back
-The probability of failure on the FG attempt
-The probability of responding to a made FG with a score
-The probability of responding to an allowed TD with a TD
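Here is a rough sketch of how those inputs combine for a defense leading by 1 or 2. It simplifies in ways the full analysis does not: a missed FG is treated as a win, a TD in response to an allowed TD is treated as a win (ignoring extra-point and 2-point wrinkles), and every number below is a hypothetical placeholder.

```python
def wp_force_fg(p_fg_miss, p_score_after_fg):
    """Play for the stop: win on a miss, or answer a made FG with any score."""
    return p_fg_miss + (1.0 - p_fg_miss) * p_score_after_fg

def wp_allow_td(p_td_after_td):
    """Allow the TD now and answer with a TD in the time that is saved."""
    return p_td_after_td

# Hypothetical inputs. The response probabilities depend on how much time
# is left when the defense gets the ball back under each strategy.
force = wp_force_fg(p_fg_miss=0.08, p_score_after_fg=0.05)
allow = wp_allow_td(p_td_after_td=0.15)

print(round(force, 3), round(allow, 3), "allow TD" if allow > force else "force FG")
```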
When to Intentionally Allow a Touchdown, Part 4
The Probability of Responding to a Successful FG
If the defense plays conventionally, and the opponent's FG is successful, a score will be needed to win. Either a FG or TD will do. We can assume that the drive will begin at or very near the offense's own 20-yard line for a couple of reasons. First, the average starting field position for all drives is the 22. And second, it's very likely that, with time at a premium, the offense would prefer a touchback so that no time expires on the kickoff.
For this estimate, I looked at all game situations in which an offense needed a score to survive and had a 1st down at or very near its own 20. Success is defined as any drive that results in a TD or FG.
The blue line is the raw success rate as a function of seconds remaining at the 1st down snap. The red line is the smoothed estimate of the probability of scoring based on a local regression. Because there were several bins of data with very few cases (which caused the large noisy swings in the raw averages), I used a regression method that weighted each case by how often it appeared. In other words, if there were 10 cases where a team gained possession with around 50 seconds to play and 20 with around 60 seconds to play, the regression weighted each bin of cases proportionately.
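The idea of weighting each bin by its number of cases can be sketched with a simple kernel-weighted average. This is a simpler stand-in for the local regression described above (it fits a local constant rather than a local line), the bandwidth is an arbitrary choice, and the data are invented for illustration.

```python
import math

def smooth_rate(bins, x, bandwidth=15.0):
    """Kernel-weighted success rate at `x` seconds remaining.

    bins: list of (seconds, successes, attempts). Summing successes and
    attempts separately weights each bin by its number of cases.
    """
    num = den = 0.0
    for secs, successes, attempts in bins:
        k = math.exp(-0.5 * ((secs - x) / bandwidth) ** 2)  # Gaussian kernel
        num += k * successes
        den += k * attempts
    return num / den if den > 0 else float("nan")

# Invented data: the 70-second bin has only 3 cases and a noisy raw rate of 0.33.
bins = [(40, 1, 10), (50, 1, 10), (60, 4, 20), (70, 1, 3), (80, 6, 25)]
print(smooth_rate(bins, 70))
```

The thin 70-second bin gets pulled toward its better-populated neighbors, which is exactly what tames the large swings in the raw averages.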
When to Intentionally Allow a Touchdown, Part 3
Field Goal Success Probability
The simplest way to win for a defense caught in the field goal choke hold is to pray for an unsuccessful FG attempt. This is a mostly straightforward calculation but has a couple wrinkles. The success rates here include all possible causes of a FG ultimately being unsuccessful. That includes blocks, botched snaps, and even penalties that force longer attempts which turn out unsuccessful.
Also, as mentioned previously, an estimate of the expected field position on 4th down is required. For now, I will say that the offense will gain 5 yards during its 1st, 2nd and 3rd down plays. This is essentially a plausible placeholder for now, and a more detailed analysis can be done to confirm or adjust this.
The graph below shows three things. The jagged blue line is the actual raw FG success rate by line of scrimmage. (Add 17 or 18 yards for the commonly used 'kick distance'.) The red line is the estimated true probability of success. This estimate was computed non-parametrically, using locally weighted regression. The green line is the same curve as the red line, but offset 5 yards closer to the uprights. (Inside the 10, the 5 yards are progressively curtailed.)
When to Intentionally Allow a Touchdown, Part 2
Time Remaining Following a Forced FG
The first task of the analysis was to create an algorithm to compute the time on the clock when the team on defense would get the ball back following a forced FG. This is a function of current time and timeouts remaining for the defense. For example, suppose the offense has just converted a series so that the 1st down snap will happen at 1:20, and the defense has two timeouts. The offense will run three times, the defense will call both timeouts, and following a FG, the defense will probably get the ball back with 17 seconds remaining. The two-minute warning is factored in, which is more challenging than it might seem.
The time-you-get-the-ball-back algorithm assumes that the defense will use its timeouts at every immediate opportunity. The only exception will be when the play itself spans the 2-minute warning. For example, if there is 2:10 on the clock at the snap and the play duration is 6 seconds, the defense will call a timeout at 2:04 rather than allow the clock to wind down to the 2-minute warning.
However, there is a special case where the defense may want to allow the clock to run down to the two-minute warning rather than use all its timeouts. For example, if there is 2:10 remaining between 2nd and 3rd down and a team has 2 timeouts remaining, it may choose to allow the clock to wind down to 2:00. The third down snap would occur following the two-minute warning, and the defense would call its 2nd timeout between 3rd and 4th down, at around 1:54. This would allow the defense to save one timeout for use on offense.
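A stripped-down version of the time-you-get-the-ball-back logic might look like the sketch below. It is not the actual algorithm. It assumes three in-bounds runs of 6 seconds each, a 40-second runoff whenever the clock is not stopped, a 5-second FG play, and a touchback on the kickoff. Those durations are my placeholders, and the special case of saving a timeout around the two-minute warning is not handled.

```python
TWO_MINUTE_WARNING = 120  # seconds

def clock_when_defense_gets_ball(snap_clock, timeouts, play_secs=6,
                                 runoff_secs=40, fg_secs=5):
    """Seconds left when the defense regains possession after a forced FG."""
    clock = snap_clock
    for _down in (1, 2, 3):
        before = clock
        clock = max(clock - play_secs, 0)
        if before > TWO_MINUTE_WARNING >= clock:
            continue  # the play spanned the warning, so the clock stops for free
        if timeouts > 0:
            timeouts -= 1  # defense stops the clock at the first opportunity
        elif clock > TWO_MINUTE_WARNING:
            # the runoff cannot go past the two-minute warning
            clock = max(clock - runoff_secs, TWO_MINUTE_WARNING)
        else:
            clock = max(clock - runoff_secs, 0)
    return max(clock - fg_secs, 0)

print(clock_when_defense_gets_ball(80, 2))  # 1st down snap at 1:20, two timeouts
```

With these placeholder durations, the 1:20 example with two timeouts comes out to 17 seconds, in line with the figure quoted above.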
When to Intentionally Allow a Touchdown
The Field Goal Choke Hold situation looks like this: The defense has a lead of 1 or 2 points with less than 3 minutes to play. The opposing offense has just converted a first down inside FG (attempt) range. Through week 13 this season there have been 12 games that qualify, which makes this situation about as common as overtime. (There were 12 more games with similar circumstances except the game was tied, a situation nearly identical but that requires some slightly different math.)
Two strategies are compared. The first is playing for the stop and forcing the FG attempt. This may be dangerous due to the ability of the offense to burn clock. The second strategy is to allow an immediate TD. This strategy forfeits points to the opponent in exchange for enough time to respond with a game-winning TD drive.
There is no guarantee the offense will take the bait and score a TD. If the offense is cognizant of the strategy, they may take a knee close to the goal line. So strictly speaking, this analysis merely estimates which combinations of circumstances make an immediate TD preferable to forcing a FG attempt. Even if all offenses were prepared for this contingency and were inclined to take a knee, this analysis lays out when that would be the smart move.
Fourth Downs in the New Overtime: First Possession
1. The initial drive of the first possession (A TD wins, a turnover or punt triggers Sudden Death (SD), and a FG triggers State 2.)
2. The team down by 3 now has one possession to match the FG (triggering SD) or score a TD to win.
3. Sudden Death
The possibilities are illustrated in the event tree below, along with some back-of-the-napkin transition probabilities I made back when the new rules were first proposed. (State 1 is "1st Poss". State 2 is the branch under "2nd Poss" that follows a FG in the 1st Poss. Sudden death is self-explanatory and occurs after a no-score in the 1st Poss or after a FG is matched in the 2nd Poss.)
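The event tree can be collapsed into a single expression by working backward from sudden death. The sketch below shows that structure only. The transition probabilities are hypothetical placeholders, not the back-of-the-napkin figures from the original tree.

```python
def first_possession_wp(p_td, p_fg, p_td_reply, p_fg_reply, wp_sd_receiving):
    """WP for the team with the first possession of overtime.

    p_td, p_fg: chance the opening drive ends in a TD or a FG.
    p_td_reply, p_fg_reply: chance the opponent answers a FG with a TD or a FG.
    wp_sd_receiving: WP for the team that has the ball first in sudden death.
    """
    # No score on the opening drive: the opponent gets the ball in sudden death.
    p_no_score = 1.0 - p_td - p_fg
    wp_after_no_score = 1.0 - wp_sd_receiving
    # After a FG: an opponent TD loses, a matching FG sends the ball back to us
    # in sudden death, and anything else wins the game.
    p_reply_fails = 1.0 - p_td_reply - p_fg_reply
    wp_after_fg = p_reply_fails + p_fg_reply * wp_sd_receiving
    return p_td + p_fg * wp_after_fg + p_no_score * wp_after_no_score

# Hypothetical transition probabilities for illustration only.
wp = first_possession_wp(p_td=0.20, p_fg=0.15, p_td_reply=0.25,
                         p_fg_reply=0.20, wp_sd_receiving=0.60)
print(round(wp, 4))
```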
Win Probability, In Color!
Nothing new here. I just thought this looked cool. It's the win probability of an offense with a first down trailing by 3 points. Actually, it is a little new. I'm trying to get smart on multivariate non-parametric kernel smoothing algorithms using R, the open source statistics package. That's just a fancy way of filtering out the noise inherent in the raw data and making a smooth estimate of the true probabilities. This chart is a product of my experimentation.
The axis running from 5 to 95 represents the midpoints of 10-yard bins of field position. The axis labeled 0-50 represents the game time remaining. The z-axis (vertical) shows the expected win probability. It's a little over-smoothed, at least in terms of field position. It might be under-smoothed in terms of time. I still have a lot to learn.
Hail Mary Probabilities
There's a distinction between the WP model’s empirical methodology and its automatic output without any intervention or input from a human. When I do a detailed analysis for any specific play, I have the luxury of time and logic to dig directly into the data. The “auto” model that spits out WP estimates without any human input is based on lots of assumptions and interpolation on top of extrapolation. There are literally billions of combinations of game states (yard lines, downs, to-go distances, seconds remaining, score difference, timeouts). It’s just a matter of how much time I can put into coding the calculator to handle special cases like “a team's very last desperation play.”
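A quick count shows where "billions" comes from. The caps on to-go distance and score difference are my own assumptions, chosen only to get an order of magnitude.

```python
import math

state_counts = {
    "yard_lines": 99,
    "downs": 4,
    "to_go": 20,           # 1 through 20+ yards (assumed cap)
    "seconds": 3600,       # regulation only
    "score_diffs": 57,     # -28 through +28 (assumed cap)
    "timeouts": 4 * 4,     # 0-3 for each team
}

total_states = math.prod(state_counts.values())
print(f"{total_states:,}")
```

Even with these conservative caps, the count lands in the tens of billions.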
With all the attention on that final play in the GB-SEA game, I thought it would be useful to look at Hail Mary success rates.
Saints Shocked: 4th Down Decision
RGIII was the only rookie quarterback to win on Sunday, when his Redskins took down the Saints 40-32. Brees was pressured heavily all game, and while it's extremely difficult to quantify the value of coaching, people are wondering what kind of effect the loss of Sean Payton will have.
One immediate impact is strategic decision-making. Sean Payton is known for his risky decisions. And by that, I mean he is known for those decisions that are publicly perceived as "risky," even if they are the statistically correct decision. Last year, the Saints were the No. 8 team in terms of win probability forfeited on 4th down (meaning they were in the top quarter of the league in 4th-down decision-making). So, what did we learn on Sunday?
With just about 2:00 to play in the 3rd quarter, down 30-14 already, the Saints were faced with a 4th-and-goal from the 3-yard line. Below you can see the results from the 4th-Down Calculator:
Is Tebow's Clutch Play Sustainable?
Chris Bruce and Andrew Mooney of HSAC do a bang-up job using WPA and EPA to analyze Tim Tebow's clutchitude.
So, broadly, Tebow has indeed exhibited “clutch” performance, performing pretty ordinarily overall, but at a higher level when it matters most. Given our previous analysis of other quarterbacks who did this in their first year (including Tom Brady, Aaron Rodgers and Drew Brees), it’s unlikely that this is a sustainable course in the long run. However, it should be encouraging for Tebow fans that his performance has been improving in absolute terms.