Let's examine the leverage of each challenge using the Win Probability (WP) model.
The first challenge was on a spot on a NYJ 3rd down. Late in the 3rd qtr, the refs spotted the ball just short of a 1st down at the BUF 38, creating a 4th and 1. Ryan challenged the spot and lost. Ryan eventually went for it on 4th down (which was smart), but the Jets were stuffed. A successful challenge would have given NYJ a 0.87 WP. A failed challenge would leave NYJ with a 0.84 WP based on going for it. That's a leverage of 0.03 WP. Every bit of WP matters, so it's nothing to sneeze at. But remember that challenges come with a cost.
(I also looked at the punt option, which was worth 0.82 WP for the Jets, so at best the leverage was 0.05 WP.)
The second challenge came immediately afterward, once BUF gained possession. BUF completed a 23-yd pass on 1st down, setting up a 1st and 10 at the NYJ 40. Ryan unsuccessfully challenged the completion. The result of the play was a 0.27 WP for BUF, but a 2nd and 10 from their own 37 would have meant a 0.21 WP. That's a leverage of 0.06 WP for the second challenge.
What about the fumble that wasn't? A successful challenge would have given the Jets a 1st and 10 on the BUF 41, up by 8 with 13:00 to play. That's worth 0.89 WP. An unsuccessful challenge wasn't so bad either. The Bills were called for holding, forcing a 2nd and 20 from their own 10, which gives the Jets a 0.87. Surprisingly that's only a leverage of 0.02 WP.
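The leverage figures above are just the difference between two WP values. Here is a minimal Python sketch of that arithmetic using the numbers quoted in the text. The function name `leverage` and the dictionary layout are illustrative only and are not part of the WP model. Converting BUF's WP to NYJ's as 1 minus BUF's WP is a simplifying assumption that ignores ties.

```python
def leverage(wp_if_overturned, wp_if_upheld):
    """WP gained by the challenging team if the call is overturned."""
    return round(wp_if_overturned - wp_if_upheld, 2)

# NYJ win probabilities quoted above: (challenge succeeds, challenge fails)
challenges = {
    "3rd-down spot": (0.87, 0.84),
    # The text gives BUF's WP here; NYJ WP is taken as 1 - BUF WP (assumption).
    "23-yd completion": (1 - 0.21, 1 - 0.27),
    "fumble (not challenged)": (0.89, 0.87),
}

leverages = {name: leverage(*wps) for name, wps in challenges.items()}

for name, lev in leverages.items():
    print(f"{name}: {lev:.2f} WP")
```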
So of the three challenge situations, real and potential, the one that seemed biggest was actually the smallest. The biggest was the big 23-yd pass play that put BUF in NYJ territory. In full disclosure, I set out to prove how stupid it was for Ryan to squander his challenges by pointing to how big an impact a turnover would have been in that situation. But it wasn't as it seemed. Here's why I was wrong:
1. There was still a lot of time left in the game. A turnover would not have been nearly as fatal as we might think.
2. BUF's penalty on the play meant that the alternative to a fumble recovery was still really crappy for BUF. A 2nd and 20 from a team's own 10, down by 8 in the 4th quarter, is not a good place to be.
3. Running out of challenges only looks like a really big mistake in retrospect. BUF went on to score a TD and a 2-pt conversion on that drive. Outcome bias can affect us all, even stat wonks.
4. The Jets were already up by 8, meaning it would take BUF scoring a TD more than the Jets, plus a 2-pt conversion...just to tie. Realistically, the worst case scenario is a 0.50 WP for the Jets in regulation.
5. When the WP estimate gets close to 1 or 0 for a team, there isn't much further you can move the needle. You can never get more than a 1.00 WP, so events that would swing the WP wildly when the game is closer to 0.50 WP naturally can't have as big an impact when it's not so close. It's just the way probability works (or is it?)
I still think that Ryan's second challenge was unwise. Although it had the biggest leverage, it was a) his final challenge and b) his second timeout in a one-score game. And because he lost his first challenge, winning the second would not have granted him a third.
In this case, the numbers don't condemn Ryan's decisions. But I think it's a good demonstration of the potential for analyzing challenge decisions using the WP model. Ultimately, it's a very tough problem because we have to estimate the 'potential' value of saving the challenge, which varies greatly based on game situation.
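One way to frame the trade-off is to ask how likely an overturn must be before a challenge pays for itself. The sketch below assumes a very simple cost model that is not from the article: a failed challenge costs a timeout worth `timeout_cost`, and using the challenge at all gives up `option_cost` of future value. Both cost figures are hypothetical placeholders. The article does not estimate them, and they are exactly the hard-to-estimate quantities described above.

```python
def expected_gain(p_overturn, leverage, timeout_cost, option_cost):
    """Expected WP change from challenging under the assumed cost model."""
    return p_overturn * leverage - (1 - p_overturn) * timeout_cost - option_cost

def break_even_probability(leverage, timeout_cost, option_cost):
    """Overturn probability at which expected_gain is exactly zero."""
    return (timeout_cost + option_cost) / (leverage + timeout_cost)

# HYPOTHETICAL costs in WP units, chosen only for illustration.
TIMEOUT_COST = 0.01   # assumed value of the timeout lost on a failed challenge
OPTION_COST = 0.005   # assumed value of still having a challenge later

for name, lev in [("spot", 0.03), ("completion", 0.06), ("fumble", 0.02)]:
    p = break_even_probability(lev, TIMEOUT_COST, OPTION_COST)
    print(f"{name}: break-even overturn probability = {p:.3f}")
```

Under this model, the smaller the leverage, the more certain a coach needs to be of an overturn. The specific thresholds depend entirely on the assumed costs.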
I was under the impression that all turnovers are automatically reviewed since last season.
Yes, but denied turnovers aren't. Which needs to be amended soon, as it leads to a tendency to rule turnover/TD on the field and let the replay decide later.
As a Jets fan ... this is nothing new with Rex.
He's a GREAT defensive coordinator, but bad at everything else.
The team constantly has too many men on the field, or dumb penalties from Kyle Wilson, or bad clock management - all of which is on the coaching.
You are looking at raw probability of winning vs. % increase/decrease in chances to win. Although in this case I think the point is moot, because BUF had about an equal chance to win in all three instances.
Still, what's the bigger play? One that increases a team's raw chance to win by 15% when they have a starting WP of, say, 40%, or one that doubles the team's chances when they have a 12% chance to win?
Put another way, like you said, when a team has things mostly locked up, it's tough to move the needle much farther in their favor. Therefore, raw gains in WP are more massive. I would posit that cutting an opponent's chances to win in half is pretty important in that situation even when the arithmetic increase in WP is low.
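The two hypothetical plays in this comment can be compared numerically. The sketch below uses the comment's own example figures (40% rising by 15 points, and 12% doubling) and reports the absolute and relative change for each. The log-odds change is an extra lens added here for illustration. It is not something the article or the comment uses, and `compare_swings` is a made-up helper name.

```python
import math

def compare_swings(wp_before, wp_after):
    """Return three ways of measuring the same change in win probability."""
    def logit(p):
        return math.log(p / (1 - p))
    return {
        "absolute": wp_after - wp_before,            # raw WP gained
        "relative": wp_after / wp_before - 1,        # proportional change
        "log_odds": logit(wp_after) - logit(wp_before),  # added for illustration
    }

play_a = compare_swings(0.40, 0.55)  # +15 points from a 40% start
play_b = compare_swings(0.12, 0.24)  # chances double from a 12% start

for label, swing in [("A", play_a), ("B", play_b)]:
    print(label, {k: round(v, 3) for k, v in swing.items()})
```

Play A is larger in absolute terms. Play B is larger in relative terms and, under this added measure, in log-odds terms as well. Which play counts as "bigger" depends on the yardstick.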
This probably just goes without saying, but the most important criteria when making the decision to challenge has to be whether the ruling on the field will be overturned, right?
Yeah, yeah, hindsight bias, etc. But the difference between fourth down plays and challenges is that fourth down plays are typically about a coin flip, while challenges predictably have anywhere from a tiny chance of success to a huge chance of success, from the perspective of the coach beforehand.
Good analysis, Brian. I am also surprised by the results.
That said, I thought both the challenges were pretty stupid. Neither play looked like an egregious error (as we now can confirm), and challenging spots is generally a low-risk move. More importantly, if you watched the first play, I have no idea why Rex thought he was going to win that challenge. I won't blame him -- obviously he outsources this task -- but man, it seemed like a no-win challenge from the word go.
And that cost them the opportunity to challenge the Manuel fumble. Completely agree that we (including me) fall victim to the bias that because the Bills scored a TD, the mistake by Rex was particularly egregious.
And, as I feel compelled to point out, my brother would say "oh by the way, we should not exactly gloss over the fact that the officials completely botched the call and Manuel wasn't down."
> This probably just goes without saying, but the most important criteria
> when making the decision to challenge has to be whether the ruling on
> the field will be overturned, right?
Not really. A coach only gets at most three challenges per game, so it's important to consider how much difference the challenge will make immediately, and how much impact it could have later.
There are also strange things that could come into play. For example, a coach could challenge a call late in the game, even if he expected to lose the challenge, hoping for a 60-second review instead of getting 30 seconds for a timeout.
this is very interesting, but my question is this: is it realistic to think that WP calculations can be whipped up in time for the coach to factor that probability into his decision to challenge or not? i honestly do not know the answer.
the author admitted that he set out to prove that Ryan used his challenges poorly, only to find out that his null theory was untrue. this indicates to me that WP analysis isn't necessarily intuitive without actually running the calculation; ie, even if the coach had a good basic understanding of WP analysis he might not necessarily intuitively know the leverage of the outcome he was considering challenging.
anyway, i found this to be a good read...does anyone have an idea of how WP could be realistically factored into a coach's decision whether to use a challenge? such decisions have to be made very quickly.
But doesn't outcome bias run in the other direction as well? There could have been, but wasn't, a need to use the challenge to effect a significant (though limited as you note above) swing and RR wouldn't have been able to do anything about it.
The Jets have actually been among the league leaders in successful challenges and fewest penalties since Rex has been there, so your point is incorrect.
Mike Smith, there's a calculator on this website that spits out WP. It takes only as long as it takes you to type in the criteria, which should be plenty of time as long as the other team isn't hurrying to run a play (which presumably only happens for the biggest WP plays anyway).
This is a lot harder to mathematically prove than the 4th down thing, but my suspicion has always been that coaches are far too conservative with the use of timeouts/challenges.
Remember the ultimate goal is to use TO/chal at the point where they give you the most chance to win. That means that you must try and calculate the probability that you are going to need one at a later point in the game. There are two different types of errors a coach can make.
1. Using a timeout/challenge early on a low leverage play and not having it available later when needing it for a higher leverage play.
2. Not using a timeout/challenge early on a high leverage play and not using it at all (or using it for something silly like freezing the kicker) late.
Situation #1 occasionally happens. When it does, announcers describe the error in extreme detail and everybody focuses on it the next day. Situation #2 happens far more frequently and rarely gets any attention at all. That doesn't mean it isn't important. It's still a major error and miscalculation that costs teams games even if nobody notices.
Remember, the goal is to win the game, not to save the timeouts for the last minute. If you see a situation where a timeout or challenge would significantly improve your chance of winning, you should use it. You might not get another chance.
I am interested to see what the standard deviation of your WP metric is, especially since we're considering numbers as small as .06. I would like to know what kind of confidence interval this number could fall into.
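For a rough sense of scale, the sketch below assumes each WP figure behaves like a simple proportion observed over `n` comparable past situations, and that the two figures being compared are independent. Neither assumption is confirmed by the article, which does not describe how the WP model is built, and the sample size used here is purely hypothetical.

```python
import math

def standard_error(p, n):
    """Binomial standard error of a proportion p estimated from n cases."""
    return math.sqrt(p * (1 - p) / n)

def leverage_interval(wp_a, wp_b, n, z=1.96):
    """Approximate 95% interval for wp_a - wp_b, assuming independent estimates."""
    diff = wp_a - wp_b
    se = math.sqrt(standard_error(wp_a, n) ** 2 + standard_error(wp_b, n) ** 2)
    return diff - z * se, diff + z * se

# HYPOTHETICAL sample size; the real model's effective n is unknown here.
N_SIMILAR_SITUATIONS = 500

# BUF WP with and without the 23-yd completion, as quoted in the article.
low, high = leverage_interval(0.27, 0.21, N_SIMILAR_SITUATIONS)
print(f"leverage 0.06, approx. 95% interval: ({low:.3f}, {high:.3f})")
```

The interval narrows as `n` grows, so whether a 0.06 difference is distinguishable from noise depends heavily on how much data sits behind each estimate.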