This installment cuts to the chase. From a strategic perspective, we want to understand how momentum may or may not affect the game so that coaches can make better decisions. Often, momentum is cited as a consideration to forgo strategically optimal choices for fear of losing the emotional and psychological edge thought to comprise momentum.
Here's the thinking: if a team goes for it on 4th down and fails, or fails on a two-point conversion attempt, it hands the momentum to the other team. The implication is that after a failed 4th down, winning is less probable than the resulting game state alone indicates. Therefore, the Win Probability (WP) and Expected Points (EP) models used to estimate the values of the options no longer apply. In a nutshell, the analytic models underestimate the cost of failing.
[By the same token, the reverse argument should be just as valid. Wouldn't succeeding in a momentum-swinging play mean the chances of winning are even higher than the numbers indicate? For now, I'll set the 'upside' argument aside and examine only the 'downside' claim.]
Methodology
As I did in Part 2, I plotted the observed (actual) winning percentage for all drives from 1999 through week 8 of 2013 against the expected WP based on the game state. The plots are broken out according to whether the offense gained possession following a punt or following a turnover on downs.
The WP model is agnostic to how possession is obtained, so comparing its estimates to the actual winning percentages tells us whether teams win more or less often than expected given a particular game state. As I wrote in the previous article:
Because the WP model is ignorant of past events and considers only the present state, true momentum-swinging events should cause the model to break--if momentum is real, that is. In other words, teams with positive momentum should win more often than the WP model predicts, and teams without momentum should win less often than the WP model predicts.
The blue dashed line represents perfect calibration of the WP model. For example, if the model is tuned correctly, teams that have a 0.80 WP will go on to win 80% of the time. The green line represents game outcomes for offenses that obtained possession following a punt. The red line represents game outcomes for offenses that obtained possession following a turnover on downs.
I realize this isn't the most intuitive way to present this information. For clarity's sake, if the line is above the diagonal, it means that teams gaining possession go on to win more often than expected. If the line is below the diagonal, it means that teams gaining possession win less often than expected. [In retrospect, I wish I had reversed the chart's presentation, making it from the perspective of the team making the decision to either punt or go for it, and not from the perspective of the team gaining possession. This would make 'up' represent winning more often for the team with the choice.]
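The comparison behind the chart can be sketched in a few lines of Python. This is a minimal illustration, not the code actually used for the analysis: the function name, the tuple layout of the drive records, and the choice of ten WP bins are all assumptions made for the example.

```python
from collections import defaultdict


def calibration_by_source(drives, n_bins=10):
    """Compare estimated WP with observed win rate, split by possession source.

    `drives` is an iterable of (source, estimated_wp, won) tuples, where
    `source` is a label such as "punt" or "downs", `estimated_wp` is the
    model's estimate in [0, 1] for the team gaining possession, and `won`
    is True if that team went on to win. This layout is hypothetical.
    """
    # (source, bin index) -> [drive count, sum of estimated WP, win count]
    cells = defaultdict(lambda: [0, 0.0, 0])
    for source, wp, won in drives:
        b = min(int(wp * n_bins), n_bins - 1)  # keep wp == 1.0 in the top bin
        cell = cells[(source, b)]
        cell[0] += 1
        cell[1] += wp
        cell[2] += 1 if won else 0

    rows = []
    for (source, b), (n, wp_sum, wins) in sorted(cells.items()):
        expected = wp_sum / n   # x-axis: mean estimated WP in the bin
        observed = wins / n     # y-axis: actual winning percentage
        # Positive gap = above the diagonal, negative gap = below it.
        rows.append((source, b, n, expected, observed, observed - expected))
    return rows
```

Each returned row is one point on a line in the chart. A negative gap in the "downs" rows corresponds to the red line sitting below the diagonal.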
Results
Let's look specifically at the red line representing turnovers on downs. It sits below the diagonal across most of the range, indicating that offenses that gain possession following a failed 4th down conversion attempt usually win less often than otherwise expected. This result suggests that there is no true momentum loss following a turnover on downs. The opposite may be true.
And just to be clear: these results do not say that failing on 4th down is better than punting in any particular circumstance. They say that, assuming you've handed possession over to the other team in comparable sets of circumstances, teams that failed on 4th down win slightly more often than teams that punted.
Unfortunately, we can't stop there because the results could be biased. It's possible that teams that go for it on 4th down and win anyway just happen to be very good teams. Perhaps they have great defenses that can get the ball back quickly, or maybe their offense is potent enough to more than make up for turning the ball over on downs. Maybe that's why coaches of those teams feel that they can take bigger risks. If so, this tendency would mask the loss of momentum. For example, if NE (.710 win%) and IND (.660 win%) are the teams with all the failed 4th down attempts, they would naturally win more often just because they were good teams.
Bias?
To check for such a bias, I broke out each team's winning percentage in each year in the data set. Next, I weighted each team-season win% by how often that team failed to convert a 4th down in that season. Lastly, I computed the weighted average win%. I did this twice, once for all the failed 4th down attempts in which teams went on to win, and once for all the failed 4th down attempts in which teams went on to lose.
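As a rough sketch of that weighting step (not the code used for the article), the calculation is an ordinary weighted mean. The function name and the `(season_win_pct, failed_attempts)` record layout are assumptions made for illustration.

```python
def weighted_win_pct(records):
    """Average team-season win%, weighted by failed 4th down attempts.

    `records` is an iterable of (season_win_pct, failed_attempts) pairs,
    one per team-season. This layout is hypothetical.
    """
    records = list(records)  # allow generators, since we iterate twice
    total_weight = sum(n for _, n in records)
    if total_weight == 0:
        raise ValueError("no failed 4th down attempts to weight by")
    return sum(pct * n for pct, n in records) / total_weight
```

Running it once on the failed attempts from games the team went on to win, and once on those from games it went on to lose, gives the two figures being compared. For instance, `weighted_win_pct([(0.75, 3), (0.25, 1)])` returns 0.625, because the .750 team contributes three of the four failed attempts.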
The average weighted win% for teams that failed on 4th down but went on to win was .577, which is considerable but not unexpected. After all, we're spotting ourselves at least one win and eliminating virtually all the teams with very few wins in a given season. This may indicate a bias that would mask the presence of momentum loss after a turnover on downs. But the bias is stronger in the other direction. The average weighted win% for teams that failed on 4th down and went on to lose was .415, which is slightly further from .500 (.085) than the .577 for teams that went on to win (.077). Although there is some bias that may mask momentum loss, on net the bias points slightly in the other direction. If anything, we would be seeing momentum when it's not really there.
This makes some intuitive sense. Losing teams that fail to convert on 4th downs, well, they're likely to go on losing. So it's easy to see how the perception that turning the ball over on downs can create negative momentum gets reinforced.
Conclusion
The bottom line of this analysis is that turning the ball over on downs does not cause a team to lose more often than would otherwise be expected, given the same game state after the change of possession. Coaches should not be fearful of losing momentum.
I don't want to be overly critical, but it seems like the larger notion of momentum is a poorly defined hypothesis. These articles testing various notions of momentum seem like an intellectual game of whack-a-mole.
"... For example, if NE (.710 win%) and IND (.660 win%) are the teams with all the failed 4th down attempts, they would naturally win more often just because they were good teams. ..."
Controlling for team quality might make for more powerful EP and WP models in the general context. As with other NFL stuff, the data set is probably a bit sparse.
All these articles rely on "momentous" turnovers as a way of measuring momentum, but isn't momentum really saying "when things are going good, results tend to be good" and "when things are going bad, results tend to be bad"? I know "results" can be measured, but I'm not sure how you would measure "when things are going good/bad".
@Anonymous: No, momentum says "when things are going good, things will continue to go better than they normally would; and conversely, when things are going bad, things will continue to go worse than they normally would". You can measure "when things are going good/bad" using anything: say, the derivative of WP versus time (that is, if your chance of winning is going up, things are good) or your SR% at any given time (that is, if you are succeeding at the plays you are running, things are good).
NateTG - Whenever the punters (i.e. those who believe in punting on 4th and 1) are presented with the numbers, their argument is nearly always that going for it and failing loses 'momentum'. Seeing as we've already told them what the WP would be if they fail, we can only assume they mean that WP following a failed 4th down would be even lower. What this is testing is whether there is any evidence that teams that fail on 4th down lose more often than teams that find themselves in the same situation without turning the ball over (spoiler - there is not).
As a wonderful example of how momentum is overused, check out the BBC feed from the Murray v Djokovic Wimbledon final (http://www.bbc.co.uk/sport/0/tennis/23101783)
Two sets to love down, two-love down in the third set, Djokovic wins four games in a row to lead the set 4-2. The quote from Tim Henman on the feed is "Momentum is a funny thing, we can't see it but we can feel it, and the momentum is certainly with Djokovic in this set."
From that point, Murray wins four games on the bounce of his own, takes the set and the championship. Momentum eh? She's a fickle ally.
Excellent analysis as always. One thing that consistently nags me, however, is that when making comparisons like this you don't provide statistical measures of uncertainty. It would be very helpful in interpreting this graph if you provided a p-value for the difference between the two trendlines. I'd be interested to know if there really is a statistically significant difference (in the opposite direction to that predicted by the "momentum" theory) between the win expectancy of teams that take over on downs and those that receive a punt, or if the difference is not significant, in which case we would simply fail to reject the null hypothesis of there being no difference beyond what is already predicted by the current game state (I would suspect the latter but would be interested to see the numbers).
I find that this criticism seems to apply to many of the comparative analyses done at ANS. If you're using Excel to make these graphs, the relevant p-values or confidence intervals should be easy to calculate and include, and I think that doing so would shed a lot of light on the (otherwise excellent) analysis you do here over and above what we're already able to derive.
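For readers who want to try this themselves, one simple option is a pooled two-proportion z-test within a single WP bin. This is only a sketch of one possible check, not something that was run for the article. The function and argument names are hypothetical, and the test assumes independent observations, which drives from the same game are not, so any p-value from it would be optimistic.

```python
import math


def two_proportion_z_test(wins_a, n_a, wins_b, n_b):
    """Two-sided pooled z-test for a difference between two win rates.

    Group A might be drives starting after a turnover on downs and group B
    drives starting after a punt, both drawn from the same estimated-WP bin.
    Returns (z, p_value).
    """
    p_a = wins_a / n_a
    p_b = wins_b / n_b
    pooled = (wins_a + wins_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("pooled win rate is 0 or 1; the test is undefined")
    z = (p_a - p_b) / se
    # Two-sided tail area of the standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

A negative z with a small p-value would mean teams taking over on downs win less often than teams receiving a punt by more than chance alone would explain, under the independence assumption noted above.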
Excel is often used to keep a common look and feel, but the analysis is almost always custom coded.
The next 2 installments (and hopefully last 2) will use a completely different approach.
Teams turning the ball over on downs have to have made the decision to go for it on 4th down. From many of your other articles, this tends to be an area where coaches throw away WP - so perhaps there's a bias where the teams turning the ball over on downs are more likely to win because in the course of the remainder of the game they will throw away less WP than their opponent on subsequent 4th down decisions.
So I've been thinking about it for a while...
The weighted team average calculated in this article shows that overall bad teams are attempting more of the 4th downs than good teams, basically, right?
I was wondering if there is a bias as to where on the "WP estimated" axis they fall (where on the x axis on the graph above). What if most of the data comes from the right side of the x axis (or even the FAR right side of the x axis)? This would open up the possibility that the left side of the x axis is filled with the good teams. So, overall, the sample is made up of more bad teams attempting 4th downs, but they are all concentrated on the right side of the x axis. The good teams can actually outnumber the bad teams for all the 4th down conversions attempted when their WP is above 50%. So, let's say that after a failed 4th down, the team receiving the ball has an actual WP of ~10% (red line on the graph above). This is lower than the WP model says it should have (~15%), but it's somewhat expected if most of the teams that attempted the conversion are good teams.
I really don't think that effect can be very big, since you can see on the far right-hand side of the x axis that there is essentially no difference between a 4th down stop and receiving a punt, i.e., no momentum effect. Over there, on the far right-hand side, we know there are mostly bad teams attempting these 4th downs, so the bias is gone.