Sneak Peek at WP 2.0
As a quick refresher, the WP model tells us the chance that a team will win a game in progress as a function of the game state: score, time, down, distance, and so on. Although it's certainly interesting to have a good idea of how likely your favorite team is to win, the model's usefulness goes far beyond that.
WP is the ultimate measure of utility in football. As Herm once reminded us all, "You play to win the game! Hello!" WP measures how close or far you are from that single-minded goal. Its elegance lies in its perfectly linear proportions. Having a 40% chance at winning is exactly twice as good as having a 20% chance at winning, and an 80% chance is twice as good as 40%. You get the idea.
That feature allows analysts to use the model as a decision support tool. Simply put, any decision can be assessed on the following basis: do the thing that gives you the best chance of winning. That's hardly controversial. The tough part is figuring out the chances of winning that go with each of the decision-maker's options, and that's what the WP model does. Thankfully, once the model is built, only fifth-grade arithmetic is required for some very practical applications of interest to team decision-makers and fans alike.
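To make the "fifth-grade arithmetic" concrete, here is a minimal sketch, assuming each option can be described as a list of possible outcomes, each with a probability and the WP of the game state that outcome leaves you in. The option names and every number below are hypothetical placeholders, not outputs of the actual WP model.

```python
# Each option maps to a list of (probability, WP after that outcome) pairs.
# Names and numbers are hypothetical, for illustration only.
def expected_wp(outcomes):
    """Probability-weighted average WP across an option's possible outcomes."""
    total_prob = sum(p for p, _ in outcomes)
    if abs(total_prob - 1.0) > 1e-9:
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * wp for p, wp in outcomes)


def best_option(options):
    """Return (name, expected WP) for the option with the highest expected WP."""
    name = max(options, key=lambda k: expected_wp(options[k]))
    return name, expected_wp(options[name])


options = {
    "go for it": [(0.60, 0.70), (0.40, 0.45)],  # convert / fail (hypothetical)
    "punt": [(1.00, 0.58)],                     # hypothetical
}
print(best_option(options))
```

Averaging WPs weighted by probability is legitimate precisely because WP is linear in utility: a 60% shot at 0.70 and a 40% shot at 0.45 really are worth 0.60.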
Implications of a 33-Yard XP
Over the past five seasons, kicks from that distance have been successful 91.5% of the time. That should put a bit of excitement and drama into XPs, especially late in close games, which is what the NFL wants. But it might also have another effect on the game.
Currently, two-point conversions succeed at just about half that rate, somewhat north of 45%. The exact rate is somewhat nebulous because of how fakes, and aborted kicks that turn into two-point attempts, are counted.
It's likely the NFL chose the 15-yard line for a reason. Kicks from that distance succeed at approximately twice the rate of two-point attempts, making the entire extra point process "risk-neutral." In other words, going for two gives teams half the chance at twice the points.
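The risk-neutral claim is a short expected-points calculation. A minimal sketch, using the 91.5% kick rate quoted above and assuming a two-point rate of exactly 45% (the text only says "north of 45%", so treat this as a lower bound):

```python
def expected_points(success_rate, points):
    """Expected points from one try: success probability times points awarded."""
    return success_rate * points


xp = expected_points(0.915, 1)      # 33-yard extra point
two_pt = expected_points(0.45, 2)   # two-point try, assumed 45% success
print(f"XP: {xp:.3f}  2PT: {two_pt:.3f}")

# The two-point rate at which both options have equal expected points.
breakeven = 0.915 / 2
print(f"break-even 2PT rate: {breakeven:.4f}")
```

At a two-point success rate of about 45.75%, the two options have identical expected points, which is what makes the choice close to a coin flip in point terms.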
Momentum Part 5 - Series Level Analysis
Part 1 examined the possibility that momentum exists by measuring whether teams that obtained the ball in momentum-swinging ways went on to score more frequently than teams that obtained the ball by regular means.
Part 2 looked at whether teams that gained possession following momentous plays went on to win more often than we would otherwise expect.
Part 3 focused on drive success following a turnover on downs, which is often cited by coaches and analysts as a reason not to go by the numbers when making strategic decisions.
Part 4 applied a different method of examining momentum, using the runs test to see the degree to which team performance is streakier than random, independent trials.
In this part, I'll apply the runs test at the series level to see whether teams convert first downs (or fail to convert them) in longer streaks than independent random trials would produce. But first, I'll tie up some loose ends left from Part 4. Specifically, I'll redo the play-level runs test to eliminate potential confusion caused by teams whose offensive and defensive units perform very differently.
The Value of a Timeout - Part 2
In this part, I'll apply a more rigorous analysis to get a better approximation of a timeout's value. We'll also be able to repeat the methodology and build a generalized model of timeout values for any combination of score, time, and field position.
Methodology
For my purposes here, I used a logit regression. (Do not try to build a general WP model using logit regression. It won't work. The sport is too complex to capture the interactions properly.) Logit regression is suitable in this exercise because we're only going to look at regions of the game with fairly linear WP curves. I'm also interested only in the coefficients of the timeout variables (the relative values of the timeout states), not in the model's full predictions.
I specified the model with winning {0,1} as the outcome variable, and with yard line, score difference, time remaining, and timeouts for the offense and defense as predictors. The sample was restricted to 1st downs in the 3rd quarter near midfield, with the offense ahead by 0 to 7 points.
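A minimal sketch of fitting a logit model by gradient ascent on the mean log-likelihood, in plain Python so it runs without extra libraries. The function names, learning rate, and iteration count are assumptions for illustration; a real analysis would use a statistics package and the full predictor set described above. The toy data uses one binary predictor so the answer can be checked by hand: the fitted win rates must equal the group means (50% and 75%).

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(rows, outcomes, lr=1.0, iters=10000):
    """Fit P(win) = sigmoid(b0 + b . x) by gradient ascent on the mean log-likelihood.

    rows: list of predictor lists (e.g. yard line, score diff, time, timeouts).
    outcomes: list of 0/1 win indicators.
    Returns [b0, b1, ..., bk], intercept first.
    """
    k = len(rows[0])
    n = len(rows)
    beta = [0.0] * (k + 1)
    for _ in range(iters):
        grad = [0.0] * (k + 1)
        for x, y in zip(rows, outcomes):
            xs = [1.0] + list(x)  # prepend the intercept term
            err = y - sigmoid(sum(b * v for b, v in zip(beta, xs)))
            for j, v in enumerate(xs):
                grad[j] += err * v
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def wp_with_shift(base_wp, coef):
    """WP after adding one unit of a predictor whose logit coefficient is `coef`."""
    logit = math.log(base_wp / (1.0 - base_wp))
    return sigmoid(logit + coef)


# Hypothetical toy data: x = 1 if the offense holds an extra timeout.
rows = [[0], [0], [1], [1], [1], [1]]
wins = [0, 1, 0, 1, 1, 1]
b0, b1 = fit_logit(rows, wins)
print(round(b0, 4), round(b1, 4))  # approaches 0 and ln(3)
```

A logit coefficient is a shift in log-odds, so the WP value of one extra timeout depends on the baseline WP; `wp_with_shift` converts the coefficient back into WP terms for a given starting point.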
Results
The Value of a Timeout - A First Approximation
As I noted in my game commentary, if you need to call a timeout to think over your options, the situation is probably close to the point of indifference, where the options are nearly equal in value. And timeouts have significant value, particularly in situations like this example (late in the game and trailing by less than a TD), because you'll very likely need to stop the clock in the end-game, either to get the ball back or during a final offensive drive. Would Carroll have been better off making a quick but suboptimal choice, rather than making the optimal choice at the cost of a timeout?
Here's another common situation. A team trails by one score in the third quarter. It's 3rd and 1 near midfield and the play clock is near zero. Instead of taking the delay-of-game penalty and facing 3rd and 6, the head coach or QB calls a timeout. Was that the best choice, or would the team be better off facing 3rd and 6 but keeping all of its timeouts?
Both questions hinge on the value of a timeout, which has been something of a white whale of mine for a while. Knowing the value of a timeout would help coaches make better game management decisions, including clock management and replay challenges.
In this article, I'll estimate the value of a timeout by looking at how often teams win based on how many timeouts they have remaining. It's an exceptionally complex problem, so I'll simplify things by looking at a cross section of game situations: 3rd quarter, one-score lead, first down near midfield. First, I'll walk through a relatively crude but common-sense analysis; then I'll report the results of a more sophisticated method and see how the two approaches compare.
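The crude, common-sense first pass amounts to grouping plays in the chosen cross section by timeouts remaining and computing how often each group went on to win. A minimal sketch, assuming each record is a hypothetical (timeouts_remaining, won) pair already filtered to the 3rd-quarter, one-score, first-down-near-midfield slice:

```python
from collections import defaultdict


def win_rate_by_timeouts(records):
    """Map timeouts remaining -> (plays, win rate) from (timeouts, won) records."""
    tallies = defaultdict(lambda: [0, 0])  # timeouts -> [count, wins]
    for timeouts, won in records:
        tallies[timeouts][0] += 1
        tallies[timeouts][1] += int(won)
    return {t: (n, w / n) for t, (n, w) in sorted(tallies.items())}


# Hypothetical records, for illustration only.
sample = [(3, 1), (3, 1), (3, 0), (2, 1), (2, 0), (1, 0)]
for t, (n, rate) in win_rate_by_timeouts(sample).items():
    print(f"{t} timeouts: {n} plays, win rate {rate:.2f}")
```

Raw rates like these mix the timeout effect with anything else that differs between the groups (score, field position, team quality), which is one reason a regression approach can give a better approximation.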
Momentum 4: How Streaky Are NFL Games?
This is the 4th part in my series examining the concept of momentum in NFL games. The first part looked at whether teams that gained possession of the ball by momentum-swinging means went on to score more frequently than teams that gained possession by regular means. The second part looked at whether teams that gained possession following momentous plays went on to win more often than we would otherwise expect. The third part focused on drive success following a turnover on downs, which is often cited by coaches and analysts as a reason not to go by the numbers when making strategic decisions.
This article will examine how 'streaky' NFL games tend to be. If momentum is real and it affects game outcomes, it would result in streaks of success and failure that are longer than we would expect by chance. But if consecutive plays are independent of previous success, the streaks of success and failure will tend to be no longer than expected by chance. This method of analysis does not rely on any particular definition of a precipitating momentum-swing, as it looks at entire games to measure whether success begets further success and whether failure leads to more failure.
Momentum does not require completely unbroken strings of successful or unsuccessful plays to have a tangible effect on games. But if success does enhance the chance of subsequent success, then streaks of outcomes will be longer than they would be by chance alone.
For this analysis, I applied the Runs Test to the sequence of plays in a game. This produces a statistic indicating how streaky a string of results is compared to what would be expected by chance. For example, consider the following 3 strings of results of flipping a coin 8 times:
HTHTHTHT, HHHHTTTT, HTTTHTHH
The Runs Test works like this:
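As a stand-in for the step-by-step description, here is a minimal sketch of the standard Wald-Wolfowitz runs test with its usual large-sample normal approximation; whether the original analysis used exactly this form is an assumption. It counts the runs in a two-valued sequence and compares that count with its expected value under independence. Fewer runs than expected (a negative z) means streakier than chance; more runs (a positive z) means more alternation than chance.

```python
import math


def count_runs(seq):
    """Number of maximal blocks of identical consecutive symbols."""
    if not seq:
        return 0
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)


def runs_test(seq):
    """Return (runs, expected runs, z) for a sequence with exactly two symbols."""
    symbols = sorted(set(seq))
    if len(symbols) != 2:
        raise ValueError("sequence must contain exactly two distinct symbols")
    n1 = seq.count(symbols[0])
    n2 = seq.count(symbols[1])
    n = n1 + n2
    runs = count_runs(seq)
    expected = 2 * n1 * n2 / n + 1
    variance = 2 * n1 * n2 * (2 * n1 * n2 - n) / (n ** 2 * (n - 1))
    z = (runs - expected) / math.sqrt(variance)
    return runs, expected, z


for s in ["HTHTHTHT", "HHHHTTTT", "HTTTHTHH"]:
    r, e, z = runs_test(s)
    print(f"{s}: runs={r}, expected={e:.1f}, z={z:+.2f}")
```

With four heads and four tails, five runs are expected. The alternating string has the maximum of eight runs, the blocked string the minimum of two, and HTTTHTHH has exactly five. With only eight flips the normal approximation is crude, so treat these z values as illustrative.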
What Kind of Teams Are Super Bowl Winners?
Here's the plot of every team's regular season Expected Points Added (EPA) from 1999-2013. The horizontal axis represents offensive EPA per game, and the vertical axis represents defensive EPA per game. The best teams are in the upper-right quadrant, while the worst are in the lower-left. (It's suitable for framing!)
Momentum Part 3: After Failed 4th Down Conversion Attempts
This installment cuts to the chase. From a strategic perspective, we want to understand how momentum may or may not affect the game so that coaches can make better decisions. Momentum is often cited as a reason to forgo strategically optimal choices, for fear of losing the emotional and psychological edge thought to constitute momentum.
Here's the thinking: if a team fails on a 4th-down conversion attempt or a two-point conversion attempt, it gives up the momentum to the other team. The implication is that after the failure, winning is less probable than the numbers for the resulting situation indicate. Therefore, the WP and Expected Points (EP) models used to estimate the values of the options no longer apply. In a nutshell, the analytic models underestimate the cost of failing.
[By the same token, the reverse argument should be just as valid. Wouldn't succeeding in a momentum-swinging play mean the chances of winning are even higher than the numbers indicate? For now, I'll set the 'upside' argument aside and examine only the 'downside' claim.]
Momentum Part 2: The Effect of Momentum-Swinging Events on Game Outcomes
As in the previous analysis, I relied on how possession was obtained as an indication of a momentum swing. For all drives from 1999-2013 (through week 8), I compared a team's expected chances of winning (based on time, score, field position, down, and distance) with how often that team actually won. I divided the data into three categories: possession obtained following a momentous play, possession obtained following a turnover on downs, and possession obtained following a non-momentous play.
Momentous possession changes include fumble recoveries, interceptions, muffed punts, blocked kicks, and blocked field goals. I excluded missed field goals from the analysis because it was unclear to me how momentous they are. They are often thought of as big momentum-changing events in close games, but they are too common (almost 20% of all kicks) to be truly momentous.
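The comparison described above is a calibration check: within each category, add up the WP estimates (expected wins) and compare the total with the wins actually recorded. A minimal sketch, assuming each drive is a hypothetical (category, wp_at_possession, won) record; the category labels mirror the three groups in the text, and the sample numbers are invented for illustration.

```python
from collections import defaultdict


def expected_vs_actual(drives):
    """Return {category: (drives, expected wins, actual wins, actual - expected)}."""
    totals = defaultdict(lambda: [0, 0.0, 0])  # [count, sum of WP, wins]
    for category, wp, won in drives:
        t = totals[category]
        t[0] += 1
        t[1] += wp         # expected wins = sum of WP estimates
        t[2] += int(won)   # actual wins
    return {c: (n, exp, act, act - exp) for c, (n, exp, act) in totals.items()}


sample = [
    ("momentous", 0.55, 1), ("momentous", 0.40, 0),
    ("turnover on downs", 0.62, 1),
    ("non-momentous", 0.50, 0), ("non-momentous", 0.30, 1),
]
for cat, (n, exp, act, diff) in expected_vs_actual(sample).items():
    print(f"{cat}: {n} drives, expected {exp:.2f}, actual {act}, diff {diff:+.2f}")
```

If momentum mattered, the momentous category would show actual wins consistently above expected wins, by more than sampling noise can explain.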
Momentum 1: Scoring Rates following 'Momentum-Swinging' Events
In this article, I'll explain why I think we see momentum when it's not really there. And to test the existence of momentum within NFL games, I'll compare the results of drives following 'momentum-swinging' events with those following non-momentum-swinging events.
For momentum to be a real thing in sports, it needs to have some connection to reality beyond the metaphysical and metaphorical. The theory is that good outcomes are emotionally uplifting, which in turn leads to better performance, which then feeds upon itself. It's understandable to believe in game momentum when we see games like this each week:
When Should the Defense Decline a Penalty After a Loss? Part 1
Before you read on, what do you think the break-even yardage is? What do you think most coaches think it is?
Rest vs. Rust After Thursday Night Football
Andrew Mooney is the Co-President of the Harvard Sports Analysis Collective. He is a senior majoring in Social Studies, which is another way of saying he's an economics major. Andrew has worked as an analytics intern in the NFL for about two years, and previously wrote for the Stats Driven blog at Boston.com. He's a big fan of all Detroit sports, and he'll throw an octopus on your ice if you're not watching.
Though I’m sure it provides players some much needed rest, it is not immediately clear what effect this time off has on performance. The qualitative cases for each side are pretty straightforward, and your grandfather used each of them liberally in instructing you in the wonders of sporting conventional wisdom. “Ah, they had an extra week to prepare AND get healthy,” he said knowingly after Washington’s 31-6 thrashing of the Eagles last season. “They just got rusty,” he told you after the Vikings fell to Chicago, 28-10, the following week. “What in the Sam Hill…” he muttered after the 49ers and Rams battled to a 24-24 tie.
"Thursdays are 6.3 percent less exciting."
Friend of ANS Aaron Gordon used the Excitement Index (EI) and Comeback Factor (CBF) to find out whether Thursday night games really are more boring than most games. From Aaron's article at Sports on Earth:
What NFL Network games have sorely missed are big comebacks. NFL Network games average a Comeback Factor of 3.32 -- half the league average -- and only five games with a CBF of 5 or above (where the winning team had a win probability below 20 percent). By definition of the win probability model, 20 percent of the games played should feature a comeback with a CBF of 5 or above (which the larger data set confirms). For NFL Network games, it's only 13 percent. For comparison, Monday night games -- which often feature hand-picked matchups -- have an average CBF of 8.15, but are right about where they should be in terms of CBF games of 5 or above: 23 percent.
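The quoted passage ties a CBF of 5 to a winning team whose WP dropped below 20 percent, which is consistent with defining CBF as the reciprocal of the winner's lowest WP during the game. A minimal sketch under that assumption; the function names and the sample WP paths are hypothetical.

```python
def comeback_factor(winner_wp_path):
    """CBF as 1 / (winning team's lowest WP during the game), an assumed definition."""
    lowest = min(winner_wp_path)
    if lowest <= 0:
        raise ValueError("WP path must stay above 0")
    return 1.0 / lowest


def share_with_big_comeback(games, threshold=5.0):
    """Fraction of games whose winner's CBF is at or above `threshold`."""
    flagged = sum(1 for path in games if comeback_factor(path) >= threshold)
    return flagged / len(games)


# Hypothetical WP paths of each game's eventual winner.
games = [
    [0.50, 0.35, 0.18, 0.60, 1.00],  # low point 18%
    [0.50, 0.70, 0.90, 1.00],        # low point 50%
    [0.50, 0.25, 0.40, 1.00],        # low point 25%
    [0.50, 0.10, 0.55, 1.00],        # low point 10%
]
print([round(comeback_factor(g), 2) for g in games])
print(share_with_big_comeback(games))
```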
The Pay-Performance Linear Model
When we regress salary on performance (adjusted salary cap hit on the vertical (y) axis and Expected Points Added per Game (EPA/G) on the horizontal (x) axis), Rodgers' deal looks insanely expensive by conventional standards. But when we regress performance on salary, his new contract looks like a bargain.
Which one is correct? That depends on several considerations. First, there are generally two types of analyses. The one I do most often is normative analysis--what should a team do? The second type is descriptive analysis--what do teams actually do? The right analytic tool can depend on which question we are trying to answer.
The reason we saw two different results by swapping the axes is that Ordinary Least Squares (OLS) regression chooses a best-fit line by minimizing the sum of the squared errors between the estimated and actual values of the y variable only. Unless the correlation is perfect, the fitted line is pulled toward a flat line through the mean of y, so its slope is naturally shallow with respect to the x axis. Because errors are measured in only one direction, OLS is not symmetrical: swapping the axes produces a different line, and the two lines coincide only when the correlation is perfect.
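A quick numerical check of the asymmetry, using invented (x, y) points rather than the actual salary and EPA data. The slope of y on x is r·sy/sx and the slope of x on y is r·sx/sy, so their product is r squared. Redrawn on the same axes, the x-on-y line has slope 1/b_xy, which is steeper than the y-on-x line whenever |r| < 1.

```python
import math


def ols_slope(xs, ys):
    """OLS slope of ys regressed on xs: cov(x, y) / var(x)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    return cov / var


def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Invented data: x = performance, y = pay (arbitrary units).
perf = [1.0, 2.0, 3.0, 4.0, 5.0]
pay = [2.0, 1.0, 4.0, 3.0, 5.0]

b_yx = ols_slope(perf, pay)  # pay regressed on performance
b_xy = ols_slope(pay, perf)  # performance regressed on pay
r = pearson_r(perf, pay)
print(f"y-on-x slope {b_yx:.3f}; x-on-y line redrawn as y vs x: slope {1 / b_xy:.3f}")
print(f"product of slopes {b_yx * b_xy:.3f} equals r^2 {r ** 2:.3f}")
```

Here r = 0.8, so the two lines have slopes 0.8 and 1.25 on the same axes; a valuation read off one line can look expensive and the same valuation read off the other can look cheap.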
Feature Enhancement: Time Calculator
The previous version of the Time Calculator could only base its estimate on the time of the first-down snap of a series. For the vast majority of situations that's OK, because offenses will typically run only plays that keep the clock running: runs that stay in bounds. But sometimes there is a stoppage, due to an incomplete pass, a runner going out of bounds, a penalty, or some other reason.
The old calculator could account for an unexpected stoppage if you added a notional timeout to the game state. For example, say the defense began the series with 1 timeout, used it following 1st down, and then there was an unexpected stoppage after 2nd down. This scenario would be no different than if the defense had begun the series with 2 timeouts rather than its actual 1.
Still, it would be easier and more straightforward to make the calculator work for any down. Now you can enter the time at the snap of any down in a series along with the number of timeouts remaining, and the calculator will estimate the time after the change of possession.
All the other options remain the same: the average duration of each play, the game-clock duration between plays, and whether the defense would prefer to trade away some time on the clock to preserve a timeout for use on offense.
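The arithmetic behind a calculator like this can be sketched in a few lines. This is a minimal sketch under simplifying assumptions that are mine, not necessarily the tool's: the offense runs plays through 4th down and then gives up the ball (for example, by punting), every play keeps the clock running, the defense calls a timeout after each play until it runs out (or reaches the number it wants to save), and the two-minute warning is ignored. The function and parameter names and the sample values are hypothetical.

```python
def clock_at_change_of_possession(clock_at_snap, down, def_timeouts,
                                  play_secs, between_secs, timeouts_to_save=0):
    """Estimate the game clock (seconds) when the offense gives up the ball.

    clock_at_snap: game clock at the snap of `down` (1-4).
    play_secs: clock time consumed by each play.
    between_secs: clock time that runs between plays when no timeout is called.
    timeouts_to_save: timeouts the defense keeps for its own offense.
    """
    clock = clock_at_snap
    usable = max(def_timeouts - timeouts_to_save, 0)
    for d in range(down, 5):   # plays from `down` through 4th down
        clock -= play_secs
        if d < 4:              # a gap before the next snap in this series
            if usable > 0:
                usable -= 1    # defense stops the clock
            else:
                clock -= between_secs
    return max(clock, 0)


print(clock_at_change_of_possession(180, 1, 3, 5, 40))     # use all timeouts
print(clock_at_change_of_possession(180, 1, 3, 5, 40, 1))  # save one timeout
print(clock_at_change_of_possession(180, 3, 1, 5, 40))     # from the 3rd-down snap
```

Saving a timeout costs one between-play gap (40 seconds in this example), which is the trade-off the "preserve a timeout" option weighs. As the post notes, an unexpected stoppage behaves like an extra defensive timeout, so in this sketch it could be modeled by adding one to `def_timeouts`.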
Try it out. The Advanced NFL Stats Time Calculator.
The Extra Point Must Go
This week's article at the Post asks, "What's the point of the extra point?"
The extra point is something left over from gridiron football’s evolution from rugby. Originally, the ‘touchdown’ in rugby was less important than the ensuing free kick, and the points given for the touchdown and the ‘point after try’ varied during football’s early history. Today’s extra point is a vestige of football’s rugby roots. It’s football’s appendix: inconsequential, its original purpose uncertain, and safe to remove.
The Field Goal Likelihood Nexus
In this case I was looking at the probability of ending a drive with a made field goal in 2nd- and 3rd-down situations. (Second- and third-down modeling is especially challenging because there are fewer cases of each successive down. Plus, there is an entire additional dimension to consider: to-go distance. By comparison, first downs are almost always 10 yards to go.) After seeing the plots, I thought there was clearly something wrong.
You'd expect that having fewer yards to go would lead to scoring more often, but once you think about it that's not always true when looking at only field goals. For most of the field, having fewer yards to go is better, but once a team passes a certain point, having more yards to go means it's more likely that a drive will stall inside field goal range.
Changes in the EP Curve over Time
As a refresher, EP is a concept of football utility. It measures the net point potential at any state of a drive, based on down, distance, and yard line. For example, a 1st and 10 at midfield represents 2 EP to the offense, meaning from that point forward it can expect, on average, a 2-point net advantage over its opponent. More details on the concept can be found here.
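In its simplest form, EP is a probability-weighted average over the possible next scores, counted from the offense's point of view. A minimal sketch, assuming a hypothetical distribution of next-score outcomes; the probabilities are placeholders, not values from the actual model, and they are not meant to reproduce the 2 EP midfield figure.

```python
# Point value of each next-scoring outcome, from the offense's perspective.
SCORE_VALUES = {
    "offense TD": 7,
    "offense FG": 3,
    "safety against offense": -2,
    "defense FG": -3,
    "defense TD": -7,
    "no score": 0,
}


def expected_points(next_score_probs):
    """EP = sum over outcomes of P(outcome) * points(outcome)."""
    total = sum(next_score_probs.values())
    if abs(total - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(p * SCORE_VALUES[k] for k, p in next_score_probs.items())


probs = {  # hypothetical next-score distribution
    "offense TD": 0.30, "offense FG": 0.15, "no score": 0.15,
    "defense FG": 0.15, "defense TD": 0.20, "safety against offense": 0.05,
}
print(round(expected_points(probs), 3))
```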
With offenses gaining an ever-firmer upper hand, the EP curve must be affected. But it can’t simply slide up across all states. At its end-points, the curve is bounded: slightly under 7 points at the opponent’s goal line and slightly less than -3 points inside a team’s own goal line. We would therefore expect the curve to bow slightly upward over time.
The graph below plots raw, unsmoothed EP values for 1st and 10 (or goal) states in normal football situations, when time is not yet a factor and the score is reasonably close. The blue line represents the first three seasons in my data set, 2000-03, and the red line represents the most recent three seasons, 2008-11.
How Much Time Does It Take to Get into FG Range?
You wouldn't know by reading this study. It sets up a multiple regression model that takes a kitchen-sink approach to estimating the time needed in the end-game to reach the 35-yard line, commonly accepted as FG range. It uses QB rating, time remaining at the start of the drive, number of All-Pro players on the offense, timeouts remaining, starting field position, home field advantage, and whether the 2-minute warning is still available. The dependent variable is the time taken to reach the 35.
There are numerous fatal problems with this study. First, the model assumes the effects of the predictor variables are linear. I can tell you from my intimate familiarity with the variables involved that they are not linear at all. The model also assumes a normally distributed outcome variable, which the authors do not investigate and which I doubt is even possible, because the time available is bounded by the expiration of regulation time.
The study uses 3 seasons of data, which yields only 92 example situations to analyze.
The authors find enormous multicollinearity problems with their model, and I'm not surprised. The model specification looks like this:
time taken = constant + field position + timeouts + ...a bunch of other stuff... + game time at drive start + game time when 35 reached
But doesn't time taken = time at start - time when 35 reached? Of course. You can't have a regression model where the dependent variable is always the exact difference of two of the independent variables. The model's r-squared is 0.97 because it's one giant tautology.
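To see why such a fit is guaranteed, here is a minimal sketch with invented drive times: regress time taken on an intercept, game time at drive start, and game time when the 35 was reached. Because time taken is exactly start minus reached in this idealized version, OLS recovers coefficients of about 0, +1, and -1 and an R-squared of 1, regardless of anything about football. The helper names are hypothetical; the normal equations are solved with plain Gaussian elimination so the example has no dependencies.

```python
def solve(a, b):
    """Solve the square system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients via the normal equations (X^T X) beta = X^T y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


def r_squared(rows, y, beta):
    mean_y = sum(y) / len(y)
    fitted = [sum(b * v for b, v in zip(beta, r)) for r in rows]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Invented drives: seconds left at drive start and when the 35 was reached.
start = [120, 100, 90, 60, 45]
reached = [80, 30, 70, 10, 20]
taken = [s - t for s, t in zip(start, reached)]       # dependent variable
rows = [[1.0, s, t] for s, t in zip(start, reached)]  # intercept, start, reached

beta = ols(rows, taken)
print([round(b, 6) for b in beta], round(r_squared(rows, taken, beta), 6))
```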
On Opponent Strength and Team Strength Correlation
This post at Football Outsiders caught my eye today. The IgglesBlog noticed something odd with their team rankings. I’ve noticed the same phenomenon in my own systems: team ranking methods that adjust for opponent strength tend to produce rankings that correlate inversely with a team’s strength of schedule. In other words, top-ranked teams appear to have weaker schedules and low-ranked teams appear to have stronger schedules. The problem is that, if a ranking method properly adjusts for opponent strength, it ostensibly should produce no correlation between each team’s ranking and its opponents' average ranking. In fact, we might expect the opposite result because of the two “strength of schedule” games each season: last year’s 1st place teams play other 1st place teams, and so on.
In 2011, FO’s “DVOA” method correlated with opponent strength at -0.66, which is considerable. Here at ANS, Generic Win Probability correlated with Average Opponent GWP at -0.60 this season. FO notes that in other years the correlation isn’t nearly as strong, but there is an apparent tendency toward negative correlations in most seasons.
This phenomenon was first pointed out to me a couple of years back by a reader, and I too thought it was either a) randomness or b) a flaw in my methodology. But I soon realized this is exactly what we should expect given the NFL’s scheduling rules. It’s neither luck nor a flaw. In fact, it's a sign the method is doing something right.
Consider a fictional four-team football league. Presume we have a perfect team ranking system that can peer omnisciently into each team’s soul to know its True Winning Probability (TWP). The Sharks, Knights, River Dogs, and Jack Rabbits have TWPs of 0.75, 0.60, 0.40, and 0.25, respectively. (Notice the TWPs average to 0.50, as they must.)
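The arithmetic of this example can be worked through under one simple schedule assumption, which is mine for illustration: each team plays each of the other three once. Because a team never plays itself and the four TWPs sum to 2.0, each team's average opponent TWP is (2.0 - own TWP) / 3, so a perfect ranking still correlates exactly -1 with schedule strength.

```python
import math

TWP = {"Sharks": 0.75, "Knights": 0.60, "River Dogs": 0.40, "Jack Rabbits": 0.25}


def opponent_avg(team, twp):
    """Average TWP of a team's opponents when it plays every other team once."""
    others = [v for k, v in twp.items() if k != team]
    return sum(others) / len(others)


def pearson_r(xs, ys):
    """Pearson correlation coefficient."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


own = list(TWP.values())
opp = [opponent_avg(t, TWP) for t in TWP]
for team, o in zip(TWP, opp):
    print(f"{team}: TWP {TWP[team]:.2f}, avg opponent TWP {o:.3f}")
print(f"correlation: {pearson_r(own, opp):.2f}")
```

The Sharks' schedule averages about 0.417 and the Jack Rabbits' about 0.583, simply because the best team is the one opponent the Sharks never face and the worst team is the one the Jack Rabbits never face.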