Last season, we introduced our Markov model of a football drive, outlined here.
All of the same math and logic still apply, but this offseason we updated the model to address some of its earlier weaknesses.
Check out the Markov Calculator Tool and model results here.
First, we used an additional seven years of play-by-play data to estimate transition probabilities. The larger sample let us add new states to the model while keeping the sample size in each state large enough to be reliable. With that data, we increased the number of states from around 350 to just over 1,000. The main change is in how states are grouped. In the previous model, both yard lines and distances-to-go were grouped in five-yard increments, which meant that 3rd-and-5 from the 24-yard line was grouped with 3rd-and-1 from the 20-yard line. Distance-to-go was the biggest weakness in that grouping, since 3rd-and-1 is a very different situation from 3rd-and-5. In the modified version, every distance-to-go from 1 to 10 yards has its own state. Distances of 11-15, 16-20 and 21+ yards are still grouped, as there is no significant difference among the situations within each of those ranges. Thus, instead of only five possible distance-to-go values, we now have thirteen (ten individual values plus three groups). Yard-line increments were also reduced from five yards to three, with special treatment around both goal lines.
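A minimal sketch of the old and new distance-to-go groupings, showing why the two 3rd-down situations above collapsed into one state. The function names are hypothetical, yard lines are assumed to be measured from the opponent's end zone, and the new model's yard-line buckets are simplified to plain three-yard bins because the special goal-line treatment isn't specified here.

```python
def old_distance_bin(yards_to_go: int) -> int:
    """Old model: five groups, 1-5, 6-10, 11-15, 16-20 and 21+."""
    return min((yards_to_go - 1) // 5, 4)


def new_distance_bin(yards_to_go: int) -> int:
    """New model: 1-10 each get their own bin, then 11-15, 16-20 and 21+ (13 bins)."""
    if yards_to_go <= 10:
        return yards_to_go - 1
    if yards_to_go <= 15:
        return 10
    if yards_to_go <= 20:
        return 11
    return 12


def old_state(down: int, yard_line: int, yards_to_go: int) -> tuple:
    # Five-yard yard-line buckets, e.g. the 20 through the 24 share a bucket.
    return (down, yard_line // 5, old_distance_bin(yards_to_go))


def new_state(down: int, yard_line: int, yards_to_go: int) -> tuple:
    # Hypothetical simplification: plain three-yard buckets, ignoring the
    # special goal-line handling the model actually uses.
    return (down, yard_line // 3, new_distance_bin(yards_to_go))


print(len({old_distance_bin(d) for d in range(1, 100)}))  # 5
print(len({new_distance_bin(d) for d in range(1, 100)}))  # 13
print(old_state(3, 24, 5) == old_state(3, 20, 1))  # True
print(new_state(3, 24, 5) == new_state(3, 20, 1))  # False
```

Giving each short distance its own bin costs extra states, which is why the larger seven-year sample matters: each of those states still needs enough plays to estimate its transitions.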
Finally, we removed plays from the end of each half, since the clock changes how teams make decisions in those situations. As a result, we were able to drop "End Half/Game" as a drive-ending state, leaving only eight absorbing states: Touchdown, Field Goal, Field Goal Miss, Safety, Fumble, Interception, Turnover-on-Downs and Punt.
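For readers who want to reproduce the mechanics, here is a sketch of the textbook absorbing-chain calculation, which we assume is the "same math" carried over from the original model. With transient-to-transient probabilities Q and transient-to-absorbing probabilities R, the absorption probabilities are B = (I - Q)^-1 R. The two-state chain and all of its numbers are hypothetical placeholders, not estimates from the model.

```python
import numpy as np

# The eight absorbing (drive-ending) states listed above.
ABSORBING = ["Touchdown", "Field Goal", "Field Goal Miss", "Safety",
             "Fumble", "Interception", "Turnover-on-Downs", "Punt"]

# Hypothetical toy chain with two transient states; the real model has
# ~1,000 states with probabilities estimated from play-by-play data.
# Q[i, j]: probability a play moves the drive from transient state i to j.
Q = np.array([[0.5, 0.3],
              [0.2, 0.4]])
# R[i, k]: probability a play from transient state i ends the drive in ABSORBING[k].
R = np.array([[0.05, 0.02, 0.01, 0.00, 0.03, 0.04, 0.00, 0.05],
              [0.10, 0.05, 0.02, 0.01, 0.02, 0.03, 0.07, 0.10]])


def absorption_probabilities(Q: np.ndarray, R: np.ndarray) -> np.ndarray:
    """Return B = (I - Q)^-1 R; B[i, k] is the chance a drive in state i ends in k."""
    identity = np.eye(Q.shape[0])
    # Solving the linear system avoids forming the inverse explicitly.
    return np.linalg.solve(identity - Q, R)


# Every play goes somewhere, so each row of [Q | R] must sum to 1.
assert np.allclose(Q.sum(axis=1) + R.sum(axis=1), 1.0)

B = absorption_probabilities(Q, R)
for i, row in enumerate(B):
    print(f"state {i}:", {name: round(float(p), 3) for name, p in zip(ABSORBING, row)})
```

Each row of B sums to 1, because every drive eventually ends in exactly one of the eight outcomes. Removing "End Half/Game" simply drops one column from R.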
At the top you can see the graph of absorption probabilities on 1st-and-10 (or 1st-and-Goal), which looks nearly identical to the previous model's. That is expected: averaged across many situations, the two models' probabilities should line up, and the biggest differences appear only in specific situations. One thing worth noting in the graph is the fumble and interception probabilities. They decrease almost monotonically as you approach the opponent's end zone, suggesting teams may do a better job of protecting the ball in high-leverage (read: scoring) situations. More on this here (Do Teams Protect the Ball as They Approach The Goal Line?).
Similarly, we can compare average absorption probabilities on 4th down, by yards from the opponent's end zone, with those from the previous model:
Again, this looks nearly identical, although a few inconsistencies appear to have been smoothed out, especially in field goal probability around the 5-yard line. Both graphs pass the sniff test and behave the way you would expect them to. Keep in mind that all 4th-down distances-to-go are grouped together in the second graph.
Let's look at the specific situation we mentioned earlier: 3rd-and-5 from the 24-yard line vs. 3rd-and-1 from the 20-yard line. In the previous model, these two situations fell into the same state, which distorts both the transition probabilities and the absorption probabilities.
Blue and red bars represent the new model's probabilities on 3rd-and-5 from the 24 and 3rd-and-1 from the 20, respectively. The biggest differences are in the scoring outcomes: touchdowns and field goals. Teams are much more likely to convert on 3rd-and-1 than on 3rd-and-5, so touchdown probability rises sharply (with a corresponding drop in field goal probability). The old model's results track the 3rd-and-5 results more closely, since its combined state also included 3rd-and-2 through 3rd-and-4, each of which is significantly harder to convert than 3rd-and-1.
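The pooling effect behind that last point can be sketched in a few lines. Within a combined state, an estimated probability is total outcomes divided by total attempts, i.e. an attempt-weighted average of the situations inside it. The conversion counts below are hypothetical illustration values, not the model's data; they only assume that 3rd-and-1 converts far more often than 3rd-and-2 through 3rd-and-5.

```python
# Hypothetical 3rd-down (attempts, conversions) by yards to go.
# Illustrative numbers only; they are not taken from the model's data.
THIRD_DOWN = {1: (100, 70), 2: (100, 55), 3: (100, 50), 4: (100, 45), 5: (100, 40)}


def conversion_rate(distances: list[int]) -> float:
    """Pooled conversion rate: total conversions / total attempts across distances."""
    attempts = sum(THIRD_DOWN[d][0] for d in distances)
    conversions = sum(THIRD_DOWN[d][1] for d in distances)
    return conversions / attempts


old_pooled = conversion_rate([1, 2, 3, 4, 5])  # old model: one state for 1-5 yards to go
third_and_1 = conversion_rate([1])
third_and_5 = conversion_rate([5])

print(f"pooled 1-5: {old_pooled:.2f}, 3rd-and-1: {third_and_1:.2f}, 3rd-and-5: {third_and_5:.2f}")
# pooled 1-5: 0.52, 3rd-and-1: 0.70, 3rd-and-5: 0.40
```

Because four of the five pooled distances are harder conversions, the pooled rate lands closer to 3rd-and-5. Those transition rates feed the chain, so the same bias carries through to the absorption probabilities.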
Similarly, we can look at expected plays remaining and expected points.
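Both quantities fall out of the same absorbing-chain setup. Each play starts from a transient state, so expected plays remaining is the row sum of the fundamental matrix N = (I - Q)^-1. Expected points weights each absorbing outcome by a point value. The sketch below uses a hypothetical two-state toy chain, and the point values (7 for a touchdown, 3 for a field goal, -2 for a safety, 0 otherwise) are an assumption that ignores the value of the ensuing possession. The linked expected points article may well use a different method.

```python
import numpy as np

# Hypothetical two-state toy chain (not the model's estimated probabilities).
# Q: transient -> transient; R: transient -> absorbing, with columns ordered
# Touchdown, Field Goal, Field Goal Miss, Safety, Fumble, Interception,
# Turnover-on-Downs, Punt.
Q = np.array([[0.5, 0.3],
              [0.2, 0.4]])
R = np.array([[0.05, 0.02, 0.01, 0.00, 0.03, 0.04, 0.00, 0.05],
              [0.10, 0.05, 0.02, 0.01, 0.02, 0.03, 0.07, 0.10]])

# Assumed offensive points per absorbing state (illustrative, drive-only view).
POINTS = np.array([7.0, 3.0, 0.0, -2.0, 0.0, 0.0, 0.0, 0.0])

identity = np.eye(Q.shape[0])
# Expected plays remaining: t = N @ 1, computed by solving (I - Q) t = 1.
expected_plays = np.linalg.solve(identity - Q, np.ones(Q.shape[0]))
# Absorption probabilities B = N @ R, then expected points = B @ POINTS.
absorption = np.linalg.solve(identity - Q, R)
expected_points = absorption @ POINTS

print("expected plays remaining:", expected_plays)
print("expected points:", expected_points)
```

Solving linear systems rather than inverting I - Q keeps the calculation stable as the chain grows to the model's roughly 1,000 states.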
Read More: Modified Expected Points Models - Recursion & Regression
Keith Goldner is the creator of Drive-By Football and Chief Analyst at numberFire.com, the leading fantasy sports analytics platform. Follow him on Twitter @drivebyfootball or check out numberFire on Facebook.
Great stuff as always. Any chance we can get our mitts on the data beneath the graphs? For instance, in the first two charts it would be cool to expand the <0.1 section of the Y-axis to see the interplay of the rarer events.
For sure, check out the Markov calculator on Drive-By Football: https://docs.google.com/spreadsheet/ccc?key=0Ag6b9q23rVNDdGUyQU0xVTJiUnRqbXNFV3pWVjZmaEE&authkey=CNWl2cUF&hl=en&authkey=CNWl2cUF#gid=0
The Google Doc has all the results.