Comments on Advanced Football Analytics (formerly Advanced NFL Stats): A Markov Model of Football
(feed updated 2022-12-13T07:34:58.255-05:00)

2011-09-26T11:34:30.893-04:00

Ed-

I have wanted to apply this model to teams for a while. The problem, as you mentioned, is sample size. I would probably follow a solution similar to Sarah Rudd's from On Footy (she applied a similar model to soccer), where for the rare situations you apply the league average. Another concern is that you could not use more than a year's worth of data to apply the model to a team, since there is within-team variation from year to year.

Andrew -

Definitely an interesting thing to consider. A form of team-adjusted Markov model would definitely be interesting. It would require a lot of research to find the most accurate way to adjust the model, however.

Keith Goldner (https://www.blogger.com/profile/16510947295485321744)

2011-09-22T13:32:44.563-04:00

In 2010, the standard deviation among teams in the rate of converting a set of downs was about 5%. In a very stupid model in which you get three uncorrelated chances to convert a set of downs, this implies a 4% variation among teams in the probability of obtaining a first down on a single play.

For a binomial process it takes 120 plays to start seeing a 4% difference with any meaningful significance. Since the average team runs about 1000 plays a season, you can only bin up about 8 game states if you divide up your data by team and you want to see meaningful differences.

Instead, you'd be much better off aggregating the data over all teams to make the base model. Then find some way to make adjustments to the model's probabilities using one or two parameters per team, fit from the whole data for the team.
(For instance, a "forward-slosh parameter" per team that takes some fraction, depending on the value of the team's parameter, of each state's transition probability and moves it to the probability of transitioning to the "next best" game state. Definition of "next best" may be tricky.)

The various numbers I give above, derived from a patently stupid model of down sets, are not meant to be authoritative, but merely to illustrate rough sizes of probability effects that should be considered important and meaningful, and to get some idea of what statistics might be supportable on a per-team basis over a season.

Andrew Foland (http://nuclearmangos.blogspot.com)

2011-09-22T12:47:15.217-04:00

Following up on Jeff's comment, I think it would be interesting to apply this model at the team level. There will be a data sparsity issue, and it might not be possible to get much data on 3rd and 5 from the 36 against Tampa Bay. But there may be enough data on Sean Payton's offense to get a model specific to the Saints. Thoughts, Keith?

People in chemistry and biophysics attempt to make Markov models for all kinds of phenomena. For instance, they might take terabytes of results from a molecular simulation to make a Markov model of how a protein folds. The Markov model is simple and can predict things on longer time scales. In football, the play logs are like the results of molecular simulation.
While much of this work is ad hoc, they may have developed some mathematical ideas that could be applied to football.

Ed Feng (http://thepowerrank.com/)

2011-09-22T00:00:49.366-04:00

What about getting even more technical, and tracking team tendencies in certain situations and the outcome? Like, on 3rd and 5 from their 36 to the opponent's 35-yard line, the Saints go with 4 WRs and pass to the right side of the field 58% of the time. In this same scenario, their opponent, the Bucs, run a 2-sided defensive line stunt and blitz their inside backer. We could then compare the outcomes of each of these scenarios to get a more realistic probability.

Jeff Anderton (http://www.notsure.com)

2011-09-21T12:46:40.551-04:00
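
The sample-size arithmetic in Andrew Foland's comment above can be checked with a short calculation. This is only a sketch under stated assumptions: the 67% set-of-downs conversion rate is an assumed round figure that does not appear in the thread, and the function names are made up for illustration. It inverts the three-uncorrelated-chances model to get a per-play spread, then compares that spread with the binomial standard error after 120 plays.

```python
import math


def per_play_probability(p_set, chances=3):
    """Invert P(set) = 1 - (1 - p)**chances to get the per-play probability p."""
    return 1.0 - (1.0 - p_set) ** (1.0 / chances)


def per_play_spread(p_set, sd_set, chances=3):
    """Convert a spread in P(set) into a spread in p using the slope dP/dp."""
    p = per_play_probability(p_set, chances)
    slope = chances * (1.0 - p) ** (chances - 1)
    return sd_set / slope


def binomial_standard_error(p, n_plays):
    """Standard error of an observed success rate over n_plays independent plays."""
    return math.sqrt(p * (1.0 - p) / n_plays)


P_SET = 0.67    # ASSUMPTION: league-wide chance of converting a set of downs
SD_SET = 0.05   # from the comment: team-to-team spread in that chance

p = per_play_probability(P_SET)
sd_p = per_play_spread(P_SET, SD_SET)
se_120 = binomial_standard_error(p, 120)
states_per_team = 1000 // 120   # about 1000 plays per season, 120 plays per bin

print(f"per-play first-down probability: {p:.3f}")
print(f"team-to-team spread per play:    {sd_p:.3f}")
print(f"standard error after 120 plays:  {se_120:.3f}")
print(f"game states per team-season:     {states_per_team}")
```

With these assumed inputs the per-play spread comes out between 3% and 4%, and the standard error after 120 plays is a little over 4%, so a 4% team effect is roughly a one-standard-error signal at that sample size. That matches the comment's "about 8 game states" per team-season.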

Brian -

Yes, there are definitely limitations when it comes to momentum and other immeasurables. As a sports fan and athlete, I believe in the power of momentum. As a statistician, I don't. I'm inclined to believe that we notice momentum as a product of the current events, rather than momentum having predictive power in terms of future outcomes. I've read a few studies about it, but I'm always interested in attempts to measure the immeasurable.

DGold -

I agree with you completely. It is definitely a stretch in those cases since we do not take into account score and time left. But, like you said, sample size is the biggest issue to overcome, even with years and years of data. A simpler way would be to calculate a team's probability of converting a 4th-and-20+ based on all the relevant attempts and then use the Markov probabilities for the following 1st-down in conjunction with the 4th-down conversion probability.

Keith Goldner (https://www.blogger.com/profile/16510947295485321744)

2011-09-20T20:53:26.315-04:00

Great model and great analysis. The problem is that your model, like many expected value models, is not robust when it comes to unexpected outlying events and falls apart a bit when it comes to extreme cases.

It strains credulity to say that the Eagles' drive scores only 1 in 175 times. The problem for that strange drive is that your model does not take into account time left in the half or the score. A proper analysis of the particular McNabb play would exclude any drive in which punting was a reasonable option.
This would involve not only removing all actual punts, but also the successful drives where the offense's situation was not so dire as to completely preclude kicking the ball away.

We would probably be left with only samples from late in the fourth quarter and thus a rather small sample size. The sample size might even be small enough that any analysis wouldn't be meaningful at all.

What I'm getting at here is what we colloquially call four-down territory. The outcomes of drives when a team has the ball between their own 20- and 40-yard lines on 4th-and-20+ are going to be hugely different when they simply need to score now to avoid losing.

DGold (https://www.blogger.com/profile/02036617445383551621)

2011-09-20T18:30:55.054-04:00

Keith,

A very good article and description of the model!

To what extent do you actually believe football follows a Markov process? The foundation of this model (and many others, to be sure) is the idea that momentum doesn't exist. Though they probably don't articulate it this way, I think this is an objection to these types of metrics shared by many "old school football guys" (read: Polian).

I think where the community should go from here is to try to test that assumption. It's been almost 10 years since I've studied this stuff, but there must be some way of using this data to test the Markov assumption.

I'm interested in your thoughts on this.

Thanks for the good article!

Brian

Brian Anderson

2011-09-20T12:32:43.286-04:00
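
One standard way to probe the Markov assumption Brian asks about is to check whether the next state still depends on the previous state once the current state is known. The sketch below computes a likelihood-ratio (G) statistic for that conditional independence from sequences of drive states. The state labels and toy drives are hypothetical, and this is an illustration of the general technique, not something taken from the article.

```python
import math
from collections import Counter


def markov_order_g_statistic(sequences):
    """G statistic for 'next state is independent of previous state given current state'.

    Each sequence is an ordered list of state labels for one drive.
    A value near zero is consistent with first-order Markov behaviour.
    """
    triples = Counter()
    for seq in sequences:
        for prev, cur, nxt in zip(seq, seq[1:], seq[2:]):
            triples[(prev, cur, nxt)] += 1

    prev_cur = Counter()   # counts of (previous, current), summed over next
    cur_next = Counter()   # counts of (current, next), summed over previous
    cur_only = Counter()   # counts of the middle state alone
    for (prev, cur, nxt), n in triples.items():
        prev_cur[(prev, cur)] += n
        cur_next[(cur, nxt)] += n
        cur_only[cur] += n

    g = 0.0
    for (prev, cur, nxt), n in triples.items():
        # Expected count if the previous state carried no extra information.
        expected = prev_cur[(prev, cur)] * cur_next[(cur, nxt)] / cur_only[cur]
        g += 2.0 * n * math.log(n / expected)
    return g


# Hypothetical toy data: a memoryless cycle versus drives with "memory".
memoryless_drives = [list("ABCABCABC")]
memory_drives = [list("AXA")] * 10 + [list("BXB")] * 10

g_memoryless = markov_order_g_statistic(memoryless_drives)
g_memory = markov_order_g_statistic(memory_drives)
print(g_memoryless, g_memory)
```

A large value relative to a chi-square reference distribution would count as evidence against first-order behaviour. Choosing the degrees of freedom, and coping with sparse counts in real play-by-play data, is deliberately left out of this sketch.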

James -

For times when there were multiple absorbing states, I implemented a hierarchy of preference. So in the example of a FG as time expires, that was listed as a FG. Similarly, like you mentioned with the Rams fumble that was returned for a TD, that was listed as a fumble. In the model, since the defense does not possess the ball, they cannot score (the model terminates in an absorbing state as soon as the fumble is recovered).

Andrew -

Very interesting, and definitely something I will look into. I have a few other tweaks I would love to make to the model if time permits, but I think you are right that the greatest variability would be for the lowest yards-to-go.

Keith Goldner (https://www.blogger.com/profile/16510947295485321744)

2011-09-20T12:08:00.485-04:00

I understand you group by 5-yard increments in order to have sufficient data in every bin, but it seems very likely to introduce significant errors. The error induced is going to be proportional to the probability variation within a state and to the number of times you access a state; both are probably largest for states with a relatively low number of yards to go.

Instead, I suggest you try to fit a set of parameterized functional forms to at least the low-number-of-yards-to-go data, so that you can stretch the data you do have further and make better informational use of it. You can then substitute the functional form's answer for the observed rates in finer-grained states.

Andrew Foland (http://nuclearmangos.blogspot.com)

2011-09-20T10:44:44.915-04:00

This is neat.
Do you account for the potential of multiple absorbing states when you calculate the probabilities? I'm thinking of something like a last-second field goal as time expires, or... actually I guess it's more unlikely than I originally thought.

I'm also assuming that touchdown only includes offensive touchdowns, i.e., when the Giants returned the failed lateral for a TD last night, the absorbing state is just "fumble", not "fumble + TD". Is that right? Otherwise you could have some complicated plays like the DeSean Jackson punt return touchdown as time expires (Punt + TD + end of game).

James (https://www.blogger.com/profile/01838293735141324662)
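
The question above, and Keith's "hierarchy of preference" answer, both rest on each drive ending in exactly one absorbing state. As a rough illustration of how absorption probabilities fall out of such a chain, here is a minimal sketch that pushes probability mass forward play by play until nearly all of it has been absorbed. That is equivalent to summing the series behind the usual fundamental-matrix formula. The two transient states, the three absorbing states, and every transition probability are invented for the example and are not taken from the article.

```python
def absorption_probabilities(transitions, start, tol=1e-12, max_steps=10_000):
    """Return P(drive ends in each absorbing state) for a drive starting in `start`.

    `transitions` maps each transient state to {next_state: probability}.
    Any state that is not a key of `transitions` is treated as absorbing.
    """
    mass = {start: 1.0}   # probability of currently sitting in each transient state
    absorbed = {}         # probability already absorbed, per absorbing state
    for _ in range(max_steps):
        next_mass = {}
        for state, m in mass.items():
            for nxt, prob in transitions[state].items():
                if nxt in transitions:   # still in play
                    next_mass[nxt] = next_mass.get(nxt, 0.0) + m * prob
                else:                    # drive is over
                    absorbed[nxt] = absorbed.get(nxt, 0.0) + m * prob
        mass = next_mass
        if sum(mass.values()) < tol:
            break
    return absorbed


# Hypothetical toy chain: numbers are made up purely for illustration.
toy_transitions = {
    "1st down": {"1st down": 0.4, "3rd down": 0.4, "touchdown": 0.1, "fumble": 0.1},
    "3rd down": {"1st down": 0.5, "punt": 0.4, "fumble": 0.1},
}

result = absorption_probabilities(toy_transitions, "1st down")
for outcome in sorted(result):
    print(f"{outcome}: {result[outcome]:.3f}")
```

For this toy chain a drive starting on 1st down ends in a touchdown about 25% of the time, a punt about 40%, and a fumble about 35%. Because the outcomes are mutually exclusive and sum to one, a play such as a fumble returned for a touchdown has to be assigned to a single absorbing state, which is the job the hierarchy of preference does.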