Midway through the 2006 regular season, I created an alternate game-by-game winner prediction model. It used the same stats, such as yards per rush, yards per pass attempt, turnovers, and home field advantage, but it used them in a different way.
I still used logit regression to compute the outcome probability of each game, but I experimented with different forms of each independent variable to see if I could improve the model's fit. I tried exponential and logarithmic versions of each variable, but predictive power was not improved.
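As a rough sketch of what a logit model does with these inputs: it takes a weighted sum of the variables and squashes it into a probability between 0 and 1. The variable names and coefficient values below are placeholders I made up for illustration; they are not the fitted values from my model.

```python
import math

def win_probability(features, coefficients, intercept=0.0):
    """Logistic (logit) model: turn a weighted sum of inputs into a probability."""
    z = intercept + sum(coefficients[name] * value
                        for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients and inputs, for illustration only.
coefficients = {"pass_eff_diff": 0.5, "run_eff_diff": 0.2, "home": 0.3}

# Two evenly matched teams on a neutral field: every input is zero.
even_game = {"pass_eff_diff": 0.0, "run_eff_diff": 0.0, "home": 0.0}
p_even = win_probability(even_game, coefficients)

# Same matchup, but the team of interest is at home.
home_game = {"pass_eff_diff": 0.0, "run_eff_diff": 0.0, "home": 1.0}
p_home = win_probability(home_game, coefficients)
```

Trying "different forms" of a variable just means transforming an input (for example, taking its logarithm) before it goes into the weighted sum.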
I finally stumbled on an idea that produced a version of the model about as predictive as my original. I wondered if I could model the effect of a very strong running offense against a very weak run defense (and likewise for passing, and for the reverse mismatches). I theorized that if I could mathematically represent such an interaction, it might fit reality better than considering each team's efficiency stat alone. After all, this is how it works in actual games: one team's offense interacts with its opponent's defense. They're not performing independently in front of judges or for time.
Instead of using each team's yards per pass attempt, yards per rush, etc., I created variables that captured the interaction of an offense vs. a defense, which I called PASSFACTOR and RUNFACTOR. When Team A plays Team B, this is represented mathematically by:
APASSFACTOR = AOPASS * BDPASS (Team A's off pass eff x Team B's def pass eff)
ARUNFACTOR = AORUN * BDRUN (Team A's off run eff x Team B's def run eff)
BPASSFACTOR and BRUNFACTOR are computed in the same way, using Team B's offense and Team A's defense.
In this way, if a team with a great pass offense plays a team with a poor pass defense, the mismatch will be captured because PASSFACTOR will be very high for the great passing team.
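Here is a minimal sketch of how the four factors could be computed for one game. The TeamEfficiency class, the function name, and the sample numbers are all hypothetical, chosen only to show the arithmetic; I'm assuming each efficiency stat is a yards-per-play figure (gained for offense, allowed for defense).

```python
from dataclasses import dataclass

@dataclass
class TeamEfficiency:
    off_pass: float  # yards per pass attempt gained (OPASS)
    def_pass: float  # yards per pass attempt allowed (DPASS)
    off_run: float   # yards per rush gained (ORUN)
    def_run: float   # yards per rush allowed (DRUN)

def matchup_factors(a: TeamEfficiency, b: TeamEfficiency) -> dict:
    """Multiply each offense's efficiency by the opposing defense's efficiency."""
    return {
        "APASSFACTOR": a.off_pass * b.def_pass,
        "ARUNFACTOR": a.off_run * b.def_run,
        "BPASSFACTOR": b.off_pass * a.def_pass,
        "BRUNFACTOR": b.off_run * a.def_run,
    }

# Made-up numbers: Team A passes well, Team B defends the pass poorly.
team_a = TeamEfficiency(off_pass=7.0, def_pass=6.0, off_run=4.0, def_run=4.0)
team_b = TeamEfficiency(off_pass=6.0, def_pass=7.5, off_run=4.0, def_run=4.5)

factors = matchup_factors(team_a, team_b)
```

With these sample inputs, Team A's pass factor (7.0 x 7.5) comes out well above Team B's (6.0 x 6.0), which is the kind of mismatch the variables are meant to pick up.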
The net turnovers and home field advantage variables remain the same. However, with a better database, it would be interesting to use the same technique with Team A's giveaways vs. Team B's takeaways and vice versa, or even to go deeper by treating fumbles and interceptions as separate variables. The random components of turnover stats may become too strong if we divide them up like that.
The new "matchup" model was nearly as predictive as the original, correctly predicting the outcome of only one fewer of the 2005 games than the original (74.6% correct). All variables were significant. This was no breakthrough, obviously, but it is another tool to understand the game. And by adding more seasons of data, perhaps the model will improve.
Week by week, both models produced very similar probabilities, often differing by only 0.03 or so.
An Alternate Model
By
Brian Burke
published on 3/07/2007
I’ve happened upon your site a couple of times and look forward to digging into it further. So far I have not found any data about the percentage of NFL games in which the first scoring play occurs in the first minute, second minute, etc. (i.e., M1 = 14:59 to 14:00 of the 1st quarter, M2 = 13:59 to 13:00, etc.).
Any idea where I might find that, or is it somewhere on your site that I haven’t found yet?
To determine which team in a H2H matchup will win SU, what are the most important statistics to examine here? If you were to pick the 5 most valuable, or the 3 most valuable, which would they be?