What I'm Working On

It's been almost 6 years since I introduced the win probability model. It's been useful, to say the least. But it's also been a prisoner of the decisions I made back in 2008, long before I realized just how much it could help analyze the game. Imagine a building that serves its purpose adequately, but came to be as the result of many unplanned additions and modifications. That's essentially the current WP model, an ungainly algorithm with layers upon layers of features added on top of the original fit. It works, but it's more complicated than it needs to be, which makes upkeep a big problem.

Despite last season's improvements, it's long past time for an overhaul. The new overtime rules, team strength adjustments, and coin flip considerations were big steps forward, but ultimately they were just more additions to the house.

The problem is that I'm invested in an architecture that was never planned as a decision analysis tool. It must have been 2007 when I heard some TV announcer say that Brian Billick was 500-1 (or whatever) when the Ravens had a lead of 14 points or more. I immediately thought, isn't that due more to Chris McAlister than Brian Billick? And, by the way, what is the chance a team will win given a certain lead and time remaining? When can I relax when my home team is up by 10 points? 13 points? 17 points?

That was the only purpose behind the original model. It didn't need a lot of precision or features. But soon I realized that if it were improved sufficiently, it could be much more. So I added field position. And then I added better statistical smoothing. And then I added down and distance. Then I added more and more features, but they were always modifications and overlays to the underlying model, all the while being tied to decisions I made years ago when I just wanted to satisfy my curiosity.

So I'm creating an all new model. Here's what it will include:

 -All the features of the previous model, such as the new overtime and team-strength adjustments
 -Better accuracy thanks to more data and more recent, more relevant data
 -Better and more efficient modeling methods (and a much better modeler...I learned a lot over the past several years, and what once took weeks now only takes days.)
 -A blend of modeling methods, depending on which mix most accurately predicts out-of-sample game situations
 -Better precision, with estimates carried to more significant figures
 -A finer end-game model for close games
 -Similarly, better estimates for the end of the 2nd quarter, when the halftime break forces teams to alter their approach
 -Standard error estimates
 -Better 2-point conversion logic

One example of a regrettable legacy decision is that I chose not to use any significant figures beyond 2. In other words, the underlying model produces WP estimates such as 0.45 rather than 0.448 or 0.452. I knew that the sampling error was going to be much larger than .001, and I didn't want to mislead anyone with hyper-precise estimates, so I rounded the original computations. But I now realize that was a mistake.

For example, let's say a team that's down by 3 in the 3rd quarter has a 1st and 10 at their own 40, which gives them a true 0.446 WP. A RB carries the ball for a 6 yard gain, setting up a 2nd and 4 at the 46. We know (with as much confidence as we could have) that's an improvement in WP for the offense. Now say the offense's WP is truly 0.454. The old model would say that the play began and ended at 0.45 WP, and the play's WPA (win probability added) was therefore zero. So although it's true that in absolute terms the model error is going to be much bigger than .001, we also know that the relative error between plays is going to be much smaller. Besides, we are always free to round back to however few significant digits we prefer.
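To make the arithmetic concrete, here is a tiny Python sketch of that same play. The `wpa` helper is a made-up name for illustration, not part of the actual model; it just shows how rounding each WP to two places before subtracting erases the gain, while three places preserves it.

```python
def wpa(wp_before, wp_after, digits=None):
    """Win probability added by a play.

    If `digits` is given, each WP is rounded first, mimicking a model
    that only stores that many decimal places (illustrative helper).
    """
    if digits is not None:
        wp_before = round(wp_before, digits)
        wp_after = round(wp_after, digits)
    return wp_after - wp_before


# The 6-yard run from the example above: 0.446 WP before, 0.454 after.
wp_start, wp_end = 0.446, 0.454

wpa_two_digits = wpa(wp_start, wp_end, digits=2)    # both round to 0.45
wpa_three_digits = wpa(wp_start, wp_end, digits=3)  # 0.446 -> 0.454

print(f"2 decimal places: WPA = {wpa_two_digits:.3f}")
print(f"3 decimal places: WPA = {wpa_three_digits:.3f}")
```

Rounding at the very end, after the subtraction, would avoid the problem, which is the point of keeping the extra digits inside the model.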

The best thing about the new approach will be that all the features noted above will be part of the plan from the outset. But this takes a lot of time. Rather than applying one model with a single set of parameters and assumptions across the entire game and blindly accepting the results, I'm scrutinizing the output at every minute of the game, ensuring I've got the best fit.

Don't be alarmed if you're a modeling purist and your overfitting-detector is going off. I am merely recognizing that different 'regions' of the game are better modeled with different approaches. For example, given that a team is down by 3 points throughout the first 3 quarters or so, a stronger smoothing parameter best captures an unbiased estimate of a team's chance of winning. But in the end-game, when there is high leverage and hurry-up play, a strong smoothing parameter loses important signal. The data is much thinner when a team is trailing by 5 rather than by 3, so different parameters are needed. And in some cases entirely different techniques might be required. Another consideration is how dense the data is. Being ahead or behind by 15 points is relatively rare, and a modeling approach that works very well for a score difference of 3 points might produce very noisy estimates for 15 points.
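As a rough illustration only (this is not the model's actual code), the idea of region-specific smoothing can be sketched with a simple Gaussian kernel average standing in for LOESS. Everything here is an assumption made for the example: the function names, the 300-second cutoff, the two bandwidths, and the toy data, which is invented and not real game data.

```python
import math


def kernel_smooth(times, wins, t0, bandwidth):
    """Gaussian-kernel weighted average of win outcomes (0/1) around t0."""
    weights = [math.exp(-0.5 * ((t - t0) / bandwidth) ** 2) for t in times]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no observations close enough to t0 for this bandwidth")
    return sum(w * y for w, y in zip(weights, wins)) / total


def bandwidth_for(seconds_left):
    """Hypothetical rule: smooth heavily early, lightly in the end-game."""
    return 60.0 if seconds_left <= 300 else 600.0


def estimate_wp(times, wins, seconds_left):
    """Smooth with whichever bandwidth the game region calls for."""
    return kernel_smooth(times, wins, seconds_left, bandwidth_for(seconds_left))


# Invented observations for one score difference:
# seconds remaining, and whether the trailing team went on to win.
times = [3000, 2400, 1800, 1200, 600, 240, 120, 60, 30]
wins = [0, 1, 0, 1, 0, 0, 0, 0, 0]

end_game_narrow = kernel_smooth(times, wins, 60, 60.0)   # uses nearby plays only
end_game_wide = kernel_smooth(times, wins, 60, 600.0)    # bleeds in early-game wins
print(end_game_narrow, end_game_wide, estimate_wp(times, wins, 60))
```

With the wide bandwidth, outcomes from the first half leak into the estimate for the final minute, which is exactly the signal loss described above. The narrow bandwidth avoids that, but it would be far too noisy where the data is sparse.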

So far, I've used LOESS, generalized additive models, logistic regression, linear regression, Markov models, dynamic programming, splined regression, and simple Gaussian models in some way, shape or form to help build the model.

Ultimately, my goal is to have a tool that needs no excuses. I want to be able to say, there's nothing the model doesn't consider and consider well. (Hey, what about weather!) Ok, maybe I'll get to weather next year.


11 Responses to “What I'm Working On”

  1. BIP says:

    Sounds awesome. Looking forward to it!

  2. Anonymous says:

    Sounds great to me. If you wanted to build in the ability for the user to choose different starting team strengths that would be something I'd be interested in for sure. Good luck and enjoy!

  3. Hooper206Vintners says:

    Brian, I think one of the easiest new features you could add on, to help look at the accuracy of the model and provide your readers with more information, would be an injury-adjusted efficiency model. The game probabilities that would result would be an interesting set of data to test the strengths and weaknesses of the model. And I would guess it would take you about 5 minutes extra per week.

  4. Anonymous says:

    Hi Brian,
    I agree with the anonymous comment above that the efficiency of the two team participants would be a useful input to the WP model for a given game. I know this opens up a whole new can of worms, but perhaps something to consider.
    Thanks, Matt

  5. Nathan Lazarus says:

    I really like that the new model accounts for getting the ball at half and has an option to make it account for team strengths. I like the idea to be able to input team strengths, rather than just using yours, and just have yours as the default. I'd play around with that for quite some time mid-game (this would be especially helpful in early weeks of the season). I'd love an article on the details of the methods you're using too. Thanks for updating the sig-figs, that's gonna be a lot better, as 98.50 is a lot different from 99.99.

    One big request, can you make the model account for score differentials up to 35?

  6. Dave says:

    I don't know what methods you are using to judge fit, but I'm thinking you will want to use random (or contiguous) block cross-validation, and use AUC as your metric when tuning parameters.

    The reason I suggest random blocks is because the data is not independent. You could simply treat each game as a block, and randomly divide your data into 5 to 10 buckets of complete games for cross validation.

    Using AUC will reward more extreme (like .01 or .99) correct predictions than will a simple accuracy metric.

    I would be interested to hear how you go about your parameter tuning and calibration.

  7. Unknown says:

    As long as we're giving Brian our Christmas lists, here's mine (which is in the form of a question really): do you ever see a time when you'll post your play-by-play data with an EP and WP column (similar to how Pro Football Reference does)? I can think of a few reasons as to why you wouldn't, the first being that it would most likely be a huge hassle.

    Admittedly, I'm firmly in the "Things Interesting to Fans" branch of sports analytics, and would love to have that data more easily available. For example, doing any kind of WP analysis of special teams is pretty cumbersome since the data has to be entered for each play from the WP calculator (I know, cry me a river).

    Also - the more readily and easily available these stats are, the more "standard" they become. I'm thinking someday it would be cool if the model you designed is the industry standard. When a writer throws out a WP stat, there isn't this kind of wonderment on the part of the reader as to where it came from, or whose model it is, etc.

    Sorry for the long post, thanks again for a great site and for the great work you're doing.

  8. Elliot says:

    I don't intend to be an internet troll, and I hope this doesn't come off that way. As a software and machine learning person, two detectors went off for me: 1. http://en.m.wikipedia.org/wiki/Second-system_effect

    2. Will you use a SEPARATE test set that ISN'T the same as the out-of-sample set you use to tune the blend between models and such?

    Cheers and good luck.

  9. James says:

    Brian, I have a rather specific question about clock usage at the end of games: If there's 1:30 left and the offense is trying to run out the clock in a close game, a 5-yard gain when the runner stays in bounds is more valuable than a 5-yard gain when the runner goes out of bounds. Does WPA accurately capture this information?

    It would be hard to do so and mostly only applies to this one specific situation, so I doubt you've put in the effort to make it work, but it's something to consider. I don't know if it'd be worth it at all, but the way I see it, you'd either have to parse out from the play-by-play whether the runner was in or out, or base it on the clock when the next play starts (which adds further complications of timeouts, whether the team runs down the playclock, and delays processing the previous play).

  10. Anonymous says:

    Brian, one request if you are redoing all your numbers. Would it be possible to break out RB rushing/receiving stats as separate entities?

  11. WH says:

    Great site. Great info. Looking forward to the updates.
