Weekly game probabilities are available now at the nytimes.com Fifth Down. This week I review how the prediction model works, and how the results are best understood.
To avoid the questions that will undoubtedly come regarding the spread, I can tell you that this equation will convert Brian's probabilities to a very good approximation of the implied line:
points favoured = -8.5 * ln((1/p) - 1)
where p is the probability of that team winning and 'ln' is the natural logarithm.
So, for instance, that gives about a two-point advantage to a team with a win probability of 0.56.
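For anyone who wants to plug that in directly, here is a minimal sketch in PHP (matching the coefficient snippet further down; the function name is my own):

<?php
// Convert a win probability p (0 < p < 1) to an implied point spread,
// using the logistic inversion above: points = -8.5 * ln(1/p - 1).
function impliedSpread($p) {
    return -8.5 * log((1.0 / $p) - 1.0); // log() is the natural log in PHP
}

echo impliedSpread(0.56); // ~2.1 points, matching the example above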
I read your full post on the choice of success rate vs. efficiency in creating the new model. I have a tough time understanding why reducing an event to a binary outcome will lead to better predictive accuracy than using a more sensitive analog input (play efficiency). With an infinite sample size, this makes sense; but samples are small in football, and I can't imagine play success rate is the optimal source for predicting the future.
The main issue you had with efficiency was that it did not correlate with wins as highly as PSR did. The reason for that seems pretty obvious. A play's success has a massive and direct impact on winning, just as creating a turnover does. However, is either of these events a repeatable skill? Certainly turnover creation is not (very much); I haven't seen the evidence that a play "succeeding" is. The only studies I have seen show that play success rates on third downs (gaining a first down) are more luck than skill, once you adjust for the team's normal offensive efficiency.
Finally, the point in your original article that running efficiency has only a small correlation to winning seems shortsighted. Clearly, teams that are leading will run more often, and the more often you run, the less efficient you are likely to be. Hence, winning causes a drop in run efficiency; it is not that run efficiency fails to cause winning.
Overall, I get the general idea that using raw efficiency (ignoring game context) is not perfect. But it appears you have gone way too far to the other extreme: concluding that the actual game impact of the play -- not what happened during the play -- has the most predictive value. To me, this is akin to moving from a power rating model that uses strictly margin of victory and ignores win/loss record, to a model that uses only win/loss record.
Hopefully you can silence me by showing the historical predictive performance of your new model vs. the old one.
I don't have to explain WHY run success rate (SR) correlates with winning better than efficiency does. It does. Whether you or I understand why doesn't change the fact that it does. Using words like 'shortsighted' or 'I can't imagine' is pointless and unhelpful. I may not buy into the concept of hydrogen fusion, but that doesn't change the fact that the Sun keeps shining.
You should be silent because you have the wrong idea about research and statistics, not because the prediction model will be accurate or not.
If you don't care why something works, shouldn't you just use past turnovers to predict future success? Awesome correlation, after all.
Please read this post. It will answer your question.
Each statistic has been carefully tested to determine both its explanatory and predictive value.
Maybe there should be a 1-year minimum for new readers before they start throwing around misguided comments. Questions, absolutely. Comments like 'shortsighted', no. I have studied and written enough about the sport to fill 10 dissertations, and I can't encapsulate every relevant previous post.
In the future, what the above commenter should have written is "What evidence do you have that runSR is more predictive than run efficiency?", leaving out all the other blathering nonsense. I would have been happy to answer with the link I just provided.
"Predictivity?" I hope that term has been trademarked. It could someday replace out of sample testing. Or not. Other than trademark revenue, why not measure the correlation of past run SR to future win pct. directly?
'Out of sample testing' just rolls off the tongue, doesn't it?
Direct correlations don't have the sample sizes for reliable results. You'd also need separate models for each week of the season. It's very cumbersome.
Ah, one of my favorite times of the site, when an anonymous troll comes along and Brian lays some smack down. Love the site, Brian. Keep up the good work, and keep the trolls @ bay.
Tom--do you have a handwave-y reason for the functional form (ln[(1/p)-1]) of your converter? Other than that it has nonzero slope at p=0.5 and takes opposite values for p and 1-p?
I ask because it approximates with startling (to me) precision the inverse error function for a normal distribution with standard deviation of 14 points. I for one hadn't known there was a decent approximation of such simplicity.
It is simply the inverse of a logistic distribution that I fitted to past NFL results data. It was originally intended to give the probability given the spread, but it was easily inverted.
The logistic function fits extremely well to the observed data.
I use a simpler formula to calculate implied spread from win%:
Spread = 34.5 * (pred. win % - 0.500)
This assumes a value of 34.5 points per win for the NFL, which I found in much the same way the ~9-10 runs-per-win figure is found in baseball.
Zach, that's a good fit up until around 70% wp, at which point it begins to deviate too far from measured values.
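To make that deviation concrete, here is a quick PHP comparison of the two formulas quoted above:

<?php
// Logistic inversion vs. the linear points-per-win rule.
foreach (array(0.55, 0.60, 0.70, 0.80, 0.90) as $p) {
    $logistic = -8.5 * log((1.0 / $p) - 1.0);
    $linear   = 34.5 * ($p - 0.5);
    printf("p=%.2f  logistic=%5.1f  linear=%5.1f\n", $p, $logistic, $linear);
}

Near 55-60% the two agree to within a tenth of a point; by 80% they differ by almost a point and a half, and the gap keeps growing from there.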
Tom--thanks. The logistic function does look pretty Gaussian, so I suppose it's no wonder that it looks like the inverse error function. (A logistic distribution with scale s has standard deviation s*pi/sqrt(3), so a scale of 8.5 points works out to about 15.4, in the same neighborhood as 14.)
Your curve-fitting would probably be an interesting contribution for the community site!
I will soon be making a post on the community site about some data I have put together that I believe shows there is no such thing as red zone performance. Perhaps I will double-post it with some shorthand formulas for the everyday fan.
Looking forward to that Tom.
By the way, here are the new coefficients of the model:
$C['const'] = -0.4619;
$C['ahome'] = 0.748577;
$C['aopass'] = 0.4467;
$C['aorunSR'] = 0.03565;
$C['adpass'] = -0.576;
$C['adrunSR'] = 0.02749;
$C['aoint'] = -13.026;
$C['adint'] = 14.24;
$C['aofum'] = -7.209;
$C['apen'] = -1.587;
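For readers wondering how coefficients like these turn into a game probability: assuming a standard logistic regression (the exact inputs, their scaling, and any opponent adjustment are not spelled out in this thread, so treat this strictly as a sketch), the mechanics look like this:

<?php
// Sketch only: generic logistic-regression win probability.
// The keys of $stats mirror the coefficient names above; what scale
// each input is measured on is an open question here.
function winProbability($C, $stats) {
    $logit = $C['const'];
    foreach ($stats as $name => $value) {
        $logit += $C[$name] * $value;  // weighted sum of the team stats
    }
    return 1.0 / (1.0 + exp(-$logit)); // logistic link: p = 1/(1+e^-logit)
}

Which scale each input uses (per-play averages, percentages, league-normalized values) matters as much as the coefficients themselves, which is also why the raw coefficient sizes can't be compared directly; see the exchange below.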
There was a comment about it being difficult to determine whether the game predictions are accurate. It seems very simple to do, especially over a few years of games.
Take all the roughly-50% games (i.e., those between 40% and 60% win probability) and see how often you were 'right'; you should be right about 50% of the time, give or take 1/sqrt(N games). Likewise, take the games at 65% or greater and see how often you were right.
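A minimal sketch of that check in PHP, assuming each game is recorded as a predicted probability plus an outcome:

<?php
// Calibration check: bucket games by predicted win probability and
// compare the observed win rate in each bucket to the prediction.
// Each game is array('p' => predicted prob, 'won' => 1 or 0).
function calibration($games) {
    $bins = array();
    foreach ($games as $g) {
        $b = (int) floor($g['p'] * 10);  // deciles: 0.40-0.49 -> bin 4, etc.
        if (!isset($bins[$b])) $bins[$b] = array('n' => 0, 'wins' => 0);
        $bins[$b]['n']    += 1;
        $bins[$b]['wins'] += $g['won'];
    }
    ksort($bins);
    foreach ($bins as $b => $x) {
        printf("%d%%-%d%%: observed %.0f%% (+/- %.0f%%), n = %d\n",
               $b * 10, $b * 10 + 10,
               100 * $x['wins'] / $x['n'],
               100 / sqrt($x['n']),      // the rough band from the comment above
               $x['n']);
    }
}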
My apologies, stats was not my strong suit, but am I right in reading that the coefficient for passing efficiency is 15 times larger than the one for running success rate, Brian? If so, is passing really weighted that much more heavily, or does that mean something else?
Brian, have you thought of compiling a chart that shows team GWP over time? Something like this: http://imgur.com/MxQDs. It would be interesting to see which teams have been the most dominant, and which team has had the highest single GWP.
Brian!
The matchup pages still have YPC as a measure of efficiency!
Brendan, Brian can correct me if I'm wrong, but I'm guessing he doesn't standardize his variables, so you can't compare the coefficients in that manner. Passing efficiency might be expressed as YPA, while run SR is expressed as a percentage. Completely different scales.
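To put some (entirely made-up) numbers on that scale point:

<?php
// Hypothetical illustration: what matters is the coefficient times a
// realistic spread in the stat, not the raw coefficient. The ranges
// below are invented for the example.
$passCoef  = 0.4467;  $passSpread  = 1.5; // e.g. 5.5 vs. 7.0 net yards/att
$runSRCoef = 0.03565; $runSRSpread = 10;  // e.g. 40% vs. 50% success rate

printf("pass effect on the logit:  %.2f\n", $passCoef * $passSpread);   // ~0.67
printf("runSR effect on the logit: %.2f\n", $runSRCoef * $runSRSpread); // ~0.36

Comparable orders of magnitude, despite the raw coefficients differing by more than a factor of ten.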
It seems like the best way to test these probabilities against Vegas is by making hypothetical bets against the Vegas moneyline (i.e., bets without a point spread). The moneyline gives an implied prediction of a team's probability of winning. For example, if a team is +150, they have an implied 40% chance of winning (100/(100+150)).
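That conversion generalizes to negative lines too; here is the standard moneyline arithmetic as a small PHP sketch (the function name is mine):

<?php
// Convert an American moneyline to its implied win probability.
// +150 -> 100/(100+150) = 0.40; -150 -> 150/(150+100) = 0.60.
function impliedProbability($moneyline) {
    if ($moneyline > 0) {
        return 100.0 / (100.0 + $moneyline);    // underdog
    }
    return -$moneyline / (-$moneyline + 100.0); // favorite
}

Note that the two sides' implied probabilities will sum to a bit more than 100% because of the vig.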
I'm planning on tracking Brian's predictions against the moneyline in two ways. The first will be by betting a single unit on each game. If Brian's system says Seattle is a 59% favorite, and the moneyline says Seattle is 40% to win, I'll bet a unit on Seattle. This method is OK, but it doesn't take into account the magnitude of the difference between Brian's predictions and the moneyline predictions, as a reasonable sports bettor would.
The second way will vary the bet size based on the difference between Brian's system and the moneyline. If the difference is 0-2 percentage points, there will be no bet. If the difference is 2-6 percentage points, the bet is 1 unit. If the difference is 6-10 percentage points, the bet is 2 units. 10-20 means a 3-unit bet, 20-30 means a 4-unit bet, and 30+ means a 5-unit bet. The second method is a more realistic way someone might bet if they knew Brian's probabilities were accurate; a sketch of the staking rule follows below.
I'll let you know how it goes.
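A sketch of that staking rule, assuming both probabilities are already in hand:

<?php
// Bet size in units from the gap between the model and the moneyline,
// using the tiers described above.
function betUnits($modelProb, $moneylineProb) {
    $diff = abs($modelProb - $moneylineProb) * 100; // percentage points
    if ($diff < 2)  return 0;
    if ($diff < 6)  return 1;
    if ($diff < 10) return 2;
    if ($diff < 20) return 3;
    if ($diff < 30) return 4;
    return 5;
}

echo betUnits(0.59, 0.40); // the Seattle example: a 19-point gap -> 3 units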
"Direct correlations don't have the sample sizes for reliable results. You'd also need separate models for each week of the season. It's very cumbersome. "
Most successful predictive models are actually quite cumbersome to test. Hasn't stopped people historically, though.
Hard to imagine creating a model with a stated goal of prediction and having zero interest in how well it has performed in...well, prediction. This might gain you 10 imaginary PhDs (sure it's not merely 7 or 8?), but it would not exactly pass muster in real-life industries such as financial markets and weather forecasting -- where results actually matter.
Nice try, anonymous loser. Go back to your World of Warcraft game.
The point I made was about trying direct week-to-week correlations, not that I wasn't interested in the accuracy.
You are confusing two different points. The other point is that, like most good researchers, I wait for independent scoring and (try) not to trumpet self-serving numbers.
But hey, don't let that stop you from making cowardly smart ass comments. And if you'd like to get off your fat ass and do your own week-to-week model, and publish its results, I'll be the first to post them.
That's what I thought.