I, for one, welcome our new statistical overlords.
ESPN has a talented new analytics team, and their first foray into football is their Total QB Rating. It seems the first thing anyone does when they get into advanced football stats is to create their own QB rating system. The QBR is a major improvement over the NFL's traditional passer rating, and there are a lot of things I like about it, but it's not perfect. I'll try to summarize my understanding of the stat, and then I'll list the things I like about it and the things I don't like so much. As we say in the fighter pilot business--the goods and others.
According to ESPN's own explanation, the stat is based on three primary concepts--Expected Points, Win Probability, and division of credit. As I understand it, QBR begins with a QB's Expected Points Added for each play in which he was directly involved, including both pass plays and runs. It modifies each play's EPA value according to a clutch factor, which is based on Win Probability (WP). Here, I use something similar known as Leverage Index (LI). LI is the ratio of the potential swing in WP for a play compared to the average play's potential swing in WP. For example, an LI of 3 means that a play is 3 times more critical to a game's outcome than the typical NFL play. (You can find any play's LI on the interactive WP graphs here by hovering your cursor over the graph. I still consider it a 'beta' stat because I haven't settled on a final, single definition of potential success and failure for every play.)
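As a rough illustration of the LI definition above, here is a minimal Python sketch. The average swing of 0.04 and the two WP values are made-up numbers chosen for the example, and `leverage_index` is a hypothetical helper, not the function behind the WP graphs.

```python
def leverage_index(wp_if_success, wp_if_failure, avg_swing):
    """LI = this play's potential WP swing / the average play's potential WP swing."""
    if avg_swing <= 0:
        raise ValueError("avg_swing must be positive")
    return abs(wp_if_success - wp_if_failure) / avg_swing


# Hypothetical numbers: the typical play can move WP by 0.04,
# while this play moves it between 0.50 (failure) and 0.62 (success).
AVG_SWING = 0.04
li = leverage_index(0.62, 0.50, AVG_SWING)
print(round(li, 2))  # about 3, i.e. three times as critical as a typical play
```

How "success" and "failure" are defined for each play is exactly the open question mentioned above, so the two WP inputs are the part that would need a real definition.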
QBR also divides credit for plays according to ESPN's own analysis. For example, they divide credit for a passing play between passer and receiver according to the Yards After Catch (YAC) and other factors. This is analogous to the Air Yards (AY) concept I introduced in 2007, which long-time readers here might be familiar with. QBR appears to go beyond what AY does as it apportions credit for things like pass interference calls, dropped balls, and passes defended. I'm not sure if they are charting every pass individually and assigning credit pass-by-pass, or if they used their analysis to assign a split value for each class of play. For example, every screen pass might be split 10/90 between the QB and receiver, or every pass defended split 40/60 (or whatever the actual figures are supposed to be).
Lastly, QBR is normalized on a 100-point scale, where 50 is average and a Pro Bowl caliber season is between 65 and 70. An excellent individual game might be as high as 90. Ultimately, the units of QBR are....nothing? Passitons? QB-trinos? I'm teasing, but I'll explain why this is important below.
The Goods
1. It includes sacks, running, fumbles and all the other important things that the traditional NFL passer rating doesn't.
2. It doesn't double count anything, as the NFL passer rating does with completions.
3. It is based primarily on EPA, which accounts for down, distance, and field position.
4. It is also based on WP, which considers time and score.
5. It's a rate stat instead of a cumulative stat.
The Others
1. It is proprietary.
2. It is unit-less.
3. It is an amalgamation of other stats.
I'll explain my criticisms below:
Proprietary
I'm not a big fan of proprietary statistics. I understand there are good reasons to protect an intellectual investment, but I'd like to see a lot more detail about their methodology. I've published all my models in full except for the WP model. And even then I divulged as much as possible about how I created the WP model, and created a publicly available calculator. I believe openness improves the models and their credibility. I'd encourage ESPN to publish their EP model. It's not an easy thing to create, especially for 2nd and 3rd downs.
There are details I'd be curious about, such as how they count subsequent kickoffs after a score, or how they estimate the value of a 4th down in no-man's land, where you can't predict whether a team will go for it, punt, or try a long FG. How do they count the last play or drive of a half, when an offense might make halfhearted attempts to score or throw a bomb that is as likely to be intercepted as fall incomplete?
Making it purely proprietary also guarantees it won't be used by any other major outlet. Fox, CBS, NBC, and the NFL itself will probably ignore it. It won't replace the traditional passer rating, despite being a major improvement.
Unit-less
Units are valuable because they give stats meaning. If I told you Kurt Warner was a 63, so he should get into the Hall of Fame, you'd say 'huh?' But if I said Kurt Warner was worth 3.7 WPA per season (making an otherwise average team win 11.7 games on average), you'd say 'I get it.'
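The arithmetic behind that Warner figure is simple enough to sketch in Python. The only assumptions are the 16-game regular season in use when this was written and that an "otherwise average" team wins half its games; `expected_wins` is a name made up for the example.

```python
GAMES_PER_SEASON = 16  # regular-season length at the time of this post


def expected_wins(wpa_per_season, games=GAMES_PER_SEASON):
    """Wins for an otherwise average (.500) team after adding a player's WPA."""
    baseline = games / 2  # an average team wins half its games
    return baseline + wpa_per_season


print(round(expected_wins(3.7), 1))  # 8 baseline wins + 3.7 WPA = 11.7
```

Because WPA is already denominated in wins, no conversion factor is needed. That is the whole point of having units.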
Stats with units are much more useful in analysis than unit-less stats. I use WPA (whose units are wins) and EPA (whose units are points) for game-theory analysis of play calling and for decision analysis for 4th downs, onside kicks, etc. In the end, I'd like my analysis to be useful more than anything. Facing a 4th and 8 from the opponent's 36, in playoff-format overtime, a coach would want to know whether to run, pass, kick or punt. WPA says punt! QBR says Matt Ryan is a 62! DVOA says Ryan's a 14%! You get the idea.
When devising a stat or a measure of any sort, it's important to first ask what its purpose is. In this case, QBR's purpose is solely to rank NFL QBs. Its purpose isn't to do all those other things, so it's ok that it doesn't. I only raise this point to explain why I prefer more useful numbers with meaningful units.
Also, a stat with meaningful units acts as a currency of the entire sport. WPA or EPA can compare the impact of performances of RBs, QBs, WRs, and even kickers. If ESPN creates a RB stat, which I expect they would at some point, you wouldn't be able to compare a QB 65 with a RB 65. But a QB's 25.5 EPA can be compared directly with a RB's 10.2 EPA. We could compare the impact of someone like Devin Hester to someone like Chris Johnson. Without comparisons like these, we wouldn't discover things like how only the very best RBs actually provide a positive impact.
Amalgamation
QBR is a mix of other measures. At its core, it basically combines EPA, WPA, and Air Yards. Suppose quarterback Farley and quarterback Andrews both have a QBR of 60. Say Farley got his 60 because of very high EPA but low WPA in high leverage situations (like Philip Rivers last year), and Andrews got his 60 QBR with lower EPA but high WPA in high leverage situations (like Matt Ryan last year).
I'd prefer to look at a line of various stats--such as EPA / WPA / AY / Success Rate (SR). In the example above the two QBs' season stats might look like this:
Farley +100 EPA / +1.2 WPA / 5.5 AY / 50.1% SR
Andrews +85 EPA / +3.2 WPA / 4.2 AY / 45.1% SR
This is far richer information. We can see who was more consistent, who was more productive overall, who relied on his receivers' YAC, and who was luckier (or 'clutch' if you prefer). We can get an idea of who is more likely to repeat his above average performance into the near future. A single number can't tell us anything like that. This is not an indictment of QBR, just my own preference.
If I were to settle on a single number, it would be WPA/LI, as Tom Tango has suggested. One day I'll get around to settling on a good definition of typical football 'successes' and 'failures' in terms of a swing in WPA. It's worth several posts and a healthy discussion with the community. Until then, my LI stat is just a work in progress.
Ultimately, I think QBR is interesting and is a thousand times better than the traditional passer rating. But it's not very useful outside of ranking QBs from top to bottom. It might be fun fodder for discussion on TV or Internet message boards, but it's hard to see how it's useful beyond that.
I congratulate the guys in Bristol for putting this together. It took years to develop the EP and WP models I use, and it was no small task. It looks like they have the tools for a very sound analytics program going forward. I also like that they chose the concepts that Advanced NFL Stats has championed. There are other models and approaches out there, and I take this as a vote of confidence for the tools developed here.
Full disclosure: I had several exchanges with one of the ESPN analysts over the past 12 months, answering questions and explaining various aspects of the Advanced NFL Stats models.
Who are these QBs Farley and Andrews? I never heard of them. Devin Hester and Chris Johnson I've heard of. As a matter of fact they were my best friends in high school.
Borat believes you picked these names Farley and Andrews out of thin air.
Although I understand why they're not concerned about this, the very top of my The Others list would be: inability to get historical perspective.
WPA, EPA, etc could (with enough resources) probably get back to the mid-80s eventually. Even with all the king's horses and all the king's men QBR is never going to tell us anything about Roger Staubach, much less Bart Starr. And some of us are interested in moldy old stuff like that :)
The problem I've had with WPA has been that it ignores the notion of marginal value.
For instance, let's say the Packers are up big in the 3rd quarter of a game, and they extend their lead to 28 with a TD. My understanding is that the WPA of this play would be relatively low because of the game context.
Next, suppose that their opponent stages a historic comeback and now leads by 1 late in the 4th quarter. Then the Packers again score a TD on the last play of the game to pull off the victory. My understanding is that this play would have a very large WPA measurement.
But wait - neither TD was more important to the final score: both were needed to achieve victory. They were of equal marginal value. That is, if you remove either one, they lose. The specific game situation at the time of each play is irrelevant to its importance in the game's outcome.
(An analogous situation would be an election in which a candidate wins by a count of 501 to 500 votes. Was the 501st person to vote for the winning candidate more important than the 500th simply because of the chronological order in which they voted? Of course not; all were needed. Furthermore, whether you are discussing football or elections, there is decreasing marginal value of extra points or votes above the opponent's score, but that decreases the value of each individual score or voter, again regardless of chronological order.)
Does your suggestion of dividing WPA by LI adjust for this? If not, do you have any thoughts on WPA vs marginal value?
I believe EPA accounts for marginal value: it takes all plays into account without looking at the game situation.
Welcome back Brian. I thought you gave up. I am relieved now that you didn't :-)
I think individual rankings in football can't compare players of different teams. No matter how hard people try to come up with a good model, it's never going to work, unless some REAL outstanding performances over a long period show up, like Jamaal Charles or Kurt Warner (with different teams!!).
But as team rankings, those WPA/EPA-Concepts are very good and useful.
If Tom Brady has a high Y/PP or QB-Rating or WPA/EPA or Passer-Rating, all this tells me is that NE's offense is clicking from A to Z. But would Tom Brady post these good numbers with, for example, the Raiders? I highly doubt it. Brady is good in NE, so was Cassel (just look how he is doing in KC). Young was "bad" with TB, but "great" with the 49ers. So were Montana, Garcia and Grbac.
Anyway, looking forward to all the great articles coming...
Greetings from Germany, Karl
Andrew, I agree with you there on EPA. Brian mentioned that the new QBR utilizes something like WPA, and I'm questioning its usefulness in any rating system.
I like the WP concept. It's fun to follow on game days and provides a nice graphical summary of game flow, but I think it is a questionable gauge of value for individual performances.
probablepicks - I think the WPA/LI does take into account marginal values as you've described. As I understand it, WPA/LI is essentially a measure of how well you've done in a situation compared against the range of outcomes that were possible from that situation. So if you throw a TD to go from .98 to .99 WP, that would count as much as throwing a last second TD to go from .50 to 1.00 WP because in both cases you've performed the best action you possibly could have.
It is very similar to EPA, but I can imagine there are times when a play will give -WPA and +EPA (I'm thinking late in a game on a pass play that doesn't end up with the receiver OOB, or that type of thing) and that's what WPA/LI will capture.
>>If I were to settle on a single number, it would be WPA/LI, as Tom Tango has suggested.
Why?
Why not just WPA?
Why add in Leverage index?
WPA alone shows how much the player contributed.
How would LI be factored in? If the player had been unsuccessful, look at that EPA vs the successful one? Does this mean that a few crucial plays (that already have a positive EPA) are even more highly rated than many smaller (in terms of EPA gain) plays?
Brandon in New York
This is my understanding (mostly from baseball): WPA/LI basically re-weights the value of events for the situation. While EPA adjusts for the down and yards, WPA/LI also adjusts for the game state (basically everything). WPA already does this, but the huge swings in WPA don't give that clear of a picture of how a QB actually played (like Ryan vs. Rivers last year). The value of a play or a few plays distorts the value of the whole game. WPA/LI gives every play equal weight.
WPA/LI is going to be very close to EPA, but it would capture some interesting things that EPA won't.
For example, it would measure the clock management aspects of the game. In the end-game final drive, an incomplete pass that stops the clock is going to be much better than a 3 yard gain that keeps the clock ticking. EPA would favor the 3-yd gain.
Same on the other side of the ledger. A team ahead may prefer to take a sack and keep the clock running rather than throw an incompletion.
Depending on how you define LI, interceptions would be penalized harsher if they are especially ill-timed.
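To make the re-weighting concrete, here is a small Python sketch comparing raw WPA with WPA/LI over a handful of plays. Every WPA and LI value below is invented for illustration, and the LI values assume some settled definition of leverage, which doesn't exist yet.

```python
# (description, WPA, LI) for one QB's plays; all values are hypothetical.
plays = [
    ("1st-quarter completion", 0.02, 0.5),
    ("2nd-quarter sack", -0.03, 0.75),
    ("3rd-quarter scramble", 0.03, 1.0),
    ("late 4th-quarter TD pass", 0.30, 6.0),
]


def total_wpa(plays):
    """Raw WPA: high-leverage plays dominate the total."""
    return sum(wpa for _, wpa, _ in plays)


def total_wpa_per_li(plays):
    """WPA/LI: each play's WPA is deflated by how critical the situation was."""
    return sum(wpa / li for _, wpa, li in plays)


for name, wpa, li in plays:
    print(f"{name:26s} WPA {wpa:+.2f}   WPA/LI {wpa / li:+.3f}")
print("total WPA   ", round(total_wpa(plays), 3))
print("total WPA/LI", round(total_wpa_per_li(plays), 3))
```

In raw WPA the late touchdown swamps everything else in the game. After dividing by LI it lands on roughly the same scale as the first-quarter completion, which is the sense in which every play gets equal weight.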
Didn't ESPN's new QB rating have Detroit's QB who went 0-10 in 2008 rated better than Big Ben who went 12-4 and produced a game-winning drive in the SB?
Are you saying the Detroit QB could have done the same in Pittsburgh?
This ESPN stat is clearly using your intellectual property. You are being modest. They should clearly credit you and others who did the pioneering work.
Of course, I'm sure their Disney Corp lawyers tell them: Don't, b/c it undermines our claim of "proprietary."
@ Brian. So, what I said then :-)
Brian, I basically consider this partially a result of your influence, so I'd just like to say thanks, well done, and congratulations.
Brian, thanks for the explanation on WPA/LI.
Speaking of robbing your intellectual property, how do you feel about Cold Hard Football Facts coming out with Expected Points and Win Probability in recent months?
Well, to be clear, ESPN is not robbing any intellectual property. No one owns the concepts of WP or EP.
What I do "own" is my implementation of these concepts. The processes, assumptions, techniques, and algorithms, etc. are my intellectual property. Someone else can reinvent their own wheel in their own way, and that's absolutely fine.
HOWEVER, I take extreme exception to the CHFF 'win probability metric' article I just read. It pretends to have invented the WP concept, going as far as pondering 'what should we call this new marvelous invention?' And it was written by Luis Deloureiro, a longtime reader and fan of this site who has emailed me many times. That's just way, way out of bounds. I am very disappointed in you, Luis, because I know you know better.
Make your own WP model. Awesome--I applaud you. You don't even need to cite me or Hidden Game. *But don't pretend you just invented the whole idea.* That's disingenuous and misleading, to put it very charitably.
What do you guys think?
Has the article been updated since your post, Brian? I don't see the line you reference?
ESPN is clearly guilty of plagiarism here. After having multiple conversations for a year and not specifically mentioning you nor giving you credit, they should be ashamed of themselves! Dean Oliver should stick with basketball stats until he's been there long enough. Don't be oblivious to the fact that he has put his name on this metric and takes full credit for calling it his creation. Furthermore, no one over there is gonna tell us what is actually behind the number, not because of its secret formulas, but rather their fear of exposure of its fundamental flaws to regular sports fans.
That said, the biggest issue that I see, in addition to those already mentioned, is that QBR is purely a SUBJECTIVE stat. Unlike the current passer rating, no one reading this can exactly recreate it. This stat is not computed from other official stats normally tallied. You cannot use NFL gamebooks and compute it. You cannot even calculate it mid-game -- you have to wait over an hour after the game to get the final result from an unverifiable and untrusted source.
So why isn't it instantaneous? Because a fall intern has to watch the whole game (probably in slo-mo) and make SUBJECTIVE determinations for each play involving a QB. Two separate reviewers could have different takes on a controversial play. Worse, if there happens to be a full slate of games on a Sunday, they'll probably need 15 or 16 interns to watch each game live and the video replays that will certainly be a part of it. That means up to 16 different SUBJECTIVE opinions which would have some perceptible impact on this SUBJECTIVE stat.
No overall consistency makes this a BS stat.
Chris-Sorry. That was the EP article. The WP article is here.
My reference above was not meant to be a direct quote. It's intended to be a mocking reference to lines like:
"But, lately, we’ve been asking ourselves if there is a way to differentially score an offense’s production based on the game situation – and, as such, the goals of an offense in each unique situation."
Lately you've been asking yourself *if* there is a way? Really? If?
"Until we think of something better, we’ll call this the Win Probability Metric (WPM)."
Clever.
"We can’t make any promises, but a future metric teasing out and quantifying clutch play may be on the radar."
Gosh. I wonder if that's even possible? Whatever should they call it?
The bottom line is this: If you write that one day you wondered if something is possible when you full well know that it's been done before, and the way to do it has been freely and publicly laid out for you, that's dishonest.
And by claiming that you accomplished what you didn't even know was possible, you are clearly implying you invented the concept. You are deliberately deceiving the reader in order to elevate yourself in his eyes. Not cool.
I agree, I actually did a little Google work and found the article you probably meant, then I got where you were going w/ that.
100% agree with you Brian. We know you're the real deal.
Pat, I believe ESPN briefly credited Brian in their QBR article, although not nearly enough in my opinion.
Brian,
I don't know US law, but here in Germany you'd have a great chance to sue AND WIN against the guys from CHFF...
They are so arrogant (which would be ok if they were right all the time), it would be great if someone brought them down to earth...
Karl, Germany
On the other hand, some of us who've been doing football stats for much longer than a few years feel that Brian has never given full credit to the numerous other EP and WP football models that were published before his.
Certainly Brian has been able to present and popularize his models like no one else has, but it sometimes seems like he pretends to have invented the WP concept. So I guess what goes around comes around.
Anonymous: That's patently absurd. I go out of my way to give credit where credit is due. But don't take my word for it...
First, google "Hidden Game site:advancednflstats.com" and look how many times I have *explicitly* credited that book. That's right, 29. (There are only about a total of 500 posts here, half of which are weekly rankings or link roundups. That means about 1 out of every 10 substantive posts reference Hidden Game!)
Then google "Virgil Carter site:advancednflstats.com". Another 9 references.
Next, check out the permanent link to Football Commentary on the right.
How about the baseball pioneers who developed the WP concept? There's a permanent link to Tango/MGL. There are many others, I'm sure, but none of my work is based on, inspired by, or derivative of any other model.
Lastly, I recently completed writing an entire chapter of a book (to be published next year) about the history of football modeling.
So, nice try.
Cold Hard Football Facts is a joke. They're like every other crap football site. Just look at this garbage.
"CHFF Insider, meanwhile, will instantly re-invent the way you analyze football games and pick winners and losers – with information you can not find anywhere else."
"Re-invent the way I analyze football"...don't think so.
"Information I can't find anywhere else"...hardly. In fact I can probably find it for free rather than shelling out 79 bucks.
I get it. They are out to make a buck if they can off the fans who don't know any better. Hell, they even use a cheesy name like "cold hard football facts." They love to use pseudo math and pretend it's rigorous and pretend they've come up with stats that no one has ever thought of.
I like going there for a laugh...
Brian, so Krasker's is the only WP model from the last 20 years you can cite? If you're writing about the history of football modeling and can't name at least 5 recent EP and WP NFL models each, you really haven't done much research.
Tango/MGL developed the WP concept? Now that's funny. You should pick up Alan Schwarz's book The Numbers Game: Baseball's Lifelong Fascination with Statistics. If you do half the historical research Schwarz did, your book might be worth reading.
"Some of us...."and he remains anonymous....what bull crap....troll
If he actually had done anything perhaps he would have given his name and a link. He also obviously never reads the site, b/c I intuitively knew that Brian constantly is pointing out the history of WP, etc., without Brian showing the Google results.
Brian
You're right. You're doing a terrible job of promoting the 'matchup page'.
I have to search for matchup and then click through.
Please please add a link.
Sorry about the matchup page. There's no stats yet for the season, and the WP stuff is only being run for testing in the pre-season. So there wouldn't be much to see anyway. But thanks for the reminder.
Brian, just to ease the conversation back to Leverage Index. I've just been having a look at some (admittedly simple) examples to see if there are any ideas I can suggest to help you with LI.
I've taken some simple games with easily defined rules to see if there are any LI trends that can be gleaned that could be applied to the NFL (for instance, one such game is two people toss a coin, HT scores a point for the head, HH or TT no points, most points after 20 flips wins). What I've come to is that LI seems to be a function of current win probability and time remaining in the game, with the highest LI when the game is close to 50:50 and there's very little time left (obviously). I don't know how you've been looking at LI, but I thought this might help you with improving your existing LI.
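For anyone who wants to poke at the coin game, here is a minimal Python sketch of it. It rests on my reading of the rules, which is an assumption: both players flip each round, the lone head scores a point, a tied final score counts as half a win, and LI is a round's potential WP swing divided by the average swing over a whole game. All function names are made up for the example.

```python
from functools import lru_cache

ROUNDS = 20  # game length from the comment above

# Assumed per-round outcomes as (change in our lead, probability):
# HT -> +1, TH -> -1, HH or TT -> 0.
OUTCOMES = ((1, 0.25), (-1, 0.25), (0, 0.50))


@lru_cache(maxsize=None)
def win_prob(lead, rounds_left):
    """Chance of winning from this state; a tied final score counts as half a win."""
    if rounds_left == 0:
        if lead > 0:
            return 1.0
        if lead < 0:
            return 0.0
        return 0.5
    return sum(p * win_prob(lead + step, rounds_left - 1) for step, p in OUTCOMES)


def swing(lead, rounds_left):
    """Potential WP swing of the next round: best outcome minus worst outcome."""
    return win_prob(lead + 1, rounds_left - 1) - win_prob(lead - 1, rounds_left - 1)


def average_swing(rounds=ROUNDS):
    """Average swing over a game, weighting each state by how often it occurs."""
    dist = {0: 1.0}  # probability of each lead before the next round
    total = 0.0
    for rounds_left in range(rounds, 0, -1):
        total += sum(p * swing(lead, rounds_left) for lead, p in dist.items())
        nxt = {}
        for lead, p in dist.items():
            for step, q in OUTCOMES:
                nxt[lead + step] = nxt.get(lead + step, 0.0) + p * q
        dist = nxt
    return total / rounds


AVG_SWING = average_swing()


def leverage_index(lead, rounds_left):
    """LI of the next round relative to the average round of the game."""
    return swing(lead, rounds_left) / AVG_SWING


# Print lead, rounds left, WP and LI for a few states.
for lead, rounds_left in [(0, 20), (0, 5), (0, 1), (3, 5), (5, 1)]:
    print(lead, rounds_left, round(win_prob(lead, rounds_left), 3),
          round(leverage_index(lead, rounds_left), 2))
```

Under these assumptions the pattern described above shows up directly: LI climbs as a tied game runs out of rounds, and it drops to zero once the lead can no longer be overturned.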
Brian: Can you please address the kickoff rule change (30 to 35) and its impact on expected points and 4th down decisions?
hey - good to have you back
Brian, Unfortunately ESPN just didn't hire you to improve their product. I applaud them for trying to improve the rating system, but I don't trust ESPN to go about it in a moral and ethical manner.
Drive on Brian we appreciate all your hard, groundbreaking work.
Funny how someone can be so critical but leave themselves "Anonymous." What cowardice!
I've been reading Brian's work for 2 years. He's done particularly good work of citing sources and creating historical perspective concerning WP and EPA. I don't feel like he ever made the claim that he invented these metrics. He's consistently shown how much he owes to his predecessors.
Brian, Thank you for all your insights so far and all those to come!
I've been playing around with QB WPA/G, EPA/P, AYPA, and also ESPN's QBR in relation to the 2012 season. Here's a summary of findings from QBs who played in at least 14 games (with one week remaining in the 2012 regular season):
1) The four indices all have relatively normal distributions, no significant skewness or kurtosis.
2) The four indices are highly correlated with each other (.81-.92).
3) All four indices have modest/large correlations with winning: .48 for AYPA, .55 EPA/P, .63 WPA/G, and .64 QBR. (Of course the latter two indices explicitly incorporate winning probability and thus have artificially inflated correlations with winning, also QBR includes aspects of WPA, EPA, and AYPA and thus is disposed to do about as well as the three of them combined -- if I'm understanding things correctly).
4) Perhaps most interestingly, the four indices display quadratic relationships to winning such that teams with below average QBs win about as many games as teams with average QBs, whereas teams with excellent QBs win many more games than teams with average QBs.
5) QBR might provide unique information over and above the combination of WPA/G, EPA/P, and AYPA. In a linear regression predicting wins, R-Square increases from .40 to .50 when adding QBR to the other three (p=.07). Not sure if this is an artifact of QBR heavily incorporating information about winning probability, a result to dismiss due to small sample size, or a credit to QBR for doing something right.
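For readers who want to check a result like point 5, the usual tool is the F-test for nested linear models. Here is a minimal Python sketch of the statistic. The sample size is not stated above, so `n_obs=26` is purely a placeholder and the resulting F is not expected to reproduce the reported p-value; `nested_f_statistic` is a name invented for the example.

```python
def nested_f_statistic(r2_reduced, r2_full, n_obs, k_full, q_added=1):
    """F statistic for adding q_added predictors to a linear regression.

    k_full counts the predictors in the larger model (intercept excluded).
    """
    df_resid = n_obs - k_full - 1
    if df_resid <= 0:
        raise ValueError("not enough observations for the full model")
    return ((r2_full - r2_reduced) / q_added) / ((1.0 - r2_full) / df_resid)


# R-square goes from .40 (WPA/G, EPA/P, AYPA) to .50 after adding QBR.
# n_obs=26 is a hypothetical sample size, used only for illustration.
f_stat = nested_f_statistic(0.40, 0.50, n_obs=26, k_full=4)
print(round(f_stat, 2))
```

The statistic would be compared against an F distribution with `q_added` and `n_obs - k_full - 1` degrees of freedom. With so few quarterbacks, the residual degrees of freedom are small, which is one reason to treat the jump in R-square cautiously.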