One of the reasons I have always supported and endorsed Brian Burke and his work at Advanced NFL Stats is his recognition of the limitations of statistical analysis. Ever since statistical analysis took baseball by storm and many of its most prominent practitioners were scooped up and employed by MLB franchises, there has been a crush to translate statistical analysis to other sports. And to varying extents, it has worked.
Football remains relatively impenetrable. It doesn’t have the binary pitcher-batter interface of baseball. Apart from maybe touchbacks, there are no true individual stats. One player’s name may appear beside a 20 yard reception but that reception is the product of: one player passing, one player receiving, x number of players running (ultimately) decoy routes, and y number of players blocking.
All those confounding factors tend to test the intuitiveness of advanced stats. Yes, Austin Collie was the second most valuable receiver by EPA/P, but, no, no one thinks Collie is the second most valuable receiver in football. Even aggregate EPA produces some head-scratchers. Was, for instance, Lance Moore truly more valuable than Marques Colston? Probably not, right? But we are short of information to explain exactly why. Keep in mind, a stat like EPA isn’t arguing that Moore is more valuable as a football player than Colston, only that passes targeting Moore were more valuable in toto than passes targeting Colston.
For the time being, advanced stats really must be combined with scouting to create a meaningful whole. However true that may be, and for however long, it doesn’t mean stats are maxed out. There remains huge potential, but some of that potential will never be realized unless the NFL itself improves its own stat keeping.
That is the subject of this piece.
Improve accuracy: According to the NFL,
“Gamebooks are prepared on site at each game using data available immediately following the completion of a game. They are intended to provide a snapshot of the game's action and are not updated after stats are made official on Monday mornings. Please note that scoring decisions made on game days are reviewed and frequently adjusted before becoming official.”
Seems like a reasonable standard. Football is messy, gamebooks are imprecise, but the NFL does its due diligence to correct what errors may have occurred during the initial stat keeping.
Though if you have ever compared game film to play-by-play, you will quickly recognize that football is messy and football statistics are often messier still. Who should be awarded a sack or an assist is often all but impossible to determine. Players not even on the field are awarded tackles. What exactly defines a “pass defensed” is anyone’s guess.
Before we can add new information and new stats, the NFL must do a better job of recording the stats it already has. That means clear definitions and accurate accounting.
Personnel data: Part of the Colston-Moore dilemma is that though Moore is very much involved in Sean Payton’s offense, Moore is a reserve/slot receiver, while Colston is the vaguely-defined “number one receiver.” Moore had one start to Colston’s 11. But “start” (which literally means that player was involved in his team’s first offensive/defensive play) is a shallow, tending-towards-ludicrous way to define a player’s importance to his team. Maybe Colston started but Moore played in more snaps.
An NFL game typically involves a hundred or so plays. It is not a particularly long or involved process to record the personnel for one game. It is, however, a bit daunting to do that for the 512 games that populate the NFL regular season—at least for one person. It should not be hard for the NFL. Some personnel groupings are reasonably static like the offensive line. Others are a little more dynamic like wide receivers or defensive linemen.
Personnel data would both illuminate usage and create a kind of plus-minus. First used in hockey, plus-minus has been exported to basketball too. In football, it would help account for receivers that draw double teams and create openings for other receivers; defensive linemen that draw double teams and create space and openings for other linemen; running backs that pass block effectively; etc. I have seen simple attempts at this application. As a Seahawks fan, I knew that Seattle was a much less effective run defense when Marcus Tubbs was inactive. But inactive or active, starting or as a reserve, are at best approximations for on the field or not. And if we want to know whether Tubbs really was essential to the run defense, rather than whether his absence and the corresponding decline in run defense was just a coincidence, we need accurate and comprehensive accounting for participation by all players on all snaps.
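To make the idea concrete, here is a minimal sketch of the simplest possible version: a raw on/off split of yards allowed per run, computed from per-snap participation. The snap log, the player labels other than Tubbs, and every number in it are invented for illustration, and the function name is hypothetical. A real analysis would need far more snaps and some adjustment for who else was on the field.

```python
# Hypothetical snap log: (defenders on the field, yards allowed on a run play).
# All names and numbers are made up for illustration only.
snaps = [
    ({"Tubbs", "A", "B"}, 2),
    ({"Tubbs", "A", "C"}, 3),
    ({"A", "B", "C"}, 6),
    ({"B", "C", "D"}, 5),
    ({"Tubbs", "B", "D"}, 1),
]

def average(values):
    """Arithmetic mean, or None when there are no snaps to average."""
    return sum(values) / len(values) if values else None

def on_off_split(snaps, player):
    """Average yards allowed with the player on the field vs. off it."""
    on = [yards for players, yards in snaps if player in players]
    off = [yards for players, yards in snaps if player not in players]
    return average(on), average(off)

on_avg, off_avg = on_off_split(snaps, "Tubbs")
print(f"Yards per run allowed with Tubbs on: {on_avg}, off: {off_avg}")
```

With only active/inactive or starter/reserve flags, the `snaps` list above cannot even be built, which is the point of asking for per-snap personnel data.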
Play action and draws: In its most cardinal form, its most absolute form, football is a game of running or passing and defending the run or defending the pass. But it’s not a neat binary. It is more like a scale or range, extending from a “pure” pass play like a five wide receiver, shotgun set, to a pure run play, something like the modern “wildcat” or Wing T.
Statistical analysis has shown that passes are on the whole more effective than runs and thus seemingly underutilized, but that conclusion stems from the initial assumption of run or pass, which is an oversimplification. Personnel data would help bridge that gap by indicating whether, say, a vertical threat like DeSean Jackson truly forces safeties back and thus improves the run game. However, to really understand how the run and pass game interact, we need to account for play fakes. Play fakes are elemental and essential parts of football strategy. Is, for instance, the value of Adrian Peterson (a talent adored by coaches and fans but largely undervalued by advanced statistics) hidden in his ability to improve the Vikings’ play-action offense? Or, conversely, how much do a great quarterback and a great passing offense improve the value of a draw play? We don’t know, and until we do know, statistical analysis will be stuck attempting to evaluate an absolute run and an absolute pass in a game that's really about everything that falls between.
Subdivide yards into feet: This might be a pipe dream but it’s too important to exclude: football is measured in three foot increments. When a run off right end goes for two feet, it’s recorded as a one yard gain. When a run off right end goes for four feet, it’s recorded as a one yard gain. On first and ten or third and ten, etc., anything that falls just short of the first down, be it by one inch or four feet, is recorded as a nine yard gain. On first and inches, anything that converts the first down but does not meet the threshold for a two yard gain, be it the “inches” in question, or four feet, is recorded as a one yard gain. In isolation, some of this doesn’t matter much, but in aggregate, it makes a mess. For instance, how do we determine what a successful play is when every gain from 3.5 to 4.49 yards is lumped together?
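A small sketch of how much gets lumped together. It assumes gains are simply rounded to the nearest whole yard, which matches the examples above but is only an approximation of how scorers actually record plays (it ignores the first-down special cases). The function names are hypothetical.

```python
import math

FEET_PER_YARD = 3

def recorded_yards(gain_feet):
    """Round a gain in feet to the nearest whole yard, halves rounding up.

    Assumption: nearest-yard rounding stands in for the official scoring
    practice, which this sketch does not try to reproduce exactly.
    """
    return math.floor(gain_feet / FEET_PER_YARD + 0.5)

def yards_and_feet(gain_feet):
    """Express a non-negative gain in whole feet as (yards, leftover feet)."""
    return divmod(gain_feet, FEET_PER_YARD)

for feet in (2, 4):
    yards, extra = yards_and_feet(feet)
    print(f"{feet} ft -> recorded as {recorded_yards(feet)} yd, "
          f"but is really {yards} yd {extra} ft")
```

Under this assumption a two foot gain and a four foot gain both go in the book as one yard, while a yards-and-feet record keeps them apart.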
Measuring the gridiron in yards is a holdover from Rugby Union, plus some tinkering by Walter Camp. For obvious reasons, primarily the lack of downs and thus first downs, increased granularity in measuring a rugby field is unnecessary. American Football is one of the few places the unit "yard" is still in wide usage. It’s essentially a matter of tradition, and one I doubt the NFL is keen to change. That said, yards do not have to be discarded entirely, just made more specific. There’s nothing particularly hard to decipher or unorthodox about measuring play length in yards and feet. It would certainly make statistical analysis, now dependent on a wealth of imprecise measurements, more accurate and meaningful.
That is the greater cause. Football may not need a statistical revolution. The NFL is the most successful professional sports league in America. It may not need enthusiasts that dig deep into the data and determine just how the game works, but if we’re out there, and we’re dedicated, and we represent part of the league’s future fanbase, why stubbornly ignore us? Fans, fantasy football fans, gamblers, enthusiasts, are hungry for new information, new ways to appreciate, understand and love the sport of football. Probably not a priority right now in these dark days of the lockout, but why not, NFL? Why not take the best sports product on Earth and make it better?
As a European, "feet" is still a very inaccurate (and weird) measurement to me.
But yeah, I couldn't agree more with the post. Of course, even with all changes implemented the statistician's job becomes easier but still imperfect for the sport. Limitations are limitations no matter how you look at it.
I doubt the NFL has much interest in improving the available statistics, sadly. Which in a way is "amusing" because of the prevalence of statistical triggers in players' contracts. You'd think with that habit they'd like some more meaningful measurements.
Speaking as someone familiar with plus/minus, both in hockey and basketball, it will take a minor to medium-sized miracle for the technique to work in football. The lack of variability in the players on the field on any given play will make it impossible to tell anyone apart. Although I would be pleasantly surprised if it worked.
A plus/minus in terms of scoring may not be useful, but plus/minus in terms of yardage (or whatever measurement we end up with) could be informative.
At what point do they just put chips in each player's helmet, kind of a low-res motion capture? Then the data could be played back and we'd always know exactly where each player is on the field.
The ball should have a chip too that understands orientation - it'd know when it crossed the goal-line, how many orientations it has in a throw, etc.
Football still insists on using volume stats (e.g. total yards, total points scored) instead of averages.
I'd be thrilled to see announcers and analysts start focusing on measures like a QB's YPA and an RB's YPC instead of 300-yard and 100-yard games.
And yes, would love to see improvement in game logs as well.
I'm so glad you mentioned scouting. Even if the resolution of distance improved dramatically, there's still so much information that is lost in the way stats are recorded. More and more people are applying more and more advanced analytics to numbers that, as you point out, are pretty-rough-and-ready to begin with. Worse, others are treating the precise-to-the-Nth-decimal results as unassailably "objective," even though a whole bunch of subjective human judgment goes into both the creation and analysis.
Peace
Ty
A good number of wise people in different fields have noted that statistical/quantitative analysis better informs judgement, but does not replace it (and should not). We still come down to having to reach opinions. The better the objective, empirical information we base them on, the better they should be -- but we still come down to that.
Moreover, NFL football is forever cursed for quantitative analysis purposes by the problem of small sample size. Much more so than any other major sport. There is just no way around that. (*If only* NFL teams played 162 games a year and had an historical data base of 300,000 games played with only minor rules changes!)
The "plus minus" idea is an example of that. I'm not an expert on plus-minus or the NBA, but have seen discussions among those who are, indicating that while *in theory* it is a great way to evaluate NBA players, in practice it needs a sample size of maybe about 200 games to be applied with confidence, when there are "only" 82 in a season. So to apply it during a single season inevitably requires the proverbial finger lightly tapping the scale to move the result in the "more credible" direction.
Be that as it may, and I'm no expert on that dispute, that's for NBA basketball which is a far simpler game than football, with only 10 players on the court and a big, fat 82-game season. I don't see how plus-minus could ever be applied in the NFL except in very limited, specialized situations. Way too small sample size. And alas it's the same for many other potentially good analytical ideas.
If one *could* apply +/- to football, I think I would like 1st downs to be the measurement of choice. Sure, with small sample size, certain players could be "overrated" or "underrated", just like any stat. But the reason I say first downs is this: If you're the 3rd down slot receiver, your job is to help the team get that 1st down. If you (or a teammate) gets that 11 yd. catch on 3rd & 10, then that's good. If you got 9+, but ultimately punted, then that's bad. Sure, there will be a couple of 3rd & 10 from the opponent's 40 that become 4th & 1 from the "31" that leads to a FG, which should be counted as a partial success--the ball was moved into FG range vs. punting from the 40.
By the same token, shouldn't a 50+ yd TD be counted extra because there could have been multiple opportunities for the drive to fall short of points? These and other similar conundrums lead me to believe that plus-minus would be difficult/impossible to implement--for one, all scoring plays do not have the same value.
On the other hand, I do agree with the spirit of John's post: football stats could be better. And although statisticians have their "favorite" stats (EPA/WPA, DVOA/DYAR, etc.), my opinion is that any team/owner willing to outlay some cash to hire 2 or 3 guys to just go over film [esp. all "offseason"] and look at things discussed in sites like this one would benefit in a major way.
Great suggestions, great examples, and surprising depth. Great work.
Correction: There are not 512 games in the NFL regular season, there are 256.
Isn't EPA essentially a form of +/- for yards?
Well, one thing I would like to move away from is the whole "Yards Per" idea (average yards per play) and instead embrace median yards per play. Think about it this way, what is more valuable to a team: a "home run" style running back who averages 4 yards per carry, but barely gets more than a yard on most plays; or a running back who's guaranteed to get 4 yards every play?
I think we just all default to the idea of averages because they are easy to wrap our minds around, but I find it hard to think of football as a game of averages.
I don't think the NFL would go with medians. In terms of run plays over the course of the year that could be a useful number, but when a rusher only has 9 rushes in a game, the median yards for that game wouldn't be meaningful. When it comes to passing, if a QB completes less than half his passes, the median would be 0. Also it would make a lot of running backs look equal, since everyone will be within a few yards of the same, and so many people tied since there wouldn't be decimals.
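A quick sketch of both points in this exchange, using Python's standard `statistics` module. The carry and pass logs are invented for illustration: two backs with the same average but very different medians, and a quarterback completing fewer than half his passes, where incompletions are counted as 0 yards.

```python
from statistics import mean, median

# Hypothetical carry logs: both backs gain 40 yards on 10 carries.
boom_or_bust = [1, 1, 1, 1, 1, 1, 1, 1, 1, 31]
steady = [4] * 10

print("boom-or-bust:", mean(boom_or_bust), "avg,", median(boom_or_bust), "median")
print("steady:      ", mean(steady), "avg,", median(steady), "median")

# Hypothetical pass log: 4 completions in 10 attempts, incompletions as 0.
pass_attempts = [0, 0, 0, 0, 0, 0, 8, 12, 15, 25]
print("QB:", mean(pass_attempts), "avg,", median(pass_attempts), "median")
```

The average cannot tell the two backs apart, and the median cannot tell this quarterback apart from one who never completed a pass, which is why neither number works well on its own.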
One of the biggest issues with quantitative analysis in football, basketball, and other major sports that are not baseball is the fact that the increase in certain statistics for one player can often negatively correlate with the same stats for his teammates. It's easy to see in basketball - only one player can grab a rebound after a missed field goal; only one player can score on a possession. In football, only one player can bring the ball into the endzone. Every rebound that your teammate grabs is a rebound that you cannot get. Rate statistics and efficiency statistics can remedy only some and sometimes none of the problem.
Baseball is very different. Your teammates can't "steal" home runs from you. Even if we could measure the true contributions of a basketball or football player in a given game or season, we cannot measure what their contributions may have been with different supporting casts. Stats like adjusted +/- try to get at this, but are far from perfect.
Also, I think it's important to refrain from assessing the validity of statistical analysis as a whole. Yes, it's true that quantitative analysis in its present state has limitations and should be used in tandem with old-school methods like scouting. But this does not mean all stats are on the same level. From some numbers, we can derive a great deal of information - "scouting" may be almost unnecessary. In other cases, the numbers don't even begin to tell half of the story. Each metric needs to be assed on its own scale.
and by assed i mean assessed.
Fourth and Fourty Two, the way stats are currently constructed yards gained and lost are only whole yards. As the vast majority of running plays result in 3 yard gains, every single running back with enough carries to matter would have a 3 yard median, making that stat totally useless. If we could break yards down to feet, tenths of a yard, etc, that would be more useful, but as it stands now median would not be any more helpful.
2000-2010 rushing stats (runs by non-QBs in 1st and 3rd quarters occurring between 10 and 80 yards from own goal line):
median: 3 yards
mode: 2 yards (13.4% of all runs)
average: 4.43 ypc
The distribution is positively skewed, which is why the median and average are higher than the mode.
All the measures of central tendency are biased in some way when attempting to measure rushing productivity. Average tends to favor high-variance rushers while median tends to favor low-variance rushers. Both types of players can be very good or very bad, which is why I think it is important to look at a variety of metrics and not fall in love with one. But if you're more interested in median yardage, you can get a good idea of how a certain player or team performs in that category by looking at success rate.
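As a rough illustration of how the three measures separate on a positively skewed sample, and how a success rate sits alongside them, here is a sketch on an invented set of carries. The 4-yard success threshold is purely an assumption for the example; analysts define success rate in different ways, and none of those definitions is reproduced here.

```python
from statistics import mean, median, mode

# Hypothetical carries, in whole yards. Invented data with a long right tail.
carries = [-1, 0, 1, 2, 2, 2, 3, 3, 4, 4, 5, 6, 9, 18]

# Assumed threshold for a "successful" run; not an official definition.
SUCCESS_YARDS = 4

def success_rate(gains, threshold=SUCCESS_YARDS):
    """Share of carries that gained at least `threshold` yards."""
    return sum(1 for g in gains if g >= threshold) / len(gains)

print("mode:", mode(carries))
print("median:", median(carries))
print("average:", round(mean(carries), 2))
print("success rate:", round(success_rate(carries), 3))
```

A few long runs pull the average above the median and the median above the mode, while the success rate ignores how long the long runs were, so it behaves more like the median than like the average.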
I'm not exactly sure where this should be posted, so I just opted for the most recent entry.
I was wondering what your thoughts are on the kickoff being moved up to the 35. Specifically, how much more valuable will it make a surprise onside kick?
Great post.
If I had to choose one change, I think I would choose more accurate accounting of who was on the field (although the conversion of yards to feet is very tempting). I'm surprised, in fact, that it's not the norm, as it's not an extraordinarily hard task. All numbers are registered, and to make things easier, they even put names on the jerseys. Most production shots do not capture all of the jersey names and numbers, but the coaching film that the NFL has exclusive rights to almost certainly does.
One thing that could add value is football computer simulation. An extremely crude version might be an internet game like http://goallineblitz.com/
If you actually tried to build a more serious model you could illuminate a lot of strategies and get an idea of how a player can fit into a system. You could use a mix of statistics and scouting to try and model a player's behavior and then run the simulation over and over.
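A toy sketch of what "run the simulation over and over" could look like at its very crudest: draw each carry at random from a back's assumed outcome distribution and count how often three straight runs gain ten yards. The outcome lists, the function name, and the three-runs-for-ten setup are all invented for illustration and ignore almost everything that matters in a real game.

```python
import random

def conversion_rate(carry_outcomes, downs=3, to_go=10, trials=20000, seed=0):
    """Estimate how often `downs` consecutive carries gain at least `to_go` yards.

    Each carry is drawn independently and uniformly from `carry_outcomes`,
    which is a simplifying assumption, not a model of real play.
    """
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    converted = 0
    for _ in range(trials):
        gained = sum(rng.choice(carry_outcomes) for _ in range(downs))
        if gained >= to_go:
            converted += 1
    return converted / trials

# Hypothetical backs with the same 4.0 average per carry.
steady = [4]
boom_or_bust = [1] * 9 + [31]

print("steady:", conversion_rate(steady))
print("boom-or-bust:", conversion_rate(boom_or_bust))
```

Even this crude version shows why the shape of a player's distribution matters and not only his average: in this setup the steady back always converts, while the boom-or-bust back converts only when one of the three carries breaks long.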
So I suppose the median concentrating around 3 yards or so relates to how runs typically develop, being either stuffed or stopped by the defensive line or linebackers. If I had to make a wild speculation, median would favor the running backs that have tendencies to "fall forward," who can stumble an extra yard after solid contact.
And perhaps my fear of averages comes from being burned so many years in a row by fantasy football, by the chance that any particular run play may result in significant yardage.