Using Probabilistic Distributions to Quantify NFL Combine Performance

Casan Scott continues his guest series on evaluating NFL prospects through Principal Component Analysis. By day, Casan is a PhD candidate researching aquatic eco-toxicology at Baylor University.

Jadeveon Clowney is thought of by many as a “once-in-a-decade” or even “once-in-a-generation” pass-rushing talent. Once the top-rated high school recruit in the country, Clowney has retained that distinction through 3 years in college football’s most dominant conference. Super-talents like Clowney have traditionally been gambled on in the NFL draft with little idea of what future production can actually be expected of them statistically. For all of the concerns over his work ethic, dedication, and professionalism, Clowney’s athleticism and potential have never been called into question. But is his athleticism actually that rare? And is his talent worth gambling millions of dollars and the 1st overall pick on? This article aims to quantify exactly how rare Jadeveon Clowney’s athleticism is in a historical sense.

Jadeveon Clowney set the NFL draft world on fire at this year’s combine, delivering one of the most talked-about combine performances in recent memory, driven primarily by his blistering 40-yard dash time of 4.53 seconds. Over the years, however, I recall players like Vernon Gholston, Mario Williams, and even Ziggy Ansah displaying mind-boggling athleticism in drills. But if every year some player displays never-before-seen athleticism at the combine, who is really impressive enough to be deemed “once-in-a-decade”?

Probability Ranking allows me to estimate how likely it is to encounter a given value of an athlete’s measurable. For instance, I probability-ranked NFL combine 40-yard dash times for 341 defensive ends from 1999 to 2014 (Table 1 shows the top 50). In this case, Jadeveon Clowney’s 40 time of 4.53 had a probability rank of 99.12, meaning his speed is in the 99th percentile of all DEs over this time span.
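The article doesn't spell out the exact formula behind a probability rank, so the Python sketch below is only one plausible reading. `empirical_rank` reports the percentage of a sample that a result matches or beats (for the 40, lower is better, so it counts times at or slower than the given one, including the player's own entry if he is in the sample); `normal_rank` instead reads the same quantity off a normal distribution fitted to the sample. Both function names and the `forty_times` list are hypothetical; the times are made up and are not the 341-player data set behind Table 1.

```python
from statistics import NormalDist


def empirical_rank(value, sample, lower_is_better=True):
    """Percent (0-100) of `sample` that `value` matches or beats."""
    if lower_is_better:
        # Timed drills: count results at or slower than `value`.
        beaten = sum(1 for x in sample if x >= value)
    else:
        # Jumps, reps, etc.: count results at or below `value`.
        beaten = sum(1 for x in sample if x <= value)
    return 100.0 * beaten / len(sample)


def normal_rank(value, sample, lower_is_better=True):
    """Same idea, but read off a normal curve fitted to `sample`."""
    dist = NormalDist.from_samples(sample)  # sample mean and standard deviation
    p_below = dist.cdf(value)
    return 100.0 * ((1.0 - p_below) if lower_is_better else p_below)


# Hypothetical 40-yard dash times in seconds (illustration only).
forty_times = [4.53, 4.62, 4.66, 4.70, 4.72, 4.75, 4.78, 4.81, 4.85, 4.88, 4.93, 5.01]
print(empirical_rank(4.53, forty_times))
print(round(normal_rank(4.53, forty_times), 2))
```

The empirical version saturates at 100 for the fastest player in the sample, while the fitted-normal version can still separate extreme outliers; which one suits a given drill depends on how bell-shaped its results actually are.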

NFL Prospect Evaluation using Quantile Regression

Casan Scott continues his guest series on evaluating NFL prospects through Principal Component Analysis. By day, Casan is a PhD candidate researching aquatic eco-toxicology at Baylor University.

Extraordinary amounts of data go into evaluating an NFL prospect. The NFL combine, pro days, college statistics, game tape breakdown, and even personality tests can all play a role in predicting a player’s future in the NFL. Jadeveon Clowney is arguably the most discussed prospect in the 2014 NFL draft not named Johnny Manziel. He is certainly an elite prospect and potentially the best in this year’s draft, but based exclusively on historical combine performances, he doesn’t appear to be a “once-in-a-decade” physical specimen. From the research I’ve done, only Mario Williams and JJ Watt can make such a claim. Super-talents like Clowney have traditionally been gambled on in the NFL draft with little idea of what future production can actually be expected of them statistically. Every prospect has a “ceiling” and a “floor,” representing the highest and lowest levels of performance he could realistically reach. But what does this “potential” mean, and does it hold any importance for actually predicting a prospect’s success in the NFL? In this article I will show how Quantile Regression, a technique used by quantitative ecologists, can clarify what Clowney’s proverbial “ceiling” and “floor” may be in the NFL.

Athletes are a collection of numerous measured and unmeasured descriptor variables. Figure 1 shows a single predictor (40-yard dash time) plotted against prospects’ career NFL sacks plus tackles for loss (TFL) per game.
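Quantile regression fits a line for a chosen quantile τ, rather than for the mean, by minimizing the pinball (check) loss; a high τ such as 0.9 traces a "ceiling" and a low τ such as 0.1 traces a "floor." The dependency-free Python sketch below is not necessarily how the author fit his models. It solves the one-predictor case exactly by trying every line through two data points, relying on the standard result that some optimal quantile-regression line passes through two observations with distinct x values. The `xs`/`ys` values are made up for illustration and are not the data behind Figure 1.

```python
from itertools import combinations


def pinball_loss(residuals, tau):
    """Check loss: tau*u for u >= 0 and (tau - 1)*u for u < 0, summed."""
    return sum(u * (tau - (1.0 if u < 0 else 0.0)) for u in residuals)


def fit_quantile_line(xs, ys, tau):
    """Exact linear quantile regression y = a + b*x with one predictor.

    Some optimal line interpolates two data points, so every pair with
    distinct x is tried (O(n^3) work; fine for small samples).
    """
    best = None
    for i, j in combinations(range(len(xs)), 2):
        if xs[i] == xs[j]:
            continue  # vertical line, not a valid candidate
        b = (ys[j] - ys[i]) / (xs[j] - xs[i])
        a = ys[i] - b * xs[i]
        loss = pinball_loss([y - (a + b * x) for x, y in zip(xs, ys)], tau)
        if best is None or loss < best[0]:
            best = (loss, a, b)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


# Hypothetical data: 40 time (s) vs. career sacks + TFL per game.
xs = [4.53, 4.60, 4.65, 4.70, 4.74, 4.78, 4.82, 4.86, 4.90, 4.95, 5.00, 5.05]
ys = [0.90, 0.35, 0.70, 0.20, 0.55, 0.15, 0.40, 0.10, 0.30, 0.05, 0.20, 0.08]

for tau in (0.1, 0.5, 0.9):  # floor, median, ceiling
    a, b = fit_quantile_line(xs, ys, tau)
    print(f"tau={tau}: predicted production at a 4.53 forty = {a + b * 4.53:.2f}")
```

For real data sets, a linear-programming solver (for example, the `QuantReg` class in statsmodels) scales far better than this brute-force search; the search is used here only because it makes the "line through two points" structure of the solution easy to see.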

Podcast Episode 22 - Brian Burke

Brian Burke returns to the show to recap the 2014 NFL draft. He describes the Bayesian Draft Analysis tool he created and discusses the value of trades made by teams during the draft. Brian and Dave then discuss their favorite new addition to the league, John Urschel, and make a pitch to get him to contribute to the site. Brian also previews his new project, WOPR, and explains how it'll help generate data for some previously unanswerable questions.

This episode of Advanced Football Analytics is brought to you by Harry's. Harry's delivers high-quality shave products straight to your door at a fraction of competitors' prices. Go to Harrys.com and use the offer code "AFA" at checkout to save $5 on your first purchase.

Subscribe on iTunes and Stitcher

The AFA Draft Pick of the Year

Was the next Virgil Carter drafted yesterday? Penn State Guard John Urschel was taken with a compensatory pick in the 5th round by Baltimore. John stands out because he has an unusual plan for his time after his playing days are over. He says he's very interested in "sports analytics. Data analysis for football."

If he does, he'll analyze circles around the rest of us. While playing for PSU, John earned his degree in math in just three years, then added a master's degree in math, and is currently working on a second master's in math education. He's published research with titles like "Instabilities of the Sun-Jupiter-Asteroid Three Body Problem," "A Space-Time Multigrid Method for the Numerical Valuation of Barrier Options," and "Spectral Bisection of Graphs and Connectedness," in which he proved the Urschel-Zikatanov Generalized Bisection Theorem. Man, I wish I had a theorem named after me.

To us, his most interesting research might be this article he wrote for ESPN The Magazine. He looked at "1) how best to predict a lineman's draft position, 2) that prospect's success in terms of NFL starts, and 3) whether a fringe prospect will be selected." Sounds like it would have made a good guest post here.

The Bayesian Draft Model estimated that Urschel's most likely draft slot was pick 167, not far off from his actual selection at 175. According to the numbers, the chance he would still be available at 175 was 43%. So, almost spot on. Interestingly, Urschel's own selection may have been the result of some sharp analytics. Baltimore is known to have "a proprietary formula—a “special sauce,” assistant GM Eric DeCosta calls it—that factors in potential compensatory picks to the free agency cost-benefit analysis."
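Both numbers quoted for Urschel, a most likely pick and a probability of still being available at pick 175, can be read off a probability distribution over pick numbers. The Python sketch below shows how the two relate, assuming (hypothetically) that a player's draft slot follows a normal curve discretized to whole picks. The Bayesian Draft Model's actual distributions are not described in this post, and the mean, spread, and 256-pick draft length used here are illustrative assumptions, not the model's inputs. Availability at pick k is one minus the probability of being taken at any earlier pick.

```python
from statistics import NormalDist

LAST_PICK = 256  # assumed draft length for this illustration


def pick_distribution(mean_pick, sd_picks, last_pick=LAST_PICK):
    """Hypothetical P(selected at pick k) for k = 1..last_pick.

    A normal curve is discretized to whole picks; tail mass beyond the last
    pick is read as "undrafted", and mass below pick 1 is ignored.
    """
    dist = NormalDist(mean_pick, sd_picks)
    return {k: dist.cdf(k + 0.5) - dist.cdf(k - 0.5) for k in range(1, last_pick + 1)}


def most_likely_pick(pmf):
    """Pick number with the highest selection probability (the mode)."""
    return max(pmf, key=pmf.get)


def prob_available(pmf, pick):
    """P(still on the board at `pick`) = 1 - P(taken at an earlier pick)."""
    return 1.0 - sum(p for k, p in pmf.items() if k < pick)


# Hypothetical mid-round lineman; the spread of 20 picks is made up.
lineman = pick_distribution(mean_pick=167, sd_picks=20)
print(most_likely_pick(lineman))
print(round(prob_available(lineman, 175), 2))
```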

Urschel would make a killer impact on the world of football analytics if he chose to. However successful his pro career turns out, he'll carry the credibility of a pro-caliber player. Coaches will take what he has to say much more seriously than what an ex-Navy pilot writes on a website.

So, congratulations, John! I'll be rooting for you on the field and off. Play like a Raven!


Project WOPR is Coming

With the Bayesian draft tool completed, I can now focus on completing Project WOPR. For those who might be fans of mid-80s Matthew Broderick movies, you may have figured out what the WOPR is.

I'll give another clue:
Its purpose is to answer the unanswerable questions of football strategy.

But for now, it's taking up my entire basement and has driven my electricity bill through the roof. The liquid-nitrogen cooled 32-core processors aren't cheap either.

Live Updates Tonight

I'll be updating the Bayesian draft model live tonight. I was triple-booked for this evening, and I thought I wouldn't be able to make it happen. But now I'm only double-booked, so in the immortal words of Bill O'Reilly, "F--- IT. WE'LL DO IT LIVE!"

As players are chosen, the probabilities will obviously start changing rapidly. The fact that a player is off the board, and that no one else can fill that slot, is information known with absolute certainty, and it can be fed back through the model. The effects will cascade through the rest of the available picks.
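At its simplest, feeding a completed pick back through the model is Bayesian conditioning. A minimal Python sketch, assuming each remaining player is described by a hypothetical `{pick: probability}` dictionary (not the model's actual data structure): once `picks_made` selections have gone to other players, each remaining player's probabilities at those picks drop to zero, and the rest are rescaled by the probability that he lasted that long. The full model's cascade, where one team's choice shifts what other teams are likely to do, is more involved than this.

```python
def condition_on_still_available(pmf, picks_made):
    """Update one remaining player's pick distribution after `picks_made` picks.

    Bayes' rule: P(k | not taken in 1..picks_made)
               = P(k) / P(not taken in 1..picks_made), for k > picks_made.
    Probability the pmf leaves unassigned is treated as "undrafted" and
    stays in the denominator.
    """
    survivors = {k: p for k, p in pmf.items() if k > picks_made}
    undrafted = max(0.0, 1.0 - sum(pmf.values()))
    still_available = sum(survivors.values()) + undrafted
    if still_available <= 0.0:
        raise ValueError("model gave this player no chance of lasting this long")
    return {k: p / still_available for k, p in survivors.items()}


# Hypothetical player: 50% to go at pick 10, 30% at pick 11, 20% at pick 12.
player = {10: 0.5, 11: 0.3, 12: 0.2}
print(condition_on_still_available(player, picks_made=10))
```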

Unfortunately, the interface won't update automatically for users. You'll need to click refresh or hit F5 after each pick. There will be at least two or three minutes of lag for the updates to work through the system, so be patient.


New Feature on the Draft Model

In my last job I worked with a team of software developers. The interfaces they designed didn't make much sense to me. The interfaces were always, at heart, a giant expanding tree of classes, objects, and properties. Huh? Lots of tiny plus and minus marks everywhere to expand and contract the accordion. Left click to view something. Right click to modify it. If you ever had to deal with the Windows registry, it was like that. Steve Jobs would not have been thrilled.

When I learned a little about object-oriented programming, it all made sense. The software engineers were designing the interface for their own convenience, not for ease of use. It made sense from an efficiency standpoint...a programming efficiency standpoint. But from the perspective of the user, it wasn't so efficient. The least used feature was just as accessible as the most common feature, and all of them were hidden until you expanded the right portion of the tree.

Yesterday I realized I was doing the same thing with the draft model. From my point of view, it's easiest to think in terms of players and their probability to be selected at each pick number, because that's how the software that runs the model works. It goes down the list of prospects, player by player, looking at the probability he'll be selected at each pick, pick by pick.

For the players and their agents, and for fans of particular players, this is ideal. They want to know where and when they'll go. But the user is probably thinking of things from a team's perspective. Whether the user is a team personnel guy or a fan of a team, he'd rather see things from the perspective of a pick number. Right now, a Vikings fan (or exec) would have to click through a dozen or more of the top players to see who's likely to be available to them at pick #8. And if they were wondering about who'd be available if they traded up or down, that's another few dozen clicks. Scroll, click. Scroll, click...
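The pick-centric view described above amounts to pivoting a player-by-pick table. Assuming, hypothetically, that the model can report for each player the probability he is still available at each pick, a Python sketch of the pivot might look like this; the data structure, the 0.5 threshold, and the player names are invented for illustration and are not the site's actual code.

```python
from collections import defaultdict


def available_by_pick(availability, threshold=0.5):
    """Pivot a player-centric table into a pick-centric one.

    availability: {player: {pick: P(player still available at that pick)}}
    Returns {pick: [(player, prob), ...]}, most-likely-available first,
    keeping only players whose probability is at least `threshold`.
    """
    by_pick = defaultdict(list)
    for player, per_pick in availability.items():
        for pick, prob in per_pick.items():
            if prob >= threshold:
                by_pick[pick].append((player, prob))
    return {
        pick: sorted(rows, key=lambda row: row[1], reverse=True)
        for pick, rows in by_pick.items()
    }


# Hypothetical availability numbers for three prospects at picks 8 and 12.
availability = {
    "Player A": {8: 0.9, 12: 0.7},
    "Player B": {8: 0.3, 12: 0.1},
    "Player C": {8: 0.6, 12: 0.4},
}
print(available_by_pick(availability)[8])
```

A team-oriented page would then need only one lookup per pick (say, pick #8, plus a few neighbors for trade-up or trade-down scenarios) instead of one click per player.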

Podcast Episode 21 - Cade Massey

Cade Massey, Professor of the Practice at the Wharton School of Business, joins the show to discuss his research on the NFL draft. Professor Massey is the co-author of "The Loser's Curse: Decision Making & Market Efficiency in the National Football League Draft", a paper analyzing the market for draft pick trades. He and his co-author, Richard Thaler, found that teams picking at the top of the draft actually sacrifice a great deal of what they call "surplus value" by not trading down for additional selections.

Dave and Cade look at the reasons why teams employ less-than-optimal strategies, including risk aversion, adherence to norms established by "The Chart," and other psychological factors. Professor Massey defends his paper against critiques and discusses why he believes the draft is such a compelling spectator event.
