I'm preparing to move the site to a new platform. Within a day or so the site will be hosted on a new content management system, which will allow us much more flexibility with all the new tools and content coming this season. I'm not sure how smooth the transition will be, so please stay tuned via Twitter in case things get broken over the next couple days.
With the url (web address) change earlier this year and now this change, I've completely destroyed the site's search rankings (which were really good), links, and 7 years of goodwill. I apologize.
To fix things, I'll need your help. Please update your links to the site's url and, more importantly, spread the word. Hopefully you've enjoyed reading the posts or using the other features at AFA over the past several years. I haven't asked much in return, so please give the site a shout out on Twitter, your own site, or whatever platform you have.
Just to be perfectly clear, here is how things are changing:
1. The new site will offer all the same content you've come to expect, including analysis, tools, live WP graphs, advanced stats, the podcast, advanced boxscores, and visualizations. In addition, it will have a separate section for team clients and media outlets who use AFA services.
2. The web address will continue to be www.advancedFootballAnalytics.com.
3. The twitter handle will remain @Adv_NFL_Stats.
4. The site's feed will change to http://www.advancedfootballanalytics.com/index.php?format=feed&type=rss (or ...&type=atom, depending on your preference).
5. The podcast feed will continue to be http://advancednflstats.libsyn.com/rss
6. For now, older content will continue to be hosted at archive.advancedFootballAnalytics.com.
For the month of October I'll be celebrating the re-launch of the site with a feature of the day, highlighting many of the most popular things the site has to offer.
Big Changes Coming
Biggest Plays of the Week
Reading the comments, people seem to take delight in the negative plays for whatever reason. Cutler's interceptions appear to invite a lot of anti-fans. There's also loyal fans who like to pile on the misfortunes of their own team out of frustration. I can see that this week with E.J. Manuel's bad day yesterday.
But keep in mind a "negative" play is all in the perspective. A bad play to an offense is a great play to a defense. Statistically, there is no good and bad. Manuel's interception to Watt was both a poor play by a quarterback and an alert and athletic play by a defensive lineman.
If and when Regressing gets tired of posting these, I'll make the tool public so you can look up the most agonizing moments in your hometown team's season anytime you want.
Sunday's Numbers Have Been Crunched
Team Advanced Stats Viz
Position Leaders Viz
Advanced stat box scores
Top QBs of the week
Top RBs of the week
Top WRs of the week
Top TEs of the week
Top Defenders of the week
Advanced team stats
Offensive player season leaders
Defender season leaders
QB Viz
RB Viz
Chip's Challenging Decisions
One Thing I Learned from the WOPR
The WOPR is my game simulation engine. I've had a ton of fun experimenting with different things, finding out when teams should make various tactical decisions that might be uncommon or hard to isolate empirically (directly from the data). But one of the more profound things I learned from the WOPR relates to game outcomes between completely even teams.
Weekly Game Probabilities - Week 4
Please remember that the projected scores are not to be taken terribly seriously. Do not bet the mortgage on them, as they are not intended to be graded against the spread. They are simply a "maximum-plausibility" estimate given respective team scoring tendencies.
Team Efficiency Rankings: Week 3
This obviously holds big implications for the rankings, which are based on the generic win probability (GWP) of a particular team against an average team. But more importantly, both are predictive models that emphasize the factors that best suggest how a team is likely to fare in the future. When one improves, the other should theoretically improve alongside it.
We'll keep an eye on that hypothesis as the season moves along. Perhaps I'll write something at the end of the season comparing the accuracy of this year's model to 2013's. For now, let's take a look at some of the most notable trends from the first rankings of 2014 (click here for a full explanation of the rankings methodology).
Leaving Free WP All Over the Field
So why do NFL coaches voluntarily leave WP out on the field?
Take yesterday's DEN-SEA game as an example. SEA was ahead 17-12 in the 4th quarter and had the ball deep in their own territory with about 9 minutes to play. With the game clock running, they snapped the ball with 8, 5, 5, 8, and 10 seconds left on the play clock. That's a total of 36 seconds. Plus, there was a play in which the receiver could have just as easily remained in bounds. Because there was more than 5 minutes left in the game and the clock restarts after the ball is set, that may have only cost 10-15 seconds of play clock rather than up to 40 seconds. To be fair, let's say there was a total of 46 seconds SEA could have burned off the clock during their second-to-last drive with almost no effort or risk.
Sunday's Numbers Have Been Crunched
Team Advanced Stats Viz
Position Leaders Viz
Advanced stat box scores
Top QBs of the week
Top RBs of the week
Top WRs of the week
Top TEs of the week
Top Defenders of the week
Advanced team stats
Offensive player season leaders
Defender season leaders
QB Viz
RB Viz
Texans Try Once, Fail Twice
Down 14-0 at the start of the second half to the New York Giants,
the Houston Texans faced a 4th-and-1 on their own 46-yard line. At this point, with just a 9.0% chance to win, Bill O'Brien made the correct call to go for it. A successful conversion means a 12.9% win probability, while a punt means about an 8.6% chance to win. The break-even point going for it is far below an estimated 65% conversion rate on 4th-and-1. Alfred Blue ran off right tackle and was stuffed, turning the ball over on downs. The Giants would kick a field goal to go up 17-0.
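The break-even arithmetic behind that recommendation can be sketched in a few lines. The success and punt win probabilities below come from the excerpt; the failed-conversion win probability (`wp_fail`) is an assumed value, since the post does not report it:

```python
def breakeven_conversion_prob(wp_success, wp_fail, wp_punt):
    """Minimum conversion rate at which going for it matches punting.

    Solves p * wp_success + (1 - p) * wp_fail = wp_punt for p.
    """
    return (wp_punt - wp_fail) / (wp_success - wp_fail)

# 12.9% and 8.6% are from the Texans example; 7% for a failed attempt
# is a hypothetical placeholder.
p_star = breakeven_conversion_prob(wp_success=0.129, wp_fail=0.07, wp_punt=0.086)
print(f"break-even conversion rate: {p_star:.1%}")  # about 27%
```

Under that assumption the break-even rate is roughly 27%, comfortably below the ~65% league-wide conversion rate on 4th-and-1, which is the sense in which the call is "correct" regardless of the outcome.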
On the very next drive, Ryan Fitzpatrick led the Texans downfield to the Giants 9-yard line where they faced another 4th-and-1. With 6:13 left in the third, down 17, Bill O'Brien elected to take the chip-shot field goal. Even the commentators suggested he should be going for it. Obviously, the prior failure on fourth down should not have an effect on the Texans' decision this time. If it did, O'Brien would be judging his previous decision on the outcome rather than the process. The only other logic could be that he figured they would need a field goal at some point, down 17 - common faulty logic in the NFL, as coaches should be doing whatever they can to maximize their chances of winning.
Eagles-Colts Call on MNF: Not Actually That Important
By Kurt Bullard
Kurt is a sophomore at Harvard and a second-year member of the Harvard Sports Analysis Collective. He intends to major in either Economics or Statistics. Go 'Cuse.
Football fans – and sports fans in general – abhor the fact that mistakes made by the referees at the end of games can influence the result of the contest. Nowhere was this seemingly more apparent than in this week’s Monday Night Football game. Indianapolis seemed to have the game all but wrapped up towards the end of the fourth quarter. With a 27-20 lead and the ball at the Eagles 22 with 5:15 remaining, the Colts seemed poised to score – either by capping off the drive with a touchdown or settling for a field goal behind the reliable leg of Adam Vinatieri. However, on a 3rd and 9 call, Luck dropped back and targeted T.Y. Hilton, but was intercepted by Malcolm Jenkins, injecting the Eagles with what seemed to be a second life. The Eagles managed to score quickly to knot up the game behind the legs of Darren Sproles and would go on to win the game in regulation off the leg of Cody Parkey. However, this was all possible due to a missed pass interference call on Brandon Boykin, who held Hilton coming out of his break, allowing the ball to sail past Hilton and into the hands of Jenkins.
Weekly Game Probabilities
Please remember that the projected scores are not to be taken terribly seriously. Do not bet the mortgage on them, as they are not intended to be graded against the spread. They are simply a "maximum-plausibility" estimate given respective team scoring tendencies.
Podcast Episode 29 - Brian Burke
Brian Burke makes his first regular season appearance on the podcast to recap week two and discuss his latest research. Brian explains his break-even models for two point conversions and challenges and describes how the WOPR allows him to create test data for all sorts of interesting hypothetical game strategies. He also discusses an upcoming post that examines when teams should start running their "four minute offense". Dave and Brian close out the episode with an update on the 4th down bot and the new home and format for Brian's weekly game predictions.
This episode is sponsored by DraftKings, the leading provider of daily fantasy sports. If you use this link or promo code "AFA" to create a new account and make a deposit you'll gain a free $2 entry into this weekend's $100,000 Play Action Tournament.
Subscribe on iTunes and Stitcher
Two-Point Conversion in the KC-DEN game
NFL coaches typically adhere to what's known as the Vermeil Chart for making two-point decisions. The chart was created by Dick Vermeil when he was offensive coordinator for UCLA over 40 years ago. It's a simple chart that looks only at the score difference prior to any conversion attempt and does not consider time remaining, with one caveat. It applies only when the coach expects there to be three or fewer (meaningful) possessions left in the game.
With just over 7 minutes to play, there could be three possessions at most left, especially considering that at least one of those possessions would need to be a KC scoring drive for any of this to matter. (In actuality, there were only two possessions left, one for each team.) Even the tried-and-true Vermeil chart says go for two when trailing by 5. But it's not the 1970s any more and this isn't college ball, so let's apply the numbers and create a better way of analyzing go-for-two decisions.
With rare exceptions I've resisted analyzing two-point conversion decisions with the Win Probability model because, as will become apparent, the analysis is particularly susceptible to noise. Now that we've got the new model, noise is extremely low, and I'm confident the model is more than up to the task.
First, let's walk through the possibilities for KC intuitively. If KC fails to score again or DEN gets a TD, none of this matters. Otherwise:
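That intuitive walk-through lends itself to a toy enumeration. Every probability below (conversion rates, chances of a later KC score, overtime odds) is an illustrative assumption, not an output of the article's WP model:

```python
# Toy enumeration of the go-for-two decision when trailing by 5 after a TD.
P_TWO = 0.47          # two-point conversion rate (assumed)
P_LATER_FG = 0.20     # chance KC adds a field goal later (assumed)
P_LATER_TD = 0.10     # chance KC adds a touchdown later (assumed)
P_OT = 0.50           # coin-flip overtime (assumed)

# Go for two: success leaves KC down 3 (a later FG ties), failure down 5
# (only a later TD wins).
wp_go = P_TWO * (P_LATER_FG * P_OT + P_LATER_TD) + (1 - P_TWO) * P_LATER_TD

# Kick the XP: down 4, so a later FG still loses and only a TD wins.
# (Note: down 4 and down 5 are nearly equivalent states, which is why
# the kick buys almost nothing.)
wp_kick = P_LATER_TD

print(f"go for two: {wp_go:.3f}, kick: {wp_kick:.3f}")
```

Under these made-up numbers the two-point try wins out, because a successful conversion unlocks the "FG ties" branch while a failure costs essentially nothing, which is the same intuition the Vermeil chart encodes for a 5-point deficit.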
Chiefs Crawling Drive, Come Away With Nothing
The Chiefs lost to the Broncos 24-17 on Sunday and had a chance to at least tie the game at the very end. Kansas City kept Peyton Manning off the field for an enormous chunk of the second half. The Broncos offense had only two drives after halftime (not including the final kneel down), one for a punt, one for a field goal, totaling just 8:51 in possession. The longest drive came from the Chiefs at the very start of the second half, where they ran 23 plays, taking 10 minutes off the clock... and ultimately missed a field goal. This got me thinking, how does drive length (in minutes) affect the probability of a team scoring?
First, here's a look at the ridiculous drive using our Markov model:
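The machinery behind a drive-level Markov model can be sketched as an absorbing chain. The states and transition probabilities here are invented for illustration; the article's model conditions on down, distance, and yard line in far more detail:

```python
import numpy as np

# Transient states: 0 = own half, 1 = opponent half.
# Absorbing states: 2 = score, 3 = no score (punt/turnover).
# All transition probabilities are made up for illustration.
P = np.array([
    [0.55, 0.25, 0.02, 0.18],
    [0.00, 0.45, 0.35, 0.20],
    [0.00, 0.00, 1.00, 0.00],
    [0.00, 0.00, 0.00, 1.00],
])
Q, R = P[:2, :2], P[:2, 2:]
N = np.linalg.inv(np.eye(2) - Q)   # fundamental matrix: expected state visits
absorb = N @ R                     # probability of ending in each absorbing state
print("P(score | drive starts in own half):", round(absorb[0, 0], 3))  # 0.398
```

Answering "how does drive length affect scoring probability" then amounts to conditioning these absorption probabilities on the number of steps (plays or minutes) taken so far.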
SOE: Weekly Game Probabilities
Nick Foles and Interception Index Regression
With one week of the 2014 season in the books, Foles and McCown have already matched that combined total. While everyone should have expected both to regress from their remarkably turnover-free 2013 seasons, that does not tell us how far each should regress based on historical norms.
Podcast Episode 28 - Chase Stuart
Chase Stuart rejoins the show to break down the surprising week one winners and losers. Chase shares his observations from the Jets home opener and explains some of the weekend's more intriguing game scripts. He also examines the data on exactly how important week one results are in predicting the season while looking ahead to the most intriguing week two match-ups.
Subscribe on iTunes and Stitcher
Simulating the Saints-Falcons Endgame
I previously examined intentional touchdown scenarios, but only considered situations when the offense was within 3 points. In this case NO needed a TD, which--needless to say--makes a big difference. Yet, because NO was on the 1, perhaps the go-ahead score was so likely that ATL would be better off down 3 with the ball than up 4 backed-up against their goal line.
This is a really, really hard analysis. There's a lot of what-ifs: What if NO scores on 1st down anyway? What if they don't score on 1st but on 2nd down? On 3rd down? On 4th down? Or what if they throw the ball? What if they stop the clock somehow, or commit a penalty? How likely is a turnover on each successive down? You can see that the situation quickly becomes an almost intractable problem without excessive assumptions.
That's where the WOPR comes in. The WOPR is the new game simulation model created this past off-season, designed and calibrated specifically for in-game analytics. It simulates a game from any starting point, play by play, yard by yard, and second by second. Play outcomes are randomly drawn from empirical distributions of actual plays that occurred in similar circumstances.
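An extremely stripped-down sketch of that idea looks like the following. The outcome table, yardages, and probabilities are all invented; the real WOPR draws from empirical distributions conditioned on the full game state:

```python
import random

# (yards gained, probability) for a generic play -- toy data only
OUTCOMES = [(-2, 0.10), (0, 0.25), (4, 0.35), (8, 0.20), (20, 0.10)]

def simulate_drive(yardline, rng):
    """Advance from `yardline` (yards to the end zone); True if the drive scores."""
    to_go, down = 10, 1
    while yardline > 0:
        gain = rng.choices([y for y, _ in OUTCOMES],
                           weights=[p for _, p in OUTCOMES])[0]
        yardline -= gain
        to_go -= gain
        if to_go <= 0:
            to_go, down = 10, 1          # first down, reset the series
        elif (down := down + 1) > 4:
            return False                 # turnover on downs
    return True

rng = random.Random(0)
scores = sum(simulate_drive(75, rng) for _ in range(10_000))
print(f"TD rate from own 25 in this toy model: {scores / 10_000:.1%}")
```

Run enough of these from any starting state and the fraction of simulated wins is exactly a win probability estimate for that state, which is what makes a simulator useful for hypothetical strategy questions.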
If you're not familiar with how simulation models work, you're probably wondering, "So what? Dude, I can put my Madden on auto-play and do the same thing. Who cares who wins a dumb make-believe game?"
Analyzing Replay Challenges
Most challenges are now replay assistant challenges--the automatic reviews for all scores and turnovers, plus particular plays inside two minutes of each half. Still, there are plenty of opportunities for coaches to challenge a call each week.
The cost of a challenge is two-fold. First, the coach (probably) loses one of his two challenges for the game. (He can recover one if he wins both challenges in a game.) Second, an unsuccessful challenge results in a charged timeout. The value of the first cost would be very hard to estimate, but thankfully the event that a coach runs out of challenges AND needs to use a third is exceptionally rare. I can't find even a single example since the automatic replay rules went into effect.
So I'm going to set that consideration aside for now. In the future, I may try to put a value on it, particularly if a coach had already used one challenge. But even then it would be very small and would diminish to zero as the game progresses toward its final 2 minutes. In any case, all the coaches challenges from this week were first challenges, and none represented the final team timeout, so we're in safe waters for now.
Every replay situation is unique. We can't quantify the probability that a particular play will be overturned statistically, but we can determine the breakeven probability of success for a challenge to be worthwhile for any situation. If a coach believes the chance of overturning the call is above the breakeven level, he should challenge. Below the breakeven level, he should hold onto his red flag.
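Setting aside the value of the retained challenge, as the article does, the break-even condition balances the WP gained by an overturn against the WP cost of a charged timeout. The two WP values below are hypothetical inputs, not figures from the article:

```python
def challenge_breakeven(wp_gain_if_overturned, wp_cost_of_timeout):
    """Minimum overturn probability that makes a challenge worthwhile.

    Solves p * gain = (1 - p) * cost for p. Ignores the (small) value
    of holding onto a second challenge.
    """
    return wp_cost_of_timeout / (wp_gain_if_overturned + wp_cost_of_timeout)

# Hypothetical: an overturn gains 4 points of WP, a lost timeout costs 1.
print(f"{challenge_breakeven(0.04, 0.01):.0%}")  # 20%
```

A coach who thinks the call is more than 20% likely to be overturned (under those inputs) should throw the flag; the inputs themselves vary with the game state, which is why every replay situation gets its own break-even number.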
Eagles Escape Embarrassment
Let my bias not be unknown, I am an Eagles fan. Watching Nick Foles fumble twice, throw an interception, and Chad Henne connect with rookie Allen Hurns twice for touchdowns -- all in the first half -- was one of the more frustrating ways to start the season. The Eagles were lucky to only be down 17-0 at halftime. On the opening drive of the second half, Philly converted twice and Nick Foles connected with Darren Sproles for eight yards on 3rd-and-9, bringing up a 4th-and-1 at the Jaguars 49-yard line.
Chip Kelly is known for his progressive thinking and he didn't hesitate -- calling for a hurry-up, one of the first times the Eagles really played up-tempo in the game. The Jaguars safeties got crossed and a huge gap opened up as Darren Sproles ran untouched for a 49-yard score. The Eagles would not look back and ultimately went on to win "handily," 34-17. Had the Eagles not converted, though, Kelly would likely have been ridiculed for his call as it could have effectively ended the game (dropping Philly's win probability to 5.4%).
Let's look at the fourth down call, taking into consideration the relative strength of the two teams.
Weekly Game Probabilities: A New Home
For now the score predictions are simply maximum-plausibility estimates. (Yes, I just made that term up.) Predicting an actual score for each game is statistically boring. With few exceptions, a statistically sound estimate would be 24-20 or 27-21 for every game, so I've added some of the human element to the score predictions. The bottom line is that readers should focus on the probabilities and not bet the mortgage on the scores.
The game probabilities will be matched up against the picks of Will Leitch, one of the cornerstone writers at SOE. The idea is to create a friendly competition between man and machine.
The game probabilities had a great run at the New York Times--5 years. But there are only so many thought-provoking or counter-intuitive lessons on probabilities and predictions that can be squeezed out of a week of NFL games. But AFA will continue working with the Times on various projects as the season unfolds.
Here's the link to the probabilities for week one. For those keeping score at home, I had the Seahawks at 66% to win last night.
The 4th Down Bot Returns
As I mentioned last year, although the 4th down issue is growing mold with smarter fans, it remains the lowest hanging fruit on the football analytics tree. So it's nice to be able to automate things and not have to do the analysis myself. But on the other hand, we can add 'football analyst' to the list of jobs being taken over by robots.
The Bot will be faster and more accurate, and it will come with some new features this season. Here is a brief introduction. Here are a few notes on how it works. And here is his Twitter feed.
Season Projections Visualization
The method used to create the projections is explained here.
The viz is intended to be one-stop shopping for the season outlook. The top window shows the probabilities each team will make the playoffs. Dark green indicates a playoff berth by winning the division, lighter green indicates a wildcard berth.
The three windows below are team-specific. Hover the cursor over (or tap) a team's column in the top chart to see its details below. The window on the left is a chart of win totals. The bars represent the probability the selected team will finish with a corresponding number of wins. The second window shows the same information presented in a different way. It's the cumulative probability of each win total. In other words, it's the probability the selected team will win at least that many games. The third window is a pie chart. (Yes, I know pie charts are the unloved orphans of the chart world.) It illustrates the probability each team will win its division.
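The "at least that many games" view in the second window is just a running sum over the win-total distribution. The probabilities below are made up for illustration:

```python
# Hypothetical win-total distribution for one team (must sum to 1).
win_probs = {6: 0.05, 7: 0.10, 8: 0.20, 9: 0.25, 10: 0.20, 11: 0.12, 12: 0.08}

def at_least(win_probs, n):
    """Probability the team wins at least n games."""
    return sum(p for wins, p in win_probs.items() if wins >= n)

for n in (8, 10, 12):
    print(f"P(>= {n} wins) = {at_least(win_probs, n):.2f}")
```

So a team with the bar chart above would show 0.85 at the 8-win mark of the cumulative view but only 0.08 at 12 wins, which is why the cumulative presentation makes playoff-relevant thresholds easier to read off.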
2014 Season Predictions for ESPN the Magazine
Then it got worse. "We want you to predict every score of every game."
I started doing some math in my head. There are 267 games in the season, including the playoffs, which means there are 2^267 different possible combinations of game outcomes in the season. While that might sound like a lot of different possibilities, it's even more than a human being could possibly fathom. Physicists and astronomers estimate there are about 10^80 atoms in the universe (that's 100 quinvigintillion to you and me). And the NFL season's 2^267 possible outcomes come to about 2.4x10^80, or about 240 quinvigintillion. Put simply, there are more than twice as many possible outcomes to the NFL season as there are atoms in the universe. And that just refers to wins and losses; it doesn't even consider scores.
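The arithmetic is easy to check, since Python's integers don't overflow:

```python
# Checking the count of possible season outcomes against the atoms estimate.
games = 267                     # regular season + playoffs
outcomes = 2 ** games           # each game is a win or a loss
atoms = 10 ** 80                # rough estimate of atoms in the universe

print(f"2^{games} = {outcomes:.3e}")        # about 2.4e80, as stated
print("ratio to atoms:", outcomes / atoms)  # a bit over 2
```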
So how hard could it be?
Podcast Episode 27 - Mike Sando
It's an episode full of pivot tables and preseason predictions as Mike Sando, ESPN Insider, joins the show to discuss coaching tiers, predict breakout candidates and preview the upcoming NFL season. Mike shares his process for ranking and evaluating players and coaches, makes some predictions for the upcoming season and reveals his secret analytical weapon - a massive excel spreadsheet he uses to track every conceivable type of player data.
Subscribe on iTunes and Stitcher
Sneak Peek at WP 2.0
As a quick refresher, the WP model tells us the chance that a team will win a game in progress as a function of the game state--score, time, down, distance, etc. Although it's certainly interesting to have a good idea of how likely your favorite team is to win, the model's usefulness goes far beyond that.
WP is the ultimate measure of utility in football. As Herm once reminded us all, "You play to win the game! Hello!", and WP measures how close or far you are from that single-minded goal. Its elegance lies in its perfectly linear proportions. Having a 40% chance at winning is exactly twice as good as having a 20% chance at winning, and an 80% chance is twice as good as 40%. You get the idea.
That feature allows analysts to use the model as a decision support tool. Simply put, any decision can be assessed on the following basis: Do the thing that gives you the best chance of winning. That's hardly controversial. The tough part is figuring out what the relevant chances of winning are for the decision-maker's various options, and that's what the WP model does. Thankfully, once the model is created, only fifth grade arithmetic is required for some very practical applications of interest to team decision-makers and to fans alike.
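The "fifth grade arithmetic" is literally a probability-weighted average per option. The numbers below are invented to show the shape of the calculation, not taken from any real game:

```python
def expected_wp(outcomes):
    """Expected win probability for one option.

    outcomes: list of (probability, resulting_win_probability) pairs
    that should cover all branches of the option.
    """
    return sum(p * wp for p, wp in outcomes)

# Hypothetical 4th-down choice: go for it (convert or fail) vs. punt.
go = expected_wp([(0.65, 0.50), (0.35, 0.30)])   # 65% convert to 50% WP
punt = expected_wp([(1.00, 0.38)])
print("go:", go, "punt:", punt, "->", "go" if go > punt else "punt")
```

The hard part, as the excerpt says, is producing the win probabilities on the right-hand side of each pair; once the model supplies those, comparing options is trivial.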
Podcast Episode 26 - Keith Goldner
Keith Goldner, AFA contributor and Chief Analyst at Numberfire, returns to provide some fantasy football wisdom. He dissects the different strategies for snake, auction and keeper league drafts, and explains how to incorporate risk management strategies when creating your roster. Keith and Dave also touch on daily fantasy games, and look at how to craft the optimal lineups for various league types. They close out the episode by highlighting the players to watch during the 2014 fantasy season.
Subscribe on iTunes and Stitcher
Podcast Episode 25 - Aaron Schatz
Aaron Schatz, founder of Football Outsiders, joins the podcast to discuss the newly released 2014 Football Outsiders Almanac. Aaron explains how he and his team developed a system to combine game film analysis with box score data to create their own advanced metrics. He breaks down the difference between rate statistics and totals, and explains how the concept of “replacement level” is important in football. Aaron also provides team-by-team breakdowns and predictions, and finishes up the episode by conquering Dave’s “lightning round” of questions.
Subscribe on iTunes and Stitcher
Implications of a 33-Yard XP
Over the past five seasons, attempts from that distance are successful 91.5% of the time. That should put a bit of excitement and drama into XPs, especially late in close games, which is what the NFL wants. But it might also have another effect on the game.
Currently, two-point conversions are successful at just about half that rate, somewhere north of 45%. The actual rate is somewhat nebulous because of how fakes and aborted kick attempts that turn into two-point attempts are counted.
It's likely the NFL chose the 15-yd line for a reason. The success rates for kicks from that distance are approximately twice the success rate for a 2-point attempt, making the entire extra point process "risk-neutral." In other words, going for two gives teams half the chance at twice the points.
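The risk-neutrality claim is a one-line expected-value comparison. The 91.5% kick rate is from the excerpt; 47.5% is an assumed midpoint for the "somewhere north of 45%" two-point rate:

```python
P_XP_33YD = 0.915   # kick success from the 15-yard-line snap (per the article)
P_TWO_PT = 0.475    # assumed two-point rate ("somewhere north of 45%")

ev_kick = 1 * P_XP_33YD
ev_two = 2 * P_TWO_PT
print(f"EV kick: {ev_kick:.3f} points, EV two-point: {ev_two:.3f} points")
```

Under these rates the two options are within a few hundredths of a point of each other, so neither choice is clearly "free" points, which is what risk-neutral means here.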
Podcast Episode 24 - Brian Burke
Brian Burke returns to recap his busy summer offseason. After a brief lesson on the rules of Gaelic Football, Dave and Brian discuss what we can learn about NFL win shares from Jimmy Graham’s contract, some new updates to the site (WOPR, and Win Probability Model) and the 2014 season predictions Brian made for ESPN the magazine. Dave also issues a call for podcast contributors, looking for anyone interested in contributing their technical expertise to the show.
Subscribe on iTunes and Stitcher
Win Values for the NFL
In 2013 the combined 32 NFL teams chased 256 regular season wins and spent $3.92 billion on player salary along the way. In simple terms, that would make the value of a win about $15 million. Unfortunately, things aren't so simple. To estimate the true relationship between salary and winning, we need to focus on wins above replacement.
Think of replacement level as the "intercept" term or constant in a regression. As a simple example think of the relationship between Celsius and Fahrenheit. There is a perfectly linear relationship between the two scales. To convert from deg C to deg F, multiply the Celsius temperature by 9/5. That's the slope or coefficient of the relationship. But because the zero point on the Celsius scale is 32 on the Fahrenheit scale, we need to add 32 when converting. That's the intercept. 32 degrees F is like the replacement level temperature.
No matter how teams spend their available salary, they need to have 53 guys on their roster. At a bare minimum, they need to spend 53 * $min salary just to open the season. We can consider that amount analogous to the 32-degrees of Fahrenheit. For 2013, the minimum salaries ranged from $420k for rookies to $940k for 10-year veterans. To field a purely replacement level squad, a franchise could enlist nothing but rookies. But to add a bit of realism, let's throw in a good number of 1, 2, and 3-year veterans in the mix for a weighted average min salary of $500k per year. The league-wide total of potential replacement salary comes to:
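The excerpt cuts off before stating the total, but the multiplication it sets up uses only the figures already given (53 roster spots, an assumed $500k weighted-average minimum, 32 teams):

```python
teams = 32
roster_size = 53
avg_min_salary = 500_000   # the article's weighted-average minimum estimate

per_team = roster_size * avg_min_salary
league_wide = teams * per_team
print(f"replacement payroll per team: ${per_team:,}")       # $26,500,000
print(f"league-wide replacement payroll: ${league_wide:,}") # $848,000,000
```

Subtracting that replacement baseline from the $3.92 billion actually spent is what isolates the salary that buys wins above replacement, just as subtracting 32 isolates the proportional part of the Fahrenheit conversion.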
Sun Tzu on Analytics
Apparently he was a fan:
"Now the general who wins a battle makes many calculations in his temple ere the battle is fought. The general who loses a battle makes but few calculations beforehand. Thus do many calculations lead to victory, and few calculations to defeat: how much more no calculation at all! It is by attention to this point that I can foresee who is likely to win or lose."

For those of you who may be unfamiliar with Sun Tzu, he was the ancient Chinese general and philosopher who wrote The Art of War. Still required reading at military academies and war colleges around the world, The Art of War is perhaps the most influential military treatise in history. It crystallizes centuries of strategic wisdom into what are essentially tweet-sized chunks of timeless insight. "Thus do many calculations lead to victory..." I like that part. I think I'm gonna put it on a t-shirt.
Podcast Episode 23 - John Urschel
John Urschel, professional football player and mathematician, joins the show. John was recently selected in the 5th round by the Baltimore Ravens in this year's NFL draft. Last season, John was a Penn State football co-captain, and as a student-athlete he achieved a 4.0 GPA while majoring in math. For his efforts, he won the Campbell Trophy, awarded to the top scholar-athlete in Division I football. He recently proved the Urschel-Zikatanov Generalized Bisection Theorem and published his findings in the Journal of Celestial Mechanics and Dynamical Astronomy.
On the show, John explains how he came to love the world of mathematics, and how his passion for football aligns with his commitment to academics. He also shares some of the lessons he's learned from veterans in OTAs and describes how he's preparing for the upcoming NFL season.
Subscribe on iTunes and Stitcher
Using Probabilistic Distributions to Quantify NFL Combine Performance
Jadeveon Clowney is thought of as a “once-in-a-decade” or even “once-in-a-generation” pass rushing talent by many. Once the top-rated high school talent in the country, Clowney has retained that distinction through 3 years in college football’s most dominant conference. Super-talents like Clowney have traditionally been gambled on in the NFL draft with little idea of what future production is actually statistically anticipated. For all of the concerns over his work ethic, dedication, and professionalism, Clowney’s athleticism and potential have never been called into question. But is his athleticism actually that rare? And is his talent worth gambling millions of dollars and the 1st overall pick on? This article aims to quantify exactly how rare Jadeveon Clowney’s athleticism is in a historical sense.
Jadeveon Clowney set the NFL draft world on fire at this year’s combine when he delivered one of the most talked-about combine performances in recent memory, primarily driven by his blistering 40 yard dash time of 4.53. Over the years, however, I recall players like Vernon Gholston, Mario Williams, and even Ziggy Ansah displaying mind-boggling athleticism in drills. But if each year a player displays unseen athleticism at the combine, who is really impressive enough that we deem them “once-in-a-decade?”
Probability Ranking allows me to identify the probability of encountering an athlete’s measurable. For instance, I probability ranked NFL combine 40 yard dash times for 341 defensive ends from 1999-2014 (Table 1 shows the top 50). In this case, Jadeveon Clowney’s 40 time of 4.53 had a probability rank of 99.12, meaning his speed is in the 99th percentile of all DEs over this time span.
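A minimal version of that probability ranking is an empirical percentile, where a lower 40 time beats a higher one. The times below are invented stand-ins, not the article's 341-player dataset:

```python
def probability_rank(value, population, lower_is_better=True):
    """Percentage of the population this value beats or ties."""
    if lower_is_better:
        beaten = sum(1 for v in population if value <= v)
    else:
        beaten = sum(1 for v in population if value >= v)
    return 100.0 * beaten / len(population)

# Hypothetical defensive-end 40 times for illustration.
de_forty_times = [4.53, 4.60, 4.64, 4.70, 4.75, 4.81, 4.88, 4.95, 5.01, 5.10]
print(probability_rank(4.53, de_forty_times))  # 100.0 in this toy sample
```

Against the full 1999-2014 sample the same calculation yields the 99.12 figure quoted in the excerpt; the toy list above is just small enough to check by hand.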
NFL Prospect Evaluation using Quantile Regression
Casan Scott continues his guest series on evaluating NFL prospects through Principal Component Analysis. By day, Casan is a PhD candidate researching aquatic eco-toxicology at Baylor University.
Extraordinary amounts of data go into evaluating an NFL prospect. The NFL combine, pro days, college statistics, game tape breakdown, and even personality tests can all play a role in predicting a player’s future in the NFL. Jadeveon Clowney is arguably the most discussed prospect in the 2014 NFL draft not named Johnny Manziel. He is certainly an elite prospect and potentially the best in this year’s draft, but he doesn’t appear to be a “once-in-a-decade” type of physical specimen based exclusively on historical combine performances. From the research I’ve done, only Mario Williams and JJ Watt can make such a claim. Super-talents like Clowney have traditionally been gambled on in the NFL draft with little idea of what future production is actually statistically anticipated. All prospects have a “ceiling” and a “floor,” which represent the maximum and minimum potential a prospect could realize, respectively. But what does this “potential” mean, and does it hold any importance for actually predicting a prospect’s success in the NFL? In this article I will show how Quantile Regression, a technique used by quantitative ecologists, can clarify what Clowney’s proverbial “ceiling” and “floor” may be in the NFL.
Athletes are a collection of numerous measured and unmeasured descriptor variables. Figure 1 shows a single predictor (40 yard dash time) vs. a prospect's career NFL sacks + tackles for loss (TFL) per game.
Podcast Episode 22 - Brian Burke
Brian Burke returns to the show to recap the 2014 NFL draft. He describes the Bayesian Draft Analysis tool he created and discusses the value of trades made by teams during the draft. Brian and Dave then discuss their favorite new addition to the league, John Urschel, and make a pitch to get him to contribute to the site. Brian also previews his new project, WOPR, and explains how it'll help generate data for some previously unanswerable questions.
This episode of Advanced Football Analytics is brought to you by Harry's. Harry's delivers high-quality shave products straight to your door at a fraction of the price of shaving competitors. Go to Harrys.com and use the offer code "AFA" at checkout to save $5 off your first purchase.
Subscribe on iTunes and Stitcher
The AFA Draft Pick of the Year
Was the next Virgil Carter drafted yesterday? Penn State guard John Urschel was taken with a compensatory pick in the 5th round by Baltimore. John stands out because he has an unusual plan for his time after his playing days are over. He says he's very interested in "sports analytics. Data analysis for football."
If he does, he'll analyze circles around the rest of us. While playing for PSU, John earned his degree in math in just three years. He then added a master's degree in math, and is currently working on a second master's in math education. He's published research with titles like "Instabilities of the Sun-Jupiter-Asteroid Three Body Problem," "A Space-Time Multigrid Method for the Numerical Valuation of Barrier Options," and "Spectral Bisection of Graphs and Connectedness," in which he proved the Urschel-Zikatanov Generalized Bisection Theorem. Man, I wish I had a theorem named after me.
To us, his most interesting research might be this article he wrote for ESPN The Magazine. He looked at "1) how best to predict a lineman's draft position, 2) that prospect's success in terms of NFL starts, and 3) whether a fringe prospect will be selected." Sounds like it would have made a good guest post here.
The Bayesian draft model estimated the most likely pick for Urschel was 167, not very far off from his actual selection at 175. The chance he would be available at 175 was 43% according to the numbers. So, almost spot on. Interestingly, Urschel's own selection may have been the result of some sharp analytics. Baltimore is known to have "a proprietary formula—a “special sauce,” assistant GM Eric DeCosta calls it—that factors in potential compensatory picks to the free agency cost-benefit analysis."
Urschel could make a killer impact on the world of football analytics if he chose to. However successful his pro career turns out to be, he'll carry the credibility of a pro-caliber player. Coaches will take what he has to say much more seriously than what an ex-Navy pilot writes on a website.
So, congratulations, John! I'll be rooting for you on the field and off. Play like a Raven!
Project WOPR is Coming
With the Bayesian draft tool completed, I can now focus on completing Project WOPR. If you're a fan of mid-80s Matthew Broderick movies, you may have already figured out what the WOPR is.
I'll give another clue:
Its purpose is to answer the unanswerable questions of football strategy.
But for now, it's taking up my entire basement and has driven my electricity bill through the roof. The liquid-nitrogen cooled 32-core processors aren't cheap either.
Live Updates Tonight
As players are chosen, the probabilities will obviously start changing rapidly. The fact a player is off the board and no one else could fill that slot is information (with absolute certainty) that can be fed back through the model. The effects will cascade through the rest of the available picks.
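A minimal sketch of that conditioning step, for the curious (the `condition_on_board` helper is illustrative, not the model's actual code): once a prospect survives the first k picks, the probability mass at those picks is zeroed out and the remainder renormalized.

```python
import numpy as np

def condition_on_board(pick_probs, picks_made):
    """Condition a prospect's distribution on the fact he's still on the board.

    pick_probs: pick_probs[i] is P(prospect taken at pick i+1).
    picks_made: number of picks already completed with the prospect untaken.
    """
    p = np.array(pick_probs, dtype=float)
    p[:picks_made] = 0.0          # he wasn't taken in those slots, with certainty
    return p / p.sum()            # remaining mass shifts to later picks

probs = condition_on_board([0.1, 0.2, 0.3, 0.4], picks_made=2)
# mass shifts to later picks: [0, 0, 3/7, 4/7]
```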
Unfortunately, the interface won't update automatically for users. You'll need to click refresh or hit F5 after each pick. There will be at least two or three minutes of lag for the updates to work through the system, so be patient.
New Feature on the Draft Model
When I learned a little about object-oriented programming, it all made sense. The software engineers were designing the interface for their own convenience, not for ease of use. It made sense from an efficiency standpoint...a programming-efficiency standpoint. But from the user's perspective, it wasn't so efficient. The least-used feature was just as accessible as the most common one, and all of them were hidden until you expanded the right portion of the tree.
Yesterday I realized I was doing the same thing with the draft model. From my point of view, it's easiest to think in terms of players and their probability to be selected at each pick number, because that's how the software that runs the model works. It goes down the list of prospects, player-by-player, looking at the probability he'll be selected pick#-by-pick#.
For the players and their agents, and for fans of particular players, this is ideal. They want to know where and when they'll go. But the user is probably thinking of things from a team's perspective. Whether the user is a team personnel guy or a fan of a team, he'd rather see things from the perspective of a pick number. Right now, a Vikings fan (or exec) would have to click through a dozen or so of the top players to see who's likely to be available at pick #8. And if they were wondering who'd be available if they traded up or down, that's another few dozen clicks. Scroll, click. Scroll, click...
Podcast Episode 21 - Cade Massey
Cade Massey, Professor of the Practice at the Wharton School of Business, joins the show to discuss his research on the NFL draft. Professor Massey is the co-author of "The Loser's Curse: Decision Making & Market Efficiency in the National Football League Draft", a paper analyzing the market for draft pick trades. He and his co-author, Richard Thaler, discovered that teams picking at the top of the draft actually sacrifice a great deal of what he calls "surplus value" by not trading down for additional selections.
Dave and Cade look at the reasons why teams employ less than optimal strategies, including risk aversion, adherence to norms established by "The Chart" and other psychological factors. Professor Massey defends his paper against critiques, and discusses why he believes the draft is such a compelling spectator event.
Subscribe on iTunes and Stitcher
Bayesian Draft Analysis Tool
For details on how the model works, please refer to these write-ups:
- A full description of the purpose and capabilities of the model
- A discussion of the theoretical basis of Bayesian inference as applied to draft modeling
- More details on the specific methodology
If you want to jump straight to the results, here they are. But I recommend reading a little further for a brief description of what you'll find.
The interface consists of a list of prospects and two primary charts. Selecting a prospect displays the probabilities of when he'll likely be taken. You can filter the selection list by overall ranking or position.
The top chart plots the probabilities the selected prospect will be taken at each pick #. I think this chart is pretty cool because it illustrates the Bayesian inference process. You can actually see the model 'learn' as it refines its estimates with the addition of each new projection. Where there is a firm consensus among experts, the probability distribution is tall and narrow, indicating high confidence. When there is disagreement, the distribution is low and wide, indicating low confidence.
The lower chart is the bottom line. It's the take-away. It depicts the cumulative probability that the selected prospect will remain available at each pick #. For example, currently there's an 82% chance safety Ha Ha Clinton-Dix is available at the #8 pick but only a 26% chance he's available at #14. A team with an eye on a specific player could use this information in deciding whether to trade up or down, and in understanding how far they'd need to trade.
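The math behind a cumulative-availability chart like that is simple. A hedged sketch with made-up numbers (not the model's actual output): the chance a prospect is still available at pick n is one minus the probability he was taken at any earlier pick.

```python
import numpy as np

def availability(pick_probs):
    """P(prospect still available at each pick) = 1 - P(taken at an earlier pick)."""
    taken_before = np.concatenate(([0.0], np.cumsum(pick_probs)[:-1]))
    return 1.0 - taken_before

# Illustrative per-pick selection probabilities for one prospect, picks 1-5.
avail = availability([0.05, 0.10, 0.25, 0.30, 0.30])
# avail[0] = 1.0 (always available at pick 1); avail[3] = 1 - (0.05+0.10+0.25) = 0.60
```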
Hovering your cursor over one of the bars on the chart provides some additional context, including which team has that pick and that team's primary needs (according to nfl.com).
The box in the upper right gives you the player's vitals: school, position, height, and weight. The expert projections used as inputs to the model are also listed. Currently those include Kiper (ESPN), McShay (Scouts, Inc.), Pat Kirwan (CBS Sports), Daniel Jeremiah (former team scout, NFL Network), and Bucky Brooks (NFL Network). Experts were selected for their reputation, historical accuracy, and independence--that is, they don't all parrot the same projections. Not every prospect has a projection from each expert.
Link to the tool.
Bayesian Draft Model: More Methodology
The new Bayesian draft model is nearly ready for prime time. Before I launch the full tool publicly, I need to finish describing how it works. Previously, I described its purpose and general approach. And my most recent post described the theoretical underpinnings of Bayesian inference as applied to draft projections. This post will provide more detail on the model's empirical basis.
To review, the purpose of the model is to provide support for decisions. Teams considering trades need the best estimates possible about the likelihood of specific player availability at each pick number. Knowing player availability also plays an important role in deciding which positions to focus on in each round. Plus, it's fun for fans who follow the draft to see which prospects will likely be available to their teams. Hopefully, this tool sits at the intersection of Things helpful to teams and Things interesting to fans.
Since I went over the math in the previous post, I'll dig right into how the probability distributions that comprise the 'priors' and 'likelihoods' were derived.
I collected three sets of data from the last four drafts--best player rankings, expert draft projections (mock drafts), and actual draft selections. In a nutshell, to produce the prior distribution, I compared how close each player's consensus 'best-player' ranking was to his actual selection. And to produce the likelihood distributions I compared how close each player's actual selection was to the experts' mock projections.
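A toy version of that prior construction might look like the following. The ranks and picks here are made-up placeholders, not the actual four-draft dataset: the empirical distribution of (actual pick minus consensus rank) offsets becomes the prior for any prospect with a given rank.

```python
import numpy as np

# Hypothetical historical data: consensus best-player rank vs. actual pick number.
ranks = np.array([1, 2, 3, 5, 8, 10, 12, 15, 20, 25])
actual = np.array([1, 3, 2, 7, 6, 14, 10, 18, 27, 22])

offsets = actual - ranks  # how far from his rank each player actually went

def prior_for(rank, n_picks=32):
    """Empirical prior over pick numbers for a prospect with a given consensus rank."""
    picks = rank + offsets                 # shift historical offsets to this rank
    p = np.zeros(n_picks)
    for pk in picks:
        if 1 <= pk <= n_picks:
            p[pk - 1] += 1.0
    return p / p.sum()                     # normalize counts into probabilities

prior = prior_for(8)
```

The likelihood distributions work the same way, except the offsets are measured against each expert's mock projection rather than the consensus ranking.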
Theoretical Explanation of the Bayesian Draft Model
First, some terminology. P(A) means the "probability of event A," as in the probability it rains in Seattle tomorrow. Event A is 'it rains in Seattle tomorrow'. Likewise, we can define P(B) as the probability that it rains in Seattle today.
P(A|B) means "the probability of event A given event B occurs," as in the probability that it rains in Seattle tomorrow given that it rained there today. This is known as a conditional probability.
The probability it rains in Seattle today and tomorrow can be calculated by P(A|B) * P(B), which should be fairly intuitive. I hope I haven't lost anyone.
It's also intuitive that "raining in Seattle today and tomorrow" is equivalent to "raining in Seattle tomorrow and today." There's no difference at all between those two things, and so there's no difference in their probabilities.
We can write out that equivalence, like this:
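The equation that followed here was an image in the original post; it is the standard product-rule identity, which rearranges into Bayes' theorem:

```latex
P(A \mid B)\,P(B) = P(B \mid A)\,P(A)
\quad\Longrightarrow\quad
P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}
```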
Bayesian Draft Prediction Model
I've created a tool for predicting when players will come off the board. This isn't a simple average of projections. Instead, it's a complete model based on the concept of Bayesian inference. Bayesian models have an uncanny knack for accurate projections if done properly. I won't go into the details of how Bayesian inference works in this post; I'll save that for another article. This post is intended to illustrate the potential of this decision support tool.
Bayesian models begin with a 'prior' probability distribution, used as a reasonable first guess. Then that guess is refined as we add new information. It works the same way your brain does (hopefully). As more information is added, your prior belief is either confirmed or revised to some degree. The degree to which it is refined is a function of how reliable the new information is. This draft projection model works the same way.
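In code, that refinement step is essentially a one-liner: posterior is proportional to prior times likelihood. The distributions below are illustrative numbers, not the model's actual inputs.

```python
import numpy as np

def bayes_update(prior, likelihood):
    """One Bayesian update: posterior ∝ prior × likelihood, renormalized."""
    post = prior * likelihood
    return post / post.sum()

# Prior belief over pick numbers 1-5 (hypothetical).
prior = np.array([0.10, 0.20, 0.40, 0.20, 0.10])
# One expert's projection, expressed as a likelihood over the same picks.
expert = np.array([0.05, 0.15, 0.30, 0.35, 0.15])

posterior = bayes_update(prior, expert)
```

A sharp, confident projection pulls the posterior strongly toward it; a vague one barely moves the prior, which is exactly the "reliability-weighted" behavior described above. Each additional expert is just another call to `bayes_update`.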
Draft Prospect Evaluation Using Principal Component Analysis
A guest post by W. Casan Scott, Baylor University.
As different as ecology and the NFL sound, they share quite similar problems. The environment is an infinitely complex system with many known and unknown variables. The NFL is a perpetually changing landscape with a revolving door of players and schemes. Predicting an athlete's performance pre-draft is complicated by a number of contributing variables, including combine results, college production, intangibles, and how well that player fits a certain NFL scheme. Perhaps the techniques ecologists use to discern confounding trends in nature are suitable for such challenges as the NFL draft. This article aims to introduce an eco-statistical tool, Principal Component Analysis (PCA), and its potential utility to advanced NFL analytics.
My Ph.D. research area is aquatic eco-toxicology, where I primarily model chemical exposure hazards to fish. So essentially, I use the best available data and methods to quantify how much danger a fish may be in within a given habitat. Chemical exposures occur in infinitely complex mixtures across many different environments, and distinguishing trends in such dynamic situations is difficult.
Prospective draftees are actually similar (in theory) in that each is a unique combination of his college team, inherent athleticism, history, intangibles, and even the current landscape of the NFL. The myriad variables present in the environment and the NFL, both static and changing, make it difficult to separate the noise from actual, observable trends.
In environmental science, we sometimes use non-traditional methods to help us visualize what previously could not be observed. Likewise, Advanced NFL Analytics tries to answer questions that traditional methods cannot. The goal of this article is to educate others of the utility of eco-statistical tools, namely Principal Component Analysis (PCA), in assessing NFL draft prospects.
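For readers who want to try PCA themselves, here's a self-contained numpy sketch on made-up combine measures (the data, column choices, and correlation structure are hypothetical). PCA rotates correlated measurements into orthogonal components ordered by how much variance each explains.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical combine data: rows = prospects, cols = [40yd, bench, vertical, cone].
X = rng.normal(size=(50, 4))
X[:, 1] = -0.8 * X[:, 0] + 0.2 * rng.normal(size=50)  # two measures correlate

def pca(X):
    """Principal components via eigendecomposition of the correlation matrix."""
    Z = (X - X.mean(axis=0)) / X.std(axis=0)           # standardize each measure
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(Z, rowvar=False))
    order = np.argsort(eigvals)[::-1]                  # largest variance first
    return eigvals[order], eigvecs[:, order], Z @ eigvecs[:, order]

variance, loadings, scores = pca(X)
explained = variance / variance.sum()  # share of variance per component
```

With the correlated pair above, the first component soaks up much of the shared variance; the `loadings` tell you which original measures drive each component, and the `scores` place each prospect in the reduced space.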
Wondering About the Wonderlic: Does It Predict Quarterback Performance?
New Site Title and New Address
I'm gradually updating all the banners and titles, but you'll still see ANS references for a while as things progress. And the old url advancednflstats.com will redirect to the new address, hopefully just as soon as the update spreads to all the DNS servers.
Thanks to everyone for sticking with AFA!
Coming Soon...
Project WOPR.
That's all I'm going to say.
Lacrosse Analytics
For those not familiar with lacrosse, imagine hockey played on a football field but, you know, with cleats instead of skates. And instead of a flat puck and flat sticks, there's a round ball and the sticks have a small netted pocket to carry said ball. And instead of 3 periods, which must be some sort of weird French-Canadian socialist metric system thing, there's an even 4 quarters of play in lacrosse, just like God intended. But pretty much everything else is the same as hockey--face offs, goaltending, penalties & power plays. Lacrosse players tend to have more teeth though.
Because players carry the ball in their sticks rather than push it around on ice, possession tends to be more permanent than hockey. Lacrosse belongs to a class of sports I think of as "flow" sports. Soccer, hockey, lacrosse, field hockey, and to some degree basketball qualify. They are characterized by unbroken and continuous play, a ball loosely possessed by one team, and netted goals at either end of the field (or court). There are many variants of the basic team/ball/goal sport--for those of us old enough to remember the Goodwill Games of the 1980s, we have the dystopic sport of motoball burned into our brains. And for those of us (un)fortunate enough to attend the US Naval Academy (or the NY State penitentiary system) there's field ball. The interesting thing about these sports is that they can all be modeled the same way.
So with lacrosse season underway, I thought I'd take a detour from football work and make my contribution to lacrosse analytics. I built a parametric win probability model for lacrosse based on score, time, and possession. Here's how often a team can expect to win based on neutral possession--when there's a loose ball or immediately upon a faceoff following a previous score:
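The chart itself was a graphic, but to give a feel for the shape of such a model, here's a hedged sketch of a parametric WP function of score, time, and possession. The functional form and coefficients below are illustrative guesses, not the fitted values behind the actual model.

```python
import math

def win_prob(lead, seconds_left, possession=0):
    """Illustrative parametric win probability for a flow sport.

    lead: goal differential for the team of interest.
    seconds_left: time remaining in regulation.
    possession: +1 with the ball, -1 opponent's ball, 0 neutral (loose ball/faceoff).
    Coefficients are hypothetical, chosen only to show the model's shape.
    """
    # A given lead matters more as time runs out, so the logistic steepens;
    # possession is worth a fraction of a goal.
    k = 1.2 / math.sqrt(max(seconds_left, 1) / 60.0)
    return 1.0 / (1.0 + math.exp(-k * (lead + 0.3 * possession)))
```

By construction, a tied game at neutral possession is 50/50, and the same two-goal lead is worth more with five minutes left than with thirty, which is the basic behavior any score/time/possession WP model should show.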
ANS Is Hiring!
The main responsibilities of the intern will include, but are not limited to the following:
*Support ANS team with data extractions, analysis, reporting, and data presentation
*Suggest new ways to improve existing processes
*Document and automate data and analytic processes
Required Qualifications
*Currently enrolled in, or a recent graduate of, a quantitative degree program such as Operations Research, Mathematics, Statistics, Economics, Computer Science, Information Science, Management Information Systems, or similar
*Senior, recent graduate or post-grad level
*Demonstrated attentiveness to detail
*Experience in analytical field
*Proficient in SPSS, SAS, MySQL, and/or R
*Proficient in Microsoft Office suite
Preferred Qualifications