Full Review of Game Theory Run-Pass Balance Study

A new paper on game theory and run-pass balance in the NFL, Professionals Do Not Play Minimax: Evidence from Major League Baseball and the National Football League, says that offenses run too often and that play calling is too predictable. The authors, Kenneth Kovash and Steven Levitt, construct a success metric to value the outcome of NFL runs and passes from the 2002-2005 seasons. Using regression models, they then estimate and compare the values of a typical run and a typical pass. They also construct a regression to test whether play calls can be predicted to any degree from the previous play call.

Game Theory

Game theory tells us that in a 2-player zero-sum game, if both players are playing the optimum mix of strategies, every strategy a player actually uses will have the same long-term average payoff. If you have two general strategy options, like run or pass, you can’t just choose one of them all the time. That would make the defense’s job pretty easy. So you need some sort of unpredictable mix of strategies. The question is, what’s the optimum ratio?

If one option usually yields a higher payoff than the other, you would obviously like to do it more often. If running pays more than passing, a team should increase its proportion of running plays until the defense adjusts. At some point in the process, as the defense adjusts toward defending the run, passing becomes more lucrative. Ultimately, there is an equilibrium where both the offense and defense have adjusted to a mix of strategies and the average long-run payoff for each choice is equal. At equilibrium, the payoffs can’t be anything other than equal, or else you’d want to increase the proportion of the more lucrative option.
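
To make the equilibrium idea concrete, here is a minimal Python sketch of a 2x2 zero-sum run/pass game. The payoff numbers (the offense's average gain when its call meets each defensive call) are hypothetical, chosen only for illustration, and the class name `RunPassGame` is made up. The sketch solves for the mix each side uses at equilibrium and checks that, against the defense's equilibrium mix, a run and a pass return the same average payoff.

```python
from dataclasses import dataclass

@dataclass
class RunPassGame:
    """Offense payoffs in a hypothetical 2x2 zero-sum run/pass game.

    Each field is the offense's average payoff (say, points) when its call
    meets the given defensive call. The defense's payoff is the negative.
    """
    run_vs_run_d: float
    run_vs_pass_d: float
    pass_vs_run_d: float
    pass_vs_pass_d: float

    def equilibrium(self):
        """Return (offense run rate, defense 'play the run' rate) at equilibrium."""
        a, b = self.run_vs_run_d, self.run_vs_pass_d
        c, d = self.pass_vs_run_d, self.pass_vs_pass_d
        denom = a - b - c + d
        if denom == 0:
            raise ValueError("no interior mixed equilibrium")
        # Run rate that makes the defense indifferent between its two calls.
        p_run = (d - c) / denom
        # Defensive mix that makes the offense indifferent between run and pass.
        q_run_d = (d - b) / denom
        if not (0 < p_run < 1 and 0 < q_run_d < 1):
            raise ValueError("no interior mixed equilibrium (check for a saddle point)")
        return p_run, q_run_d

    def payoffs_vs(self, q_run_d):
        """Average payoff of a run and of a pass against a given defensive mix."""
        run = q_run_d * self.run_vs_run_d + (1 - q_run_d) * self.run_vs_pass_d
        pass_ = q_run_d * self.pass_vs_run_d + (1 - q_run_d) * self.pass_vs_pass_d
        return run, pass_

game = RunPassGame(run_vs_run_d=0.0, run_vs_pass_d=0.6,
                   pass_vs_run_d=0.8, pass_vs_pass_d=-0.2)
p_run, q_run_d = game.equilibrium()
print(round(p_run, 3), round(q_run_d, 3))               # 0.625 0.5
print([round(x, 3) for x in game.payoffs_vs(q_run_d)])  # [0.3, 0.3]
```

With these made-up numbers the offense runs 62.5% of the time, the defense plays the run half the time, and both calls are worth about 0.3 points. If the offense strays from its equilibrium mix, the defense can shade toward the more frequent call and lower the offense's average.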

The bottom line of the theory is that running and passing should have the same average payoffs. Now the question is how we define “payoff.” For game theory analysis to be valid, we need a strictly linear measure of payoffs called “utility.”


Money, for example, usually isn’t linear in value. If you already have $1 million in your pocket, an extra $100 isn’t going to make a difference to you. But if you don’t have a dime to your name, $100 is going to make a pretty big difference. To convert money into true utility, you need some sort of function that maps dollar amounts onto a linear scale of value.

By ‘linear’ I mean something like this: Say you have $1,000 and the opportunity to wager all of it on a coin flip. To make it worth the risk, how much would you need to end up with if you win? If you say $2,000, your utility function of money is already linear. But I’m risk-averse, so I’d say $3,000. To me, the utility of my first thousand dollars is worth the same as the utility of my next two thousand dollars. A linear utility function would need to take those preferences into account.
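
The coin-flip example can be checked numerically. In this sketch the utility numbers are hypothetical, picked only to encode the preference stated above (the first $1,000 is worth as much as the next $2,000). On that scale the $3,000 wager is a fair bet in utility terms even though its expected dollar value is $1,500.

```python
import math

# Hypothetical utility scales (dollar total -> utils), for illustration only.
RISK_NEUTRAL = {0: 0.0, 1_000: 1.0, 2_000: 2.0}
RISK_AVERSE = {0: 0.0, 1_000: 1.0, 3_000: 2.0}  # first $1,000 = next $2,000

def expected_dollars(outcomes):
    """Probability-weighted average of (probability, dollar total) pairs."""
    return sum(p * dollars for p, dollars in outcomes)

def expected_utility(utility, outcomes):
    """Probability-weighted average utility of (probability, dollar total) pairs."""
    return sum(p * utility[dollars] for p, dollars in outcomes)

def is_fair_bet(utility, stake, win_total):
    """True if a 50/50 flip (end with 0 or with win_total) is worth exactly the stake."""
    flip = [(0.5, 0), (0.5, win_total)]
    return math.isclose(expected_utility(utility, flip), utility[stake])

print(is_fair_bet(RISK_NEUTRAL, 1_000, 2_000))      # True
print(is_fair_bet(RISK_AVERSE, 1_000, 3_000))       # True
print(expected_dollars([(0.5, 0), (0.5, 3_000)]))   # 1500.0
```

Once dollars are mapped this way, it is expected utility, not expected dollars, that the game-theory equalities apply to.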

Utility also has to be consistent. In other words, you can't prefer apples to oranges, oranges to bananas, and bananas to apples. Often, utility is a function of time and situation. There may be times when you prefer apples most and other times when you prefer bananas most.
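
The consistency requirement can be checked mechanically by looking for a preference cycle like the apples, oranges, and bananas one. This sketch (the function name is ours, not from any library) scans a set of strict preferences for three-item cycles. When every pair of items has been compared, any cycle at all implies a three-item one, so this check is enough in that case.

```python
from itertools import permutations

def find_cycle(prefers):
    """Return (a, b, c) with a > b, b > c and c > a, or None if there is none.

    `prefers` is a set of (better, worse) pairs of strict preferences.
    """
    items = {item for pair in prefers for item in pair}
    for a, b, c in permutations(items, 3):
        if (a, b) in prefers and (b, c) in prefers and (c, a) in prefers:
            return (a, b, c)
    return None

cyclic = {("apples", "oranges"), ("oranges", "bananas"), ("bananas", "apples")}
consistent = {("apples", "oranges"), ("oranges", "bananas"), ("apples", "bananas")}
print(find_cycle(cyclic) is not None)  # True
print(find_cycle(consistent))          # None
```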

The Authors' Measure of Utility

Previous studies of run-pass balance ignored down and distance considerations. Kovash and Levitt solve the down and distance problem by creating a “success metric” similar to Expected Points. In fact, I think it may be superior to what I’ve been using as EP. The standard version of EP is the average expected net point advantage for a given down and distance situation, counted only until the next score. The difference is that the study’s success metric measures the net point difference at the end of the half. For example, if an offense has a 1st and 10 at the opponent’s 10 yard line, that team can expect to score about 4 more points relative to the other team by the end of the current half. This method automatically accounts for the value of the ensuing kickoff, so no adjustment is needed. In the end, however, both systems appear to produce approximately the same values, and both systems account for important events like sacks and turnovers.
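
Valuing a play by the change in expected points can be sketched as follows. The `toy_ep` function is a hypothetical linear stand-in, not the paper’s table or the author’s EP model; its only grounding is that it puts 1st and 10 at the opponent’s 10 near the 4 points mentioned above. Scoring plays and other special cases are left out.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    down: int
    to_go: int      # yards needed for a first down
    yard_line: int  # yards from the offense's own goal line (1-99)

def toy_ep(s: State) -> float:
    """HYPOTHETICAL expected net points for the team with the ball.

    A crude linear stand-in for a real EP table: worth more closer to the
    opponent's goal, less on later downs and at longer distances.
    """
    return -1.5 + 0.06 * s.yard_line - 0.4 * (s.down - 1) - 0.04 * (s.to_go - 10)

def play_value(before: State, after: State, turnover: bool = False) -> float:
    """Value a non-scoring play as the change in the offense's expected points."""
    if turnover:
        # `after` is described from the new offense's side, so flip its sign.
        return -toy_ep(after) - toy_ep(before)
    return toy_ep(after) - toy_ep(before)

third_and_4 = State(down=3, to_go=4, yard_line=40)
print(round(play_value(third_and_4, State(1, 10, 44)), 2))  # 0.8  (4-yard gain)
print(round(play_value(third_and_4, State(1, 10, 60)), 2))  # 1.76 (20-yard gain)
```

Note that a measure like this scores the 20-yard gain at more than twice the 4-yard gain no matter what the score or clock says, which is the limitation discussed below.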

The paper’s success metric is a function of down, distance, and field position. Time and score are not included. This is a significant problem because it is not a consistent linear utility function. If a team is down by 14 points early in the 4th quarter, a long grinding field goal drive doesn’t help much. It's mostly just a waste of valuable minutes. Points do not equal utility in football.

The authors did make an attempt to remove the effect of time. They excluded all overtime plays, kneel downs, and all plays in the last 2 minutes of each half. However, these measures don’t account for the potentially large differences in the true value of a play considering time and score.

Example Situations

The value of a play isn't only the point advantage it leads to. Say it’s 3rd down and 4 at your own 40 yard line. Would you rather call a run or pass?

Well, that depends, you’d say. How much time is left? What’s the score? If my team is down by 7 points in the 4th quarter, I would lean more toward the pass. If I’m up by 7 points with 4 minutes left in the game, I’d strongly lean toward the run. If it’s the first quarter, I’d play it straight, sometimes running and sometimes passing.

Say you’re up by 3 points with 3 minutes remaining in the game. The other team has no timeouts. Like before, it’s 3rd and 4 at your own 40. What is the value of a 4-yard gain? What is the value of a 20-yard gain? I’d say they’re almost equal in this situation—either outcome allows you to effectively run out the clock and put the game out of reach. What if you were down by 3 instead of up by 3? In this case, the 20-yard gain is far more valuable.

When an offense has a lead, a large part of the value of a run is in the additional time it consumes compared to a pass. Incomplete passes stop the clock, but a run is virtually guaranteed not to stop it. Additionally, how costly the added risk of an interception is depends heavily on the game’s current score and time remaining. The utility value of a play needs to be a function of time and score as well as down, distance, and field position.

Without those considerations, the study's success measure violates both requirements for utilities in game theory: linearity and consistency. Early in the game I'd always prefer 4 expected points to 3 expected points. But if my team has a lead late in the game, I'd gladly take 3 expected points and time off the clock over 4 expected points and a stopped clock. That's critical because this trade-off is exactly where runs and passes differ.


The authors constructed a regression using their success metric as the outcome variable. The regression estimates the value of plays based on whether the play was a pass or run, plus several different control variables. Here, the authors go the extra mile and include controls for every single offense, defense, and interactions between each offense and defense. Variables for temperature, turf type, and home field were also included. They also included control variables for score and game time remaining. The regression results show that the typical pass yields about 0.07 more points than the typical run, which is a statistically significant difference. The authors conclude that coaches are not calling plays optimally.

So what’s the problem? Didn’t I just claim they didn’t account for time and score? Well, they "accounted" for score and time, but they did it the wrong way and at the wrong point in the process.

In a regression, control variables can “account” for factors like score or time remaining by nibbling at some of the variance in the outcome variable. This will tweak the coefficient of the variable of interest, such as whether a play was a run or pass. But a regression control variable cannot go back and revise the initial payoff function so it includes time and score considerations. These factors need to be included in the original payoff function before the regression is run. So if you ask whether the study “accounts for” time and score, you could honestly say yes. But this doesn’t mean that it really does.


I’m not saying that this means we should throw out the entire study. It’s salvageable by doing one of two things. The easiest fix would be to narrow the scope of the data to just first quarter plays and to when the score is close. I would say anything up to a 10-point difference still allows teams to play a game of “point maximization” without deviating from their basic game plans. Additionally, time is never really a factor in the first quarter. This is what I usually do with Expected Point analysis. The Romer 4th Down study limited data to the first quarter, but I don’t think it considered score difference.
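
The first fix amounts to filtering the play-by-play data before computing anything. Here is a minimal sketch, assuming hypothetical `Play` records that carry the quarter, the score, the call type, and a success value. The 10-point cutoff is the threshold suggested above, and the sample values are made up.

```python
from dataclasses import dataclass

@dataclass
class Play:
    quarter: int
    offense_score: int
    defense_score: int
    is_pass: bool
    value: float  # success metric for the play, e.g. change in expected points

def neutral_situations(plays, max_margin=10):
    """Keep only first-quarter plays with the score within max_margin points."""
    return [
        p for p in plays
        if p.quarter == 1 and abs(p.offense_score - p.defense_score) <= max_margin
    ]

def mean_value(plays, passes):
    """Average success value of passes (passes=True) or runs (passes=False)."""
    values = [p.value for p in plays if p.is_pass == passes]
    return sum(values) / len(values) if values else float("nan")

plays = [
    Play(1, 7, 0, True, 0.5),
    Play(1, 0, 14, False, 0.2),    # dropped: 14-point margin
    Play(4, 21, 20, False, -0.1),  # dropped: fourth quarter
    Play(1, 3, 3, False, 0.1),
]
neutral = neutral_situations(plays)
print(len(neutral), mean_value(neutral, True), mean_value(neutral, False))  # 2 0.5 0.1
```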

The second possibility is to use a true linear utility function. The measure of success of each play would be its change in Win Probability (WP)—the chance a team will win the game given its current state. WP is a function of down, distance, field position, time, and score. Winning is what is ultimately valuable. Teams don’t really care if they win by 1 or by 20 points as long as they win. WP is consistently linear throughout, so it’s a valid utility function. For example, 0.60 WP is exactly twice as good as 0.30 WP, and you'd always prefer a higher WP to a lower WP.
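
A short calculation shows why WP behaves like a linear utility. Because WP is itself a probability, averaging the WP of a play’s possible results gives the chance of winning before the result is known (assuming the WP model is well calibrated). The sketch below uses that fact to score a play call by its expected WP added. The outcome distributions are hypothetical.

```python
import math

def expected_wp(outcomes):
    """Average WP over a play's possible results.

    `outcomes` is a list of (probability of the result, WP after the result).
    """
    if not math.isclose(sum(p for p, _ in outcomes), 1.0):
        raise ValueError("outcome probabilities must sum to 1")
    return sum(p * wp for p, wp in outcomes)

def wp_added(wp_before, outcomes):
    """Expected change in win probability from a play call."""
    return expected_wp(outcomes) - wp_before

# Hypothetical result distributions for one game state with WP 0.60.
run = [(0.7, 0.64), (0.3, 0.55)]
pass_ = [(0.6, 0.68), (0.3, 0.52), (0.1, 0.40)]
print(round(wp_added(0.60, run), 3), round(wp_added(0.60, pass_), 3))  # 0.013 0.004
```

Under these made-up numbers the run adds more WP. The same function works whether the offense leads or trails, because the score and clock are already folded into each WP value.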

If the authors did either of those two things, I think their results would be improved. I happen to have completed a similar analysis based on my WP model several weeks ago (without all the regression controls). I am currently revising the overall model, so I’m waiting before I post anything firm, but I can say that the preliminary results indicate a general advantage for passes over runs.

Predictability in Play Calling

The second part of the football half of the study looks at whether play calls are serially correlated. Game theory tells us that for a strategy mix to be most effective, it must be unpredictable. If it’s not, an opponent can adjust its own strategy to counter the anticipated strategy choice.

Kovash and Levitt construct another regression where the outcome variable is whether a play was a pass or not. One of the predictor variables was whether or not the previous play was a pass. The regression coefficient of this variable represents the serial correlation of play calls.

Three other control variables were also included—the particular offense’s run-pass balance over the course of the season, the particular defense’s run-pass balance over the course of the season, and the percentage of passes by the offense so far in the current game.
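
The core quantity here, the serial correlation between consecutive calls, can be computed directly from one offense’s sequence of calls. This is a simplified sketch: it reports the lag-1 correlation and the two conditional pass rates, but it leaves out the three control variables the authors used, so it is not their regression. A negative correlation, or a lower pass rate after passes than after runs, is the alternation pattern the authors report.

```python
def lag1_stats(calls):
    """Serial dependence in one offense's play calls (1 = pass, 0 = run).

    Returns (P(pass | previous pass), P(pass | previous run), lag-1 correlation).
    Needs both runs and passes in the sequence; no control variables are used.
    """
    prev, curr = calls[:-1], calls[1:]
    after_pass = [c for p, c in zip(prev, curr) if p == 1]
    after_run = [c for p, c in zip(prev, curr) if p == 0]
    mean_prev = sum(prev) / len(prev)
    mean_curr = sum(curr) / len(curr)
    cov = sum((p - mean_prev) * (c - mean_curr) for p, c in zip(prev, curr))
    var_prev = sum((p - mean_prev) ** 2 for p in prev)
    var_curr = sum((c - mean_curr) ** 2 for c in curr)
    corr = cov / (var_prev * var_curr) ** 0.5
    return (
        sum(after_pass) / len(after_pass),
        sum(after_run) / len(after_run),
        corr,
    )

after_pass, after_run, corr = lag1_stats([1, 0, 1, 0, 1, 0])
print(after_pass, after_run, round(corr, 6))  # 0.0 1.0 -1.0
```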

The authors find that there is “substantial negative correlation,” which means there is a tendency to alternate between pass and run. Ideally, if play calling were truly unpredictable, there would be no correlation. With negative correlation, if the previous play was a pass, the defense could gain an advantage by biasing more against the run than it otherwise would.

This result is consistent with observations by both Doug Drinen of Pro-Football-Reference and myself. It was particularly interesting that the tendency to alternate was most pronounced when the previous play was unsuccessful.
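
To see whether alternation is stronger after unsuccessful plays, the switch rate can be split by the previous play’s outcome. In this sketch the definition of a “successful” play is left to the caller (for instance, a positive value on whatever success metric is in use), and the sample sequence is made up.

```python
def switch_rates(plays):
    """Share of calls that switch between run and pass, split by the previous play's outcome.

    `plays` is an ordered list of (is_pass, succeeded) pairs for one offense.
    Returns {True: rate after a successful play, False: rate after an
    unsuccessful one}; a rate is None when there were no such plays.
    """
    counts = {True: [0, 0], False: [0, 0]}  # previous succeeded -> [switches, total]
    for (prev_pass, prev_ok), (cur_pass, _) in zip(plays, plays[1:]):
        counts[prev_ok][0] += prev_pass != cur_pass
        counts[prev_ok][1] += 1
    return {ok: (s / n if n else None) for ok, (s, n) in counts.items()}

# Made-up sequence of (is_pass, succeeded) pairs.
plays = [(True, False), (False, True), (False, False), (True, True), (True, False)]
print(switch_rates(plays))  # {True: 0.0, False: 1.0}
```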

My only suggestion here is to check successive 1st and 10 situations for a tendency to alternate. If a team passed on the previous 1st down, I'd bet they're more likely to run on the next one, regardless of the plays in between. I would think that coordinators have a goal in mind for play balance, and they probably tend to alternate so they never stray too far from the target proportion.
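
The suggested check could be run as follows. Pull out only the 1st and 10 calls, in order, and compare how often they switch with the rate expected if each call were independent with the same overall pass share, which is 2p(1 - p). This is a sketch that assumes a hypothetical (down, to_go, is_pass) tuple format.

```python
def first_and_ten_alternation(plays):
    """Switch rate between consecutive 1st-and-10 calls vs. an independence baseline.

    `plays` is an ordered list of (down, to_go, is_pass) tuples for one offense.
    Returns (observed switch rate, switch rate expected if each call were an
    independent draw with the same overall pass share), or None if there are
    fewer than two 1st-and-10 plays.
    """
    calls = [is_pass for down, to_go, is_pass in plays if down == 1 and to_go == 10]
    if len(calls) < 2:
        return None
    switches = sum(a != b for a, b in zip(calls, calls[1:]))
    observed = switches / (len(calls) - 1)
    p = sum(calls) / len(calls)  # share of 1st-and-10 calls that were passes
    expected = 2 * p * (1 - p)   # P(switch) for two independent calls
    return observed, expected

plays = [(1, 10, True), (2, 6, False), (1, 10, False), (2, 10, True),
         (3, 10, True), (1, 10, True), (1, 10, False)]
print(first_and_ten_alternation(plays))  # (1.0, 0.5)
```

An observed rate well above the baseline across many games would support the hunch. A handful of plays like this sample proves nothing either way.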


Overall, I think this study's flaws are recoverable. The theoretical foundation is strong and it asks all the right questions. The only problem is that the measure of utility the authors chose is not a valid utility function, a requirement of game theory analysis. A true measure of the value of a play must consider score, time remaining in the game, and the time consumed by the play.


11 Responses to “Full Review of Game Theory Run-Pass Balance Study”

  1. Guy says:

    Brian: Very nice analysis. I'll be interested in seeing your study when it's completed.

    Question on the play predictability issue: is it possible that running backs and/or receivers perform slightly less well when asked to run two or more successive plays? That is, could there be a small improvement in performance if you give your best backs and receivers a rest sometimes, even if minimax calls for you to give them the ball for a 3rd or 4th consecutive play?

    Keep up the great work......

  2. Ryan says:

    It seems logical to me that playcallers would alternate more after unsuccessful plays, even if it's not necessarily maximizing a team's advantage.

    Consider a 2nd and 10, where a team decides to run the ball. If a run is successful, the RB might gain a first down, giving the team more flexibility to run the ball again. If he's stopped for a loss, they're put in a 3rd-and-long "passing situation." An incomplete pass on 1st down, too, will result in a 3rd and long if the 2nd down play is also unsuccessful, so it seems that a run on 2nd down would make 3rd down more manageable. No playcaller wants to go 3 and out and not gain any yards.

    On the other hand, a successful play might highlight certain weaknesses in a defense, whether it's a hole in the line, a gap in coverage, an exploitable secondary, etc., that could be taken advantage of. In other words, if something is working, don't change it until the defense adjusts.

    Not saying any of this is logical from a statistical standpoint, just trying to understand what goes through a coach's mind when calling plays.

  3. Anonymous says:

    Whenever someone writes about play-calling in the N.F.L., I sometimes think about the conversation Tricky Dick had with Henry Kissinger. It was Nixon, if you remember, who once sent then-Redskins coach George Allen some plays for his squad to use that Sunday.

    "I think it's fascinating," Nixon reportedly told Kissinger, then the secretary of state. "It's quite a science."

    To which old Henry replied, "What's the mystery? It's either run or pass."

    John M. Sweeney

  4. Will says:

    What journal did they publish in? You might consider sending a condensed version of this to the letters section of that journal. It's possible the academics aren't reading this blog, though they should, and you might find some collaborators.

  5. Phil Birnbaum says:

    Like Guy said: is it not likely that after a 20 yard running gain, the best RB is tired and you have to do something else? I've seen them head to the sidelines after a play like that. And the guy who has to sprint 40 yards down the field on a passing play even if he never gets the ball ... well, you don't want him to do that twice in a row, do you?

  6. Jim Glass says:

    The authors find that there is “substantial negative correlation"...

    Just a thought:

    Looking at offensive play selection by itself neglects the influence that defensive calls have on offensive calls.

    E.g.: Coaches call "run", QB walks to the line, sees eight in the box, audibles to "pass".

    In principle the initial O run-pass choices could be made by a random number generator, but if the D calls aren't random for some reason (good or bad) the final O-play choices won't be either.


  7. Stan says:

    The biggest problem here is the foolish assumption that the offense's goal is to maximize the yardage on a particular play and that the defensive objective is to minimize the yardage on that play. It's not, not even close. The offensive objective is to move the chains. The defensive objective is to get the ball back without giving up a score. The interaction of those objectives profoundly changes the dynamics of play-calling.

    On offense, I'd rather have an offense that got 4 yards every play than one that averaged 10 yards a play. Consistent efficiency on offense has a great deal more value in winning a game than inconsistent big plays. Similarly, defenses would rather create one big negative play or penalty than worry about minimizing the gain on any particular play.

    In other words, it ain't about the individual play. It's about the entire possession.

  8. Brian Burke says:

    Stan, I think you misunderstand. Those considerations are largely accounted for by the study. They value each play by the score advantage the result gives the offense. A 3 yd gain on 3rd and 2 is valued higher than a 3 yd gain on 2nd and 10.

    In the study's model, consistency as you describe would be valued appropriately.

  9. James says:

    Tangotiger had a link to the paper last week if anyone wants it.

    One thing that struck me was how much more productive passing was than rushing using their expected points style measure. But I think that by excluding QB rushes they may have missed part of the gain of rushes. I think QB rushes may be more productive than RB rushes because they tend to be either QB sneaks on 3rd and short or plays where the QB drops back, the defense has covered all receivers, and the QB is allowed to run.

    They focused on the game theory aspect of play calling by coaches, but this continues at the level of the player as well.

    Also, their graph didn't include plays inside either 10-yard line, which Romer showed were very different.

  10. Steven says:

    Hi Brian,

    Had a question regarding this bit "For example, 0.60 WP is exactly twice as good as 0.30 WP" - are there not situations where external factors (such as Playoffs considerations) will affect this? i.e., if a team has 2 remaining games, and winning both guarantees it a Playoffs berth, but losing either would deny it the berth, then 0.3*0.3 = 0.09 would be their chances of making the Playoffs, whereas 0.6*0.6 = 0.36 would be their chances with a 60% winrate per game... so doubling their winrate per game quadruples their Playoffs chances.

    A bit pedantic, I know, and obviously in isolation for any theoretical game, it's perfectly linear.

    Love the site, am working my way through the articles.

  11. Anonymous says:

    In very simple terms, could you answer this question. Does good running efficiency increase passing efficiency? I have read lots of your stuff but still couldn't seem to find an answer to this question.
