Fumble Rate

The previous post proposed an efficiency stat for interceptions that helped reduce the cause-effect conflation created by interceptions. The interception efficiency stat calculates interceptions thrown per pass attempt. Evaluating interceptions this way reduces the impact of losing teams throwing often and predictably when behind. Although this new stat does not have the retrospective explanatory power of gross interceptions on winning, I believe it is superior to gross interceptions because it better isolates the direction of causation from the independent variable (interceptions) to the dependent variable (winning).

Evaluating fumbles in the same manner may also make sense, but the issue is more complex. First, a fumble can happen on nearly any play, not just runs. At first glance, if winning teams "run out the clock" and shy away from risky passes, it follows that they would be at a higher risk for fumbles due to the higher number of running plays. But fumbles can happen during sacks or completions too. Second, there is also the question of fumbles lost vs. fumbles in general (lost or recovered)--which should be used? Can the ratio of fumbles lost to fumbles tell us anything significant? So a fumble efficiency stat might not be a better measure of a team's tendency to fumble. For now, I'm going to use offensive fumbles per run+sacks+completions. Special teams fumbles are not counted.
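As a rough sketch of the rate described above, the calculation is just offensive fumbles divided by the plays on which a fumble could occur. The function name and the season totals below are made up for illustration; they are not from the actual data.

```python
def fumble_rate(fumbles, rushes, sacks, completions):
    """Offensive fumbles (lost or recovered) per run + sack + completion."""
    plays = rushes + sacks + completions
    if plays == 0:
        raise ValueError("no qualifying plays")
    return fumbles / plays


# Hypothetical season totals, for illustration only.
rate = fumble_rate(fumbles=24, rushes=450, sacks=35, completions=315)
print(f"{rate:.4f} fumbles per play")
```

Incomplete passes are left out of the denominator because the ball cannot be fumbled on them, which matches the run+sacks+completions definition.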

To be honest, I'm not sure what the answers are. But here are the correlations with wins, offensive points scored, and defensive points allowed (the critical value for 5% significance is 0.15):

What does everyone think?

My own initial thoughts: It looks like Off Fumble Rate (both lost and recovered) is the best to use for a win-regression model. I'm guessing this is because which team recovers a fumble is random, but fumbles in general approximate more purely a team's propensity to cough up the ball. Plus, a recovered fumble usually aborts a play and costs a down and perhaps a loss of a couple yards.

Also, fumbles, no matter how they are measured, tend to hurt an offense more than help the opposing team. Note the strong negative correlation of offensive fumble stats to offensive points scored vs. the weak or insignificant correlation with defensive points allowed (the top 4 rows).

Turnover "Efficiency"

The numbers show fairly conclusively that the best measures of passing and running proficiency are efficiency stats, i.e. yards per attempt. Efficiency stats correlate best with winning and "gross" stats such as total yards do not correlate well or at all.

But, as far as I know, no one has ever devised a turnover efficiency stat. Take David Carr, formerly of the Texans, for example. Everyone agrees that he threw a lot of interceptions, but why? Is it because he is a poor quarterback or is it because his team was very often behind and he was forced to throw very often to predictable routes? An interception rate stat can help answer that question.

Just like 'yards per pass attempt' is a better measure of passing proficiency than 'total passing yards,' so is 'interceptions per attempt' better than gross interceptions.
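To make the comparison concrete, here is a small sketch with two invented teams that throw the same number of interceptions on very different passing volumes. The numbers are hypothetical and only illustrate why the per-attempt rate can tell a different story than the gross count.

```python
def interception_rate(interceptions, pass_attempts):
    """Interceptions thrown per pass attempt."""
    return interceptions / pass_attempts


# Hypothetical teams with identical gross interceptions.
# The trailing team throws far more often because it is usually behind.
trailing_rate = interception_rate(interceptions=16, pass_attempts=600)
leading_rate = interception_rate(interceptions=16, pass_attempts=450)

print(f"trailing team: {trailing_rate:.3f} INT per attempt")
print(f"leading team:  {leading_rate:.3f} INT per attempt")
```

By gross interceptions the two teams look identical, but per attempt the high-volume passer is actually the more careful of the two.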

If I include gross interceptions in a regression model, there is a strong inverse correlation between interceptions and winning. But we then run into the classic "correlation does not equal causation" fallacy. Because of the David Carr Effect, losing a game 'causes' interceptions to some degree.

In general, pass attempts are higher for losing teams than for winning teams. The team that is behind in a game will typically have a significantly higher number of pass attempts than the winning team. In fact, pass attempts correlate negatively with wins, with a weak but significant coefficient of 0.17 in magnitude.

Using interceptions per pass attempt instead of gross interceptions, therefore, helps reduce the backflow of causation in the correlation between wins and interceptions. Unfortunately, it also reduces the explanatory power of the model, but this is expected and actually good. It may not explain past game outcomes as well as using gross interceptions, but it may predict future outcomes better. Plus, turnover efficiency helps isolate the explanatory power of the other variables in the model.

Here is how the win-correlations break down for gross interceptions and interception efficiency:

Gross Int Thrown -0.51
Int/Pass Att -0.45

Gross Int Taken 0.47
Int Taken/Pass Att 0.39

True Pass Efficiency

The standard measure of pass efficiency is made by dividing total passing yards by pass attempts. The NFL defines passing yards as yards gained (or lost) during all pass completions minus yards lost in sacks. Pass attempts are defined as actual throws but do not include pass plays that result in sacks.

To date, my research shows conclusively that pass efficiency (yards per attempt) is the best measure of a team's ability to pass compared to any other. Other measures, such as total passing yards or completion percentage, simply do not correlate with winning as well as efficiency. The same is true on the defensive side of the ball.

But an even better efficiency stat, one that includes sacks as "pass attempts," should correlate with winning even better than the standard efficiency stat I've been using so far. Sacks measure a team's pass-blocking ability, quarterback mobility and vision, and the ability of receivers to get open--all part of a team's total passing proficiency.
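A minimal sketch of the two calculations, assuming passing yards are already net of sack yardage as in the NFL definition above. The function names and the season totals are invented for illustration.

```python
def standard_pass_efficiency(net_pass_yards, attempts):
    """Net passing yards per pass attempt (sacks not counted as attempts)."""
    return net_pass_yards / attempts


def improved_pass_efficiency(net_pass_yards, attempts, sacks):
    """Net passing yards per pass play, counting each sack as an attempt."""
    return net_pass_yards / (attempts + sacks)


# Hypothetical season totals, for illustration only.
standard = standard_pass_efficiency(net_pass_yards=3500, attempts=500)
improved = improved_pass_efficiency(net_pass_yards=3500, attempts=500, sacks=40)

print(f"standard: {standard:.2f} yds/att")
print(f"improved: {improved:.2f} yds/pass play")
```

Only the denominator changes, so a team that takes many sacks is penalized more heavily under the improved stat than under the standard one.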

By using data from the 2002-2006 NFL regular seasons, the correlations of the passing efficiency stats with winning are shown below. Both the standard efficiency and the improved efficiency (counting sacks as pass attempts) are listed for both offense and defense.

The result is better win correlations for both offensive and defensive "improved" pass efficiency calculations. This indicates that the game-by-game win-probability model I have been using would likely be improved by using the new pass efficiency stats.

Theory on Rushing (Non)Importance

Perhaps one reason why rush efficiency does not correlate well with winning is that we are using rush averages. Averages are not always the best indicator of central tendency. There are also mode and median.

Median yards (per rush) may correlate better with winning than average yards. Averages can be skewed to varying degrees by outlier observations, such as breaking runs for long gains. These long runs may be very random and not well-related to general running ability. The median would measure consistency and ignore outlier runs.

Perhaps it's better to consistently gain 4.5 yds/run 4 times than to get 18 yds, -2 yds, 0 yds, and then 2 yds.
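The two sequences in that example can be checked directly with Python's standard `statistics` module. Both average 4.5 yards, but their medians differ sharply.

```python
from statistics import mean, median

steady = [4.5, 4.5, 4.5, 4.5]   # four consistent gains
boom_or_bust = [18, -2, 0, 2]   # one long run and three stuffed runs

for label, runs in (("steady", steady), ("boom or bust", boom_or_bust)):
    print(f"{label}: mean={mean(runs):.1f}, median={median(runs):.1f}")
```

The mean cannot tell the two backs apart, while the median (4.5 vs. 1.0) reflects what happened on a typical carry.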

Too bad median stats are generally not published. It would probably be necessary to calculate median stats game by game, and run by run.

Median pass efficiency stats may correlate better than average pass efficiency stats too.

More on the Importance of Running vs. Passing

In the last post we could see that turnovers were most strongly correlated with wins, followed in order by pass offense, pass defense, and run offense. Run defense still was not statistically significant, but it at least had a negative sign; that is, it implied that a better run defense helps teams win, even if only slightly.

Below is a graph of how the various phases of the game correlate with season win totals. The table under the graph shows the actual Pearson coefficients. This is not a regression, just a simple two-variable correlation. The critical value for 5% significance is 0.37 for individual years (n=32) and 0.17 for the average of the 4 years (n=128).

The first thing that struck me is how steady the correlations of wins to turnovers and to offensive passing were. They also seem systematically related (which suggests there may be collinearity problems in a regression model). This makes sense intuitively because skilled passing offenses are less likely to throw interceptions. My hunch is that fumbles tend to be random and chaotic--not dependent on a team's running ability. But interceptions are likely tied to passing ability. Also notice that passing defense is apparently related loosely to both passing offense and turnovers. As passing offense and turnovers increase in importance, so does passing defense.

But the main point is the relative steadiness of each factor. Running, on both offense and defense, is unimportant in comparison to turnovers and passing offense. Offensive passing and turnovers can't really be any better correlated than they are at .70 and .55 respectively. Offensive points per game correlated at .71 and points allowed per game correlated at .70. And even Dan Fouts can tell us how important points are in winning football games.

Expanding the Data

After discovering that a single year of NFL football did not provide enough data for significant coefficients for things such as run defense, I expanded the database to include teams and games from the years 2003-2006, 4 years in total. Instead of 32 teams' worth of data, I now have 128.

Let's begin looking at our new data by examining the correlation of our offensive and defensive efficiency variables with season wins.

Correlation Coefficients, 5% critical value (two-tailed) = 0.1736 for n = 128
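For readers wondering where a critical value like 0.1736 comes from, it can be recovered from the t distribution using the standard relation r = t / sqrt(t^2 + n - 2). The sketch below hardcodes the two-tailed 5% t value for 126 degrees of freedom (about 1.979, taken from a standard t table; that constant is my assumption, not something stated in the post).

```python
import math


def critical_r(n, t_crit):
    """Smallest |r| that is significant, given the critical t for n - 2 df."""
    df = n - 2
    return t_crit / math.sqrt(t_crit ** 2 + df)


# Assumed table value: two-tailed 5% critical t for 126 degrees of freedom.
T_CRIT_DF126 = 1.979

r_crit = critical_r(n=128, t_crit=T_CRIT_DF126)
print(f"critical r for n=128: {r_crit:.4f}")
```

Any correlation in the table with an absolute value below this threshold cannot be distinguished from zero at the 5% level.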

The correlation between wins and defensive run efficiency (DRUN) is found to be negative, which makes sense. The fewer yards allowed per rush, the more wins a team tends to have. But although the direction of effect is as expected, it is still not significant.

All other variables are significant however. Turnovers correlate the strongest, followed by offensive pass efficiency, then defensive passing efficiency. The running game just doesn't seem to be that important. For some reason offensive running ability appears to be more important than defensive running ability.

What does any of this mean? It is more evidence of how important turnovers are. They should not be discounted, as the Rams attempted under Martz. Secondly, the passing game is significantly more important than the running game on both sides of the ball. If I'm a general manager with a choice between a great pass rusher and a great run stopper, I'd take the pass rusher. A choice between a receiver and running back--take the receiver, unless the running back is also a great pass blocker.

I think this means the death of the "smashmouth" football team. Running only appears to be important because teams that are ahead (and very likely to win) run out the clock, racking up run yardage a losing team won't get. While total rushing yards appear to correlate well with winning, yards per rush attempt tells a very different story.

But shouldn't running out the clock count for something? It allows teams to hold onto leads, and it should therefore contribute to wins. You might think so, but the numbers say no. It just doesn't matter that much, or at all.

Importance of Run Defense

After reviewing how well the model fit with the actual 2006 season and estimating the luck factor, I noticed that teams with strong run defenses were not well represented in the playoffs. This result is contrary to most conventional wisdom that emphasizes "stopping the run" as a key to winning.

The conventional wisdom makes sense. In a purely logical sense, if a team were infinitely bad at stopping the run, its opponents would score a touchdown on every run attempt. So every incremental improvement over an "infinitely bad" run defense would have to improve a team's chance of winning.

But look at a 2006 ranking of run defense. The playoff teams are highlighted in yellow. The horizontal line is at the league average.

Notice that only 5 of the 12 playoff teams were above average while the other 7 were below average. Only 1 playoff team was in the top 7 in run defense. Additionally, 4 of the worst 6 run defenses made the playoffs, including the absolute worst 2. Incredibly, the 2006 Super Bowl winner was the very worst--and not just by a little but by .39 yds/run, almost an entire standard deviation worse than the next best team.

To illustrate this another way, I've plotted wins against defensive run efficiency and added a regression line. Notice that as run defense gets worse, wins increase. This is backwards and obviously indicates a big problem in the data.

In fact, the simple linear regression illustrated above indicates that defensive run efficiency is not significant at all (p = 0.839). This is partially because of the small population of NFL teams, n=32.

This raises a larger problem with the model and its baseline data. The model's coefficients were drawn from the 2005 season only. In 2005, winning and run defense were correlated much better than in 2006.

There were 256 games in 2005, which means there were 512 "game efforts" by all the teams. The model's significance calculations are based on the sample size, and n=512 seems like a healthy sample at first glance. But although there were 512 "game efforts," there were only 32 different teams, which creates a hybrid dataset in which n=512 and n=32.

The model can definitely be improved by adding more seasons of data to calculate the coefficients.

Predicting Playoff Races

Building upon the Total Probabilities calculations I began after week 12, which forecast the probability of each team finishing with a given total number of wins, I could then compare multiple teams competing for playoff spots.

I'll use Baltimore as an example again, because I follow the Ravens most closely. Following week 12, the Ravens were 9-2 and it was all but certain they would win the AFC North. But they were also in the hunt for a first-round bye in the playoffs. Indianapolis held the second seed and a bye with a 10-1 record. So to earn a bye, BAL had to at least beat out IND.

Since the model had calculated, based on a game-by-game assessment, the probabilities of each team finishing the season with each possible total number of wins, they could be compared side-by-side. Here is how both teams' outlooks appeared following week 12.

The probability of every possible combination of outcomes can be calculated by multiplying the individual component probabilities. For example, the probability of BAL and IND both finishing the season with 12 wins would be:

Prob(BAL 12, IND 12) = Prob(BAL 12) * Prob(IND 12) = .339 * .260 = .088

So there is about a 9% chance of that particular outcome. Every possible combination of outcomes is listed in the table below. BAL win totals are listed vertically on the left and IND win totals are listed horizontally on top.

The outcomes shaded in gray are ties, where both teams finish with equal numbers of wins. In this case BAL held the tie-breaker (better record vs. common opponents). The outcomes in green are those for which BAL finishes with more wins, and the outcomes in red are those for which IND finishes with more wins. It's quite a jumble of numbers, but there are really only 3 possible outcomes that matter: BAL wins, IND wins, or they tie. We can simply sum up all the green, red, and gray outcomes to finally produce the probabilities of each outcome.

P(BAL wins) = .279
P(Tie) = .278
P(IND wins) = .443

(Total = 1.000)
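The summing step can be sketched in a few lines. The two win-total distributions below are invented for illustration (they are not the model's actual BAL and IND numbers), and multiplying the probabilities treats the two teams' finishes as independent, which is the same simplification used in the Prob(BAL 12, IND 12) example.

```python
def race_outcome(dist_a, dist_b):
    """Return (P(A more wins), P(tie), P(B more wins)).

    dist_a and dist_b map a final win total to its probability.
    Each cell of the grid is the product of the two marginal probabilities.
    """
    a_more = tie = b_more = 0.0
    for wins_a, p_a in dist_a.items():
        for wins_b, p_b in dist_b.items():
            cell = p_a * p_b
            if wins_a > wins_b:
                a_more += cell
            elif wins_a == wins_b:
                tie += cell
            else:
                b_more += cell
    return a_more, tie, b_more


# Hypothetical win-total distributions, for illustration only.
team_a = {11: 0.2, 12: 0.5, 13: 0.3}
team_b = {11: 0.1, 12: 0.4, 13: 0.5}

a_more, tie, b_more = race_outcome(team_a, team_b)
print(f"A ahead: {a_more:.3f}  tie: {tie:.3f}  B ahead: {b_more:.3f}")
```

If one team holds the tie-breaker, its chance of finishing ahead in the standings is its "more wins" probability plus the tie probability.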

The table above is easier to understand if pictured graphically. Below is a plot of the probabilities of each outcome. BAL's wins are along the vertical axis and IND's are along the horizontal axis. The diagonal line plots the tie outcomes, where both teams finish with an equal number of wins. Plot "density" above the line indicates IND would have more wins, and "density" below the line indicates BAL would have more wins. Here, we see the "center of gravity" of the probability plot shows the most likely outcome is a tie at 13 wins for both teams. (And except for IND laying an egg at home vs. HOU, this would have been the real outcome.)

BAL actually seemed to have an advantage, because they owned the tie-breaker, and in fact, BAL ended the season 1 game ahead of IND to capture the second seed and a bye in the playoffs.

You can download the spreadsheet with several selected match-ups between playoff contenders here.

Assessing the Model's Accuracy

Probability models are difficult to assess by their nature. Linear models offer an R-squared that gives a definitive assessment of the explanatory/predictive power of a model. But probability models, such as the logit model I use, offer numerous indirect assessments, and none is more straightforward than the % correct score. It tells us how well our model predicts actual outcomes.

If the model predicts outcomes well, then we know two things. First, we know how to predict games, which is fun. Second, we understand what is really important in winning NFL games and we have a deeper understanding of the inner-workings of the sport as it is played.

But % correct doesn't tell the whole story. It simply draws a line at .50 and if a team that is predicted to have a .51 win probability actually wins, we consider the model correct. If a team predicted to have a .49 win probability wins, the model is considered incorrect. But that's unfair. We expect to be wrong in 49% of all such cases. That's just the reality of equally matched teams. Further, the model is expected to be wrong in 20% of the games where it predicted the favorite team to have a .80 win probability. In fact, if the model were exactly 20% wrong in such games, it would mean the model is better than if it were 100% correct.

So to assess the model's accuracy, not just in terms of how often its predicted favorite actually won, but in terms of how accurate the predicted probabilities were, I produced the table below. I divided all the games into 5 categories based on the "lopsidedness" of the probability. Where the visiting team was forecast to have a win probability of between .00 and .20 (and the home team's win probability was between 1.00 and .80), I scored the model's accuracy. I did the same for win probabilities between .21 and .40, between .41 and .60, between .61 and .80, and finally between .81 and 1.00.

If the model fits relatively well, its % correct scores should reflect the predicted probabilities of each category. So, for the 1st category, between .00 and .20, we'd expect a % correct score of approximately 90%. Here is how all 5 categories scored:

The model seems to fit well across the spectrum of games. Locks, solid favorites, and toss-ups were accurately predicted by the model.
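The five-category check described above can be sketched as a small binning routine. This is only an illustration: the function name, the exact bin boundaries (equal-width bins of .20, with a value on a boundary going to the higher bin), and the handful of sample games are my assumptions, not the real 2006 results.

```python
def calibration_table(games, n_bins=5):
    """Compare forecast probabilities with observed results, bin by bin.

    games: iterable of (visitor_win_prob, visitor_won) pairs.
    Returns one row per non-empty bin.
    """
    bins = [{"n": 0, "prob_sum": 0.0, "wins": 0} for _ in range(n_bins)]
    for prob, won in games:
        # Equal-width bins; a probability of exactly 1.0 goes in the last bin.
        idx = min(int(prob * n_bins), n_bins - 1)
        bins[idx]["n"] += 1
        bins[idx]["prob_sum"] += prob
        bins[idx]["wins"] += 1 if won else 0

    rows = []
    for i, b in enumerate(bins):
        if b["n"] == 0:
            continue  # no games fell in this range
        rows.append({
            "range": (i / n_bins, (i + 1) / n_bins),
            "games": b["n"],
            "mean_forecast": b["prob_sum"] / b["n"],
            "visitor_win_rate": b["wins"] / b["n"],
        })
    return rows


# Hypothetical forecasts and results, for illustration only.
sample_games = [
    (0.10, False), (0.15, False), (0.18, True),
    (0.45, True), (0.55, False),
    (0.85, True), (0.90, True),
]
table = calibration_table(sample_games)
for row in table:
    print(row)
```

A well-fitting model shows a visitor win rate close to the mean forecast in every bin. In the lowest bin the % correct score is one minus the visitor win rate, since the home team is the predicted favorite there.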

Total Probabilities

In the final stretch of the 2006 regular season I wanted to know which teams would make the playoffs. I thought that as long as my model was somewhat valid, it would give insight on how various teams would finish the season.

As opposed to my original linear model, which predicted the number of total season wins for each team based on to-date efficiency, my new game-by-game model takes matchups and homefield advantage into account. As the number of remaining games dwindles, these factors become more important.

Baltimore is my favorite team, so it was the first one I analyzed. With 5 games remaining in the season, the probability models, both the original (model 1) and the alternate (model 2), showed the following win probabilities for the Ravens.

The probabilities for each possible season outcome, i.e. total number of wins, can be computed. For example, the probability that the Ravens would end the season by winning all 5 remaining games would be the product of all the individual win probabilities. At this point in the season it was:

Prob(5 wins) = .50 * .48 * .95 * .70 * .91 = .14

Every possible combination of game outcomes was calculated. There were 2^5 possible combinations of outcomes with 5 games remaining. Each combination that leads to 4 wins is summed to give the total probability of winning 4 of the 5 remaining games. The same is done for 3, 2, 1, and 0 wins. With a record of 9-2, the Ravens' resulting probabilities were:
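Here is a sketch of that enumeration using the five rounded per-game probabilities from the example above. Because those inputs are rounded to two decimals, the results will not match the model's own figures exactly. The function name is mine.

```python
from itertools import product
from math import prod

# Rounded per-game win probabilities from the five-win example.
game_probs = [0.50, 0.48, 0.95, 0.70, 0.91]


def win_distribution(probs):
    """Probability of each win total, found by listing every win/loss combination."""
    dist = [0.0] * (len(probs) + 1)
    for outcome in product([0, 1], repeat=len(probs)):  # 2^5 = 32 combinations
        # Probability of this exact sequence of wins (1) and losses (0).
        p = prod(q if won else 1 - q for q, won in zip(probs, outcome))
        dist[sum(outcome)] += p
    return dist


dist = win_distribution(game_probs)
for wins, p in enumerate(dist):
    print(f"{wins} of 5 remaining wins: {p:.3f}")
```

With these rounded inputs the five-win probability comes out near .145, and four wins is the single most likely total.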

The cumulative probabilities of winning "at least" X number of games are easily calculated by adding the probabilities of all the possible outcomes of X and greater than X. Baltimore's probabilities of winning at least X number of games were:
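The "at least X" totals are a running sum taken from the top of the win distribution. The sketch below rebuilds the distribution one game at a time (an equivalent shortcut to listing all 32 combinations) from the rounded per-game probabilities in the five-win example, then accumulates it. Function names are mine, and the rounded inputs mean the output only approximates the model's figures.

```python
def win_distribution(probs):
    """Probability of each win total, built up one game at a time."""
    dist = [1.0]  # before any games: zero wins with certainty
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for wins, mass in enumerate(dist):
            nxt[wins] += mass * (1 - p)   # lose this game
            nxt[wins + 1] += mass * p     # win this game
        dist = nxt
    return dist


def at_least(dist):
    """cumulative[x] = probability of x or more wins."""
    cumulative = [0.0] * len(dist)
    running = 0.0
    for wins in range(len(dist) - 1, -1, -1):
        running += dist[wins]
        cumulative[wins] = running
    return cumulative


game_probs = [0.50, 0.48, 0.95, 0.70, 0.91]  # rounded values from the example
cumulative = at_least(win_distribution(game_probs))
for wins, p in enumerate(cumulative):
    print(f"at least {9 + wins} season wins: {p:.3f}")
```

The 9 added in the printout reflects the Ravens' nine wins already banked at that point in the season.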

As luck would have it, Baltimore indeed finished the season with 13 wins, the number predicted most likely by the model.

These calculations were performed for all 32 teams. Once computed, various teams of interest can be compared. For example, who would win the AFC North? Simply compare BAL's and CIN's total win probabilities. Who would likely win homefield advantage in the playoffs? Compare BAL's and IND's total win predictions.

An Alternate Model

Midway through the 2006 regular season, I created an alternate game-by-game winner prediction model. It used the same stats, such as yards per rush, yards per pass attempt, turnovers, and home field advantage, but it used them in a different way.

I still used logit regression to compute the outcome probability of each game, but I experimented with different forms of each independent variable to see if I could improve the model's fit. I tried exponential and logarithmic versions of each variable, but predictive power was not improved.

I finally stumbled on an idea that produced a version of the model at least as predictive as my original. I wondered if I could model the effect of a very strong running offense against a very weak run defense (and for passing, and vice versa). I theorized that if I could mathematically represent such an interaction, it might fit reality better than considering each team's efficiency stat alone. After all, this is how it works in actual games--one team's offense interacts with its opponent's defense. They're not performing independently in front of judges or for time.

Instead of using each team's yards per pass attempt/run, etc., I created variables that captured the interaction of an offense vs. a defense, which I called PASSFACTOR and RUNFACTOR. When Team A plays Team B, this is represented mathematically by:

APASSFACTOR = AOPASS * BDPASS (Team A's off pass eff x Team B's def pass eff)
ARUNFACTOR = AORUN * BDRUN (Team A's off run eff x Team B's def run eff)

BPASSFACTOR and BRUNFACTOR are computed in the same way.

In this way, if a team with a great pass offense plays a team with a poor pass defense, the mismatch will be captured because PASSFACTOR will be very high for the great passing team.
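A short sketch of how the four matchup variables could be computed for one game. The dictionary keys mirror the variable names above, while the function name and the efficiency values are invented for illustration.

```python
def matchup_factors(team_a, team_b):
    """Interaction terms for Team A vs. Team B.

    Each team is a dict of yards-per-play efficiencies:
    OPASS/ORUN are gained on offense, DPASS/DRUN are allowed on defense.
    """
    return {
        "APASSFACTOR": team_a["OPASS"] * team_b["DPASS"],
        "ARUNFACTOR": team_a["ORUN"] * team_b["DRUN"],
        "BPASSFACTOR": team_b["OPASS"] * team_a["DPASS"],
        "BRUNFACTOR": team_b["ORUN"] * team_a["DRUN"],
    }


# Hypothetical efficiencies, for illustration only.
team_a = {"OPASS": 7.0, "ORUN": 4.0, "DPASS": 6.0, "DRUN": 4.2}
team_b = {"OPASS": 6.0, "ORUN": 4.5, "DPASS": 6.5, "DRUN": 3.8}

factors = matchup_factors(team_a, team_b)
for name, value in factors.items():
    print(f"{name}: {value:.1f}")
```

Because the defensive stat is yards allowed, a strong offense facing a leaky defense multiplies two large numbers, which is what makes the mismatch stand out.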

The net turnovers and home field advantage variables remain the same. However, with a better database, it would be interesting to use the same technique with team A's giveaways vs. team B's takeaways and vice versa, or even going deeper by discerning fumbles and interceptions as separate variables. The random components of turnover stats may become too strong if we divide them up like that.

The new "matchup" model was nearly as predictive as the original, correctly predicting only one fewer of the 2005 games than the original (74.6% correct). All variables were significant. This was no breakthrough, obviously, but it is another tool to understand the game. And by adding more seasons of data, perhaps the model will improve.

Week by week, both models produced very similar probabilities, often only differing within .03 or so.