Advanced Football Analytics (formerly Advanced NFL Stats): Non-Linear Estimators

By Brian Burke

The 'season-win' regression models I've used are linear. They estimate the straight-line increase or decrease in season wins based on the variables used in the model. But in reality, not all phenomena work that way. Linear relationships are very common, but many natural relationships are non-linear, and some are exponential. Consider compound interest: if you deposit $100 in an account that earns 5% each year, you'd have about $265 at the end of 20 years. At 10%, you'd have a lot more than double that, about $673.
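As a quick check of that arithmetic, here is a minimal Python sketch. It assumes interest is compounded once per year, and the function name `compound` is just an illustrative choice.

```python
def compound(principal, rate, years):
    """Balance after compounding once per year at a fixed annual rate."""
    return principal * (1 + rate) ** years

balance_5 = compound(100, 0.05, 20)   # 5% a year for 20 years
balance_10 = compound(100, 0.10, 20)  # 10% a year for 20 years

print(f"{balance_5:.2f}")   # 265.33
print(f"{balance_10:.2f}")  # 672.75
```

Doubling the rate multiplies the final balance by roughly 2.5, not 2, which is the kind of non-linear behavior a straight-line model cannot capture.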

To see if any of the stats I've been using are related non-linearly to wins, I created new variables that are the squares of the passing/running/turnover efficiency stats. By including the squared variables in addition to the original linear versions, we can see if there is a significant non-linear relationship (strictly speaking a quadratic one, rather than an exponential one) which might produce a model with a better fit.

For example, we already know that more defensive interceptions produce more wins. But if the coefficient on the squared variable for defensive interceptions is positive as well, then we'd know that each additional interception produces wins at an increasing rate (assuming a direction of causation between interceptions and wins). If the coefficient on the squared variable is negative, then we'd know that although interceptions produce wins, there is a diminishing return when a team accumulates a lot of them.
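The idea of adding a squared term and reading the sign of its coefficient can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic (generated with a made-up negative curvature), the helper name `fit_with_squared_term` is hypothetical, and NumPy's least-squares routine stands in for whatever statistics package was actually used for the models in this post.

```python
import numpy as np

def fit_with_squared_term(x, y):
    """OLS fit of y = b0 + b1*x + b2*x^2; returns coefficients and t-statistics."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Design matrix: intercept, linear term, squared term.
    X = np.column_stack([np.ones_like(x), x, x ** 2])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Classical OLS standard errors from the residual variance.
    resid = y - X @ coefs
    dof = len(y) - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    t_stats = coefs / np.sqrt(np.diag(cov))
    return coefs, t_stats

# Synthetic example only: a made-up stat with built-in diminishing returns.
rng = np.random.default_rng(0)
x = rng.uniform(4.0, 8.0, size=200)
y = 2.0 + 3.0 * x - 0.2 * x ** 2 + rng.normal(0.0, 0.3, size=200)

coefs, t_stats = fit_with_squared_term(x, y)
for name, b, t in zip(["const", "x", "x^2"], coefs, t_stats):
    print(f"{name:>5}: coef = {b: .3f}, t = {t: .2f}")
```

A negative coefficient on the squared term with a large t-statistic would indicate diminishing returns; a positive one would indicate accelerating returns. With real season data the sample is much smaller than in this synthetic case, so the t-statistics would be correspondingly weaker.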

Offensive passing efficiency is consistently the strongest factor in the season win models, so I tried a regression including its squared variable first. The results are listed below (r-squared = 0.74).

The squared 'true' offensive passing efficiency variable (sq_TRUOPASS) is not significant, and with it in the model the original linear version is no longer significant either. Removing the squared variable improves the model in several respects.

The complete model with all the efficiency stats, turnover stats, and penalties may be dividing up the variance of the dependent variable too finely, so I ran a simple model with only TRUOPASS and sq_TRUOPASS. The results are below (r-squared = 0.37).

The significance is a little stronger, with TRUOPASS (linear version) being significant at the 0.10 level. But sq_TRUOPASS does not become significant. Perhaps more seasons of data would provide significance, but it is unlikely to improve the estimation very much.

Had the coefficients been significant, we could construct an equation to estimate wins based on the simple non-linear passing model. This would be represented as follows:

WINS = const + B1 * TRUOPASS + B2 * TRUOPASS^2
WINS = -11.9 + 4.4 * TRUOPASS - 0.18 * TRUOPASS^2

For example, on the strength of their passing offenses alone, the '06 Ravens would be estimated to have 9.01 wins, and the Super Bowl teams Chicago and Indy would have 8.4 and 11.3 wins respectively.
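To make the shape of that equation concrete, here is a small Python sketch that evaluates it and its slope. The coefficients are the ones quoted above (which, again, were not statistically significant); the function names and the sample TRUOPASS inputs of 6.0 and 7.0 are hypothetical values chosen only for illustration, not any team's actual stat.

```python
# Coefficients from the simple non-linear passing model quoted above.
CONST, B1, B2 = -11.9, 4.4, -0.18

def estimated_wins(truopass):
    """WINS = const + B1 * TRUOPASS + B2 * TRUOPASS^2"""
    return CONST + B1 * truopass + B2 * truopass ** 2

def marginal_wins(truopass):
    """Slope of the curve: extra wins per additional unit of TRUOPASS."""
    return B1 + 2 * B2 * truopass

# Hypothetical inputs, for illustration only.
for value in (6.0, 7.0):
    print(f"TRUOPASS {value}: {estimated_wins(value):.2f} wins, "
          f"slope {marginal_wins(value):.2f}")

# The curve stops rising where the slope is zero.
peak = -B1 / (2 * B2)
print(f"peak at TRUOPASS {peak:.2f}: {estimated_wins(peak):.2f} wins")
```

Because the coefficient on the squared term is negative, the slope shrinks as TRUOPASS grows. With these coefficients the curve flattens out at a TRUOPASS of about 12.2, where the fitted value is just under 15 wins.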

I repeated the process, inserting a squared variable for each efficiency stat, and the results were very consistent: none were significant. I also repeated the analysis using logarithmic versions of each variable. Again, none were significant. The bottom line is that the model seems to be best (and simplest) when using strictly linear variables. Although the results are consistently non-significant, the estimated coefficients tend to show a 'diminishing-returns' effect at the extremes of very high and very low performance stats. This makes sense, because teams are bounded by 0 and 16 wins.