Predicting team win totals before the season begins is a very inexact science. Although I’ve predicted the win total of all the teams for 2007 based on last year’s performance stats, the estimates are fairly vague. The reason for the lack of confidence is that team performance in one area does not necessarily remain consistent across seasons. But we can measure which team performance predictors do tend to persist from year to year. The stats that endure as predictors of following year wins can be considered leading indicators.
I ran two regressions. The first was my usual model using efficiency stats to estimate team wins for the year in question. The second model used the same efficiency stats to estimate team wins for the following year. In other words, it used 2002 stats to predict 2003 wins, 2003 stats to predict 2004 wins, and so on. The data set included the '02-'06 seasons. By comparing the results, we can see which stats tend to be consistent predictors from year to year.
The efficiency stat predictors were converted into standardized variables so they can be compared directly in terms of their relative importance in estimating wins. The % Persist column shows the proportion of predictive power retained from one year to the next. It was calculated by dividing each coefficient of the next-year model by the corresponding coefficient of the current-year model, then adjusting for the r-squared of each regression.
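For anyone who wants to replicate the setup, here is a rough sketch of the two regressions in Python. The file name and column names are placeholders standing in for the actual data set, not the real thing.

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical input: one row per team-season ('02-'06). Column names are
# placeholders for the efficiency stats discussed above.
df = pd.read_csv("team_seasons.csv").sort_values(["team", "season"])

predictors = ["o_pass", "d_pass", "o_run", "d_run",
              "penalties", "o_fum", "d_ffum", "o_int", "d_int"]

# Standardize each predictor so the coefficients can be compared to each other.
X = (df[predictors] - df[predictors].mean()) / df[predictors].std()
X = sm.add_constant(X)

# Model 1: this year's stats -> this year's wins.
same_yr = sm.OLS(df["wins"], X).fit()

# Model 2: this year's stats -> next year's wins for the same team.
next_wins = df.groupby("team")["wins"].shift(-1)
keep = next_wins.notna()          # each team's final season has no "next year"
next_yr = sm.OLS(next_wins[keep], X[keep]).fit()

print(same_yr.rsquared, next_yr.rsquared)
```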
| VARIABLE | Same Yr COEF | Same Yr SIG. | Next Yr COEF | Next Yr SIG. | % Persist |
|---|---|---|---|---|---|
| O Pass | 1.22 | 0.00 | 0.42 | 0.11 | 9.4 |
| D Pass | -1.11 | 0.00 | -0.01 | 0.97 | 0.2 |
| O Run | 0.42 | 0.00 | 0.56 | 0.01 | 36.0 * |
| D Run | -0.16 | 0.23 | 0.00 | 0.99 | -0.5 |
| Penalties | -0.23 | 0.07 | -0.44 | 0.09 | 52.6 * |
| O Fum | -0.42 | 0.01 | -0.05 | 0.83 | 3.5 |
| D FFum | 0.47 | 0.00 | 0.78 | 0.00 | 45.3 * |
| O Int | -0.32 | 0.03 | 0.40 | 0.07 | -34.2 ? |
| D Int | 0.60 | 0.00 | -0.34 | 0.09 | -15.5 ? |
| r-squared | 0.75 | | 0.20 | | |
The first regression produced expected results. It estimated current-year wins very well (r-squared = 0.75), with nearly all variables significant. The second regression, which predicted following-year wins, was much weaker (r-squared = 0.20), as expected, but it revealed which stats endure from year to year as predictors of team wins.
It shows that offensive run efficiency, team penalties, defensive forced fumbles, and interceptions thrown are relatively persistent predictors of following year wins. Defensive pass and run efficiencies are not consistent predictors.
To compute % Persist, I adjusted the coefficients in each model by their respective model's r-squared, then divided the next-year model's coefficients by the current-year model's. This tells us how much of each stat's predictive power survives the off-season to help predict next year's wins. For example, only 9% of the predictive power of offensive pass efficiency endures.
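In code, the calculation looks roughly like this. The results won't match the table exactly because the coefficients shown above are rounded.

```python
# % Persist: the share of a stat's predictive weight that carries over to the
# next year. Each coefficient is weighted by its own model's r-squared before
# taking the ratio.
def pct_persist(coef_same, coef_next, r2_same=0.75, r2_next=0.20):
    return 100 * (coef_next * r2_next) / (coef_same * r2_same)

print(pct_persist(1.22, 0.42))    # O Pass -> roughly 9%
print(pct_persist(0.42, 0.56))    # O Run  -> roughly 36%
print(pct_persist(-0.32, 0.40))   # O Int  -> negative: the sign flipped
```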
We see that the stronger persisting stats are offensive running efficiency (36%), team penalties (52%), and defensive forced fumble rate (45%).
Notice that the interception rate stats also show some persistence (34% and 16% in magnitude), but that the signs of the coefficients are reversed between models. This means these stats could be considered ANTI-predictors. In other words, a low offensive pass interception rate in one year signals fewer wins the following year. This is unexpected and could simply be due to their marginal significance. But although p-values of 0.07 and 0.09 may not be good enough for the FDA to approve heart medication, they still strongly suggest that something is at work.
My theory is that we are witnessing regression to the mean. For many teams, interception rates have a lot of variance due to luck. So a team that is unlucky with interceptions one year is not likely to be as unlucky the next year. That could partially explain the reversed signs. Another possibility is that teams systematically swing from high to low interception rates from one season to the next, something I strongly doubt. Otherwise, I’m at a loss to explain this result.
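To illustrate the regression-to-the-mean idea, here is a toy simulation. The split between a team's "true" interception rate and single-season luck is an assumption chosen for illustration, not an estimate from the data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: each team has a stable true INT rate (small spread), but the
# observed rate in any one season is dominated by luck (large spread).
n_teams = 32
true_rate = rng.normal(3.0, 0.3, n_teams)         # assumed true INT% per team
year1 = true_rate + rng.normal(0, 0.8, n_teams)   # observed rate, season 1
year2 = true_rate + rng.normal(0, 0.8, n_teams)   # observed rate, season 2

# The 8 unluckiest teams in year 1 look far above average that year,
# but much closer to average the next year -- regression to the mean.
worst = np.argsort(year1)[-8:]
print(round(year1[worst].mean() - year1.mean(), 2))
print(round(year2[worst].mean() - year2.mean(), 2))
```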
Examining the results as a whole, including the lack of persistence in the defensive stats and the anti-prediction of the interception stats, indicates that defensive performance, and secondary performance in particular, is not a reliable season-to-season indicator of team win totals compared to other facets of the game.
Continue reading part 2 of this article.
With interception rates, I'm thinking it could be an adjustment in playcalling and coaching. With too many interceptions, they might skew towards really conservative playcalling the following year. With few, they might open up the playbook more willingly.
Another similar possibility - high interception totals lead to a change of quarterbacks the next year. Or, high interception totals are common with first-year starters, who tend to improve the following year.
It would be interesting to look at, say, the top 10 examples from the last 5 seasons of (year 1 INTs)*(year 2 wins - year 1 wins). My guess is that we would see the pattern form.
tarr-I bet you're right. I've already put together the list you suggested and will have it posted tonight.
It's just extremely hard to repeat good interception rates on both offense and defense. So teams that come to win based on very favorable int stats fall flat the next year. The poster child for this effect is the '02-'03 Buccaneers.