I've previously commented that using 3rd down percentage in an analysis of team strength or a game prediction model is not a good practice. I realize this is counterintuitive. 3rd down percentage is highly correlated with winning, and unlike total rushing yards, the direction of causation is clear. Converting the always-critical 3rd down leads to winning. So why wouldn't it be a good stat?
"When we rate how good a team is, we're better off knowing how likely it is to win future games than dissecting past games into molecular detail."
3rd down percentage is a function of a team's passing ability, its running ability, an opponent's ability to stop them, and often random luck (you guess pass, I call a draw). 3rd down percentage is an intermediate result between running/passing and the final result of interest--team wins. Injecting an intermediate result into a regression model may be useful in analyzing why teams won or lost past games, but it does not help evaluate how good a team is, or will be.
Bill Parcells once barked, "You are what your record says you are." I prefer the saying that "you're only as good as your next game." A team's record may be what matters when deciding who goes to the playoffs, but a team can't really change its record--except by winning or losing its next game. So when we rate how good a team is, we're better off knowing how likely it is to win future games than dissecting past games into molecular detail. Many of those details are unique to the circumstances of the past. Models built on such details are known as "over-fit."
Stats such as 3rd down percentage tell us more about what has happened to a team in the past than how well it will do in the future. In a recent article, I tested how well various stats endure through the season. If a team stat from the first half of the season does well predicting itself in the second half of the season, we have a good idea that it is an enduring and repeatable skill, and not primarily the result of randomness and non-repeating circumstances. The table below lists how well each team stat correlates with itself between the first and second half of a season.
| Variable | Correlation |
| --- | --- |
| O 3D Rate | 0.43 |
| D Int Rate | 0.08 |
| D Pass | 0.29 |
| D Run | 0.44 |
| D Sack Rate | 0.24 |
| O Fumble Rate | 0.48 |
| O Int Rate | 0.27 |
| O Pass | 0.58 |
| O Run | 0.56 |
| O Sack Rate | 0.26 |
| Penalty Rate | 0.58 |
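The self-correlation test described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code behind the table: the function name `pearson_r` and the sample numbers are made up, and the real analysis would use one pair of values per team-season.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / sqrt(sxx * syy)

# Hypothetical 3rd down percentages, one entry per team (not real NFL data).
first_half = [38.0, 42.5, 35.1, 44.0, 40.2, 36.8]
second_half = [40.1, 41.0, 37.5, 39.9, 43.3, 35.0]

r = pearson_r(first_half, second_half)
print(f"first-half vs. second-half correlation: {r:.2f}")
```

A stat that is mostly repeatable skill should give a value well above zero here; a stat that is mostly luck should hover near zero.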
Offensive 3rd down rate endures fairly well within a season, with a correlation coefficient of 0.43. But what if a completely different stat could predict a team's 3rd down percentage better than its own past 3rd down percentage? What would that tell us about 3rd down percentage as a stat?
The table below lists other offensive efficiency stats as predictors of 3rd down percentage. In other words, these are the correlations between a team's other stats from the first half of a season and the team's 3rd down percentage from the second half of the same season.
| Predictor | Correlation |
| --- | --- |
| O 3D Pct | 0.43 |
| O Pass | 0.56 |
| O Run | 0.08 |
| O Sack Rate | -0.53 |
| O Int Rate | -0.42 |
We can actually predict a team's 3rd down percentage better with offensive pass efficiency (0.56) or with sack rate (-0.53) than with the team's to-date 3rd down percentage (0.43). And with the correlation with run efficiency at a very small 0.08, we see that the passing game has almost everything to do with 3rd down conversions. (Teams tend to pass on anything longer than 3rd and 1 these days.)
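The comparison above boils down to correlating each first-half stat with second-half 3rd down percentage and ranking the results by strength. A rough sketch follows; `rank_predictors`, the stat names, and every number are hypothetical and chosen only to show the mechanics.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def rank_predictors(first_half_stats, second_half_target):
    """Correlate each first-half stat with the second-half target.

    Returns (name, r) pairs sorted by absolute correlation, strongest first,
    so a strongly negative predictor (like sack rate) ranks alongside a
    strongly positive one.
    """
    scores = {
        name: pearson_r(values, second_half_target)
        for name, values in first_half_stats.items()
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Hypothetical first-half stats for six teams (not real NFL data).
first_half_stats = {
    "o_3d_pct": [38.0, 42.5, 35.1, 44.0, 40.2, 36.8],
    "o_pass": [6.1, 5.4, 7.0, 5.9, 6.6, 5.0],
    "o_run": [4.1, 3.8, 4.0, 4.4, 3.6, 4.2],
}
second_half_3d_pct = [40.0, 36.5, 44.1, 38.2, 41.7, 34.9]

for name, r in rank_predictors(first_half_stats, second_half_3d_pct):
    print(f"{name:10s} {r:+.2f}")
```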
So why include 3rd down percentage in a rating of team strength or a win prediction model when passing stats are already included? It would only serve to add random noise. Instead of telling us how good a team is or will be, it would tell us more about the unique circumstances and random luck the team experienced in the past.
If we still want to use 3rd down percentage as a stat to predict how good a team will be, we can. After all, 3rd down success is critical in sustaining drives and scoring points. It correlates with team wins at about 0.49 and with points scored at 0.65, both relatively high. But instead of using to-date 3rd down percentage itself, we should estimate what the 3rd down percentage will be based on the stats we know to be predictive.
The table below is a regression model using passing stats to estimate future 3rd down percentage.
| Variable | Coefficient | p-value |
| --- | --- | --- |
| constant | 39.6 | 0.00 |
| O Sack Rate | -11.6 | 0.00 |
| O Pass Efficiency | 1.29 | 0.01 |
| O Int Rate | -1.53 | 0.00 |
The actual model coefficients aren't as important as the fact that the r-squared is 0.94. That means we can predict a team's future 3rd down percentage with almost crystal-ball accuracy using passing efficiency stats. And ironically, if we add previous 3rd down percentage itself to the model, it is the only non-significant variable (p=0.13), and r-squared is (strangely) reduced.
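For readers who want to try this kind of fit themselves, here is a minimal sketch using NumPy's least-squares solver on simulated data. Everything in it is an assumption for illustration: the helper `fit_ols`, the simulated team stats, and the coefficients used to generate them are not the data or code behind the table above. It reports both plain and adjusted r-squared, because plain r-squared can never fall when a variable is added while adjusted r-squared can.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares with an intercept.

    Returns (coefficients, r_squared, adjusted_r_squared); the first
    coefficient is the constant term.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    design = np.column_stack([np.ones(n), X])  # prepend the intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    ss_res = float(residuals @ residuals)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)
    return beta, r2, adj_r2

# Simulated stats for 32 hypothetical teams (not real NFL data).
rng = np.random.default_rng(0)
n_teams = 32
sack_rate = rng.normal(6.0, 1.5, n_teams)
pass_eff = rng.normal(6.0, 0.8, n_teams)
int_rate = rng.normal(3.0, 0.7, n_teams)
prev_3d = rng.normal(39.0, 4.0, n_teams)  # pure noise by construction

# Made-up relationship used only to generate a target to fit.
future_3d = 30 - 0.8 * sack_rate + 2.5 * pass_eff - 1.0 * int_rate
future_3d = future_3d + rng.normal(0.0, 1.0, n_teams)

passing_only = np.column_stack([sack_rate, pass_eff, int_rate])
with_prev_3d = np.column_stack([sack_rate, pass_eff, int_rate, prev_3d])

_, r2_base, adj_base = fit_ols(passing_only, future_3d)
_, r2_more, adj_more = fit_ols(with_prev_3d, future_3d)

print(f"passing stats only: r2={r2_base:.3f}  adjusted r2={adj_base:.3f}")
print(f"plus previous 3D%:  r2={r2_more:.3f}  adjusted r2={adj_more:.3f}")
```

Whether adjusted r-squared goes up or down in the second fit depends on how much the extra variable actually explains, which is the point of the adjustment.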
An r-squared of 0.94 is the equivalent of a correlation coefficient (r) of 0.97. Remember, this compares to the self-correlation of previous 3rd down percentage of only 0.43.
So if we want to know a team's ability to convert 3rd downs, we're far better off looking at passing stats than previous 3rd down conversion rates. And a prediction model is far better off using those passing stats (pass efficiency, interception rate, sack rate) and excluding to-date 3rd down percentage.
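As a rough illustration, the regression table above can be wrapped in a small estimator. The coefficients are copied from the table, but the function and argument names are hypothetical, and the article does not state the units of each input, so the arguments must be in whatever units the model was fit on.

```python
# Coefficients copied from the regression table in the article.
COEFFICIENTS = {
    "constant": 39.6,
    "o_sack_rate": -11.6,
    "o_pass_efficiency": 1.29,
    "o_int_rate": -1.53,
}

def estimate_third_down_pct(o_sack_rate, o_pass_efficiency, o_int_rate):
    """Estimate future 3rd down percentage from to-date passing stats.

    Assumption: inputs use the same (unstated) units as the original fit.
    """
    return (
        COEFFICIENTS["constant"]
        + COEFFICIENTS["o_sack_rate"] * o_sack_rate
        + COEFFICIENTS["o_pass_efficiency"] * o_pass_efficiency
        + COEFFICIENTS["o_int_rate"] * o_int_rate
    )

# Direction check only: one more unit of sack rate lowers the estimate
# by the sack-rate coefficient, all else equal.
baseline = estimate_third_down_pct(0.0, 0.0, 0.0)
one_more_sack_unit = estimate_third_down_pct(1.0, 0.0, 0.0)
print(f"change per unit of sack rate: {one_more_sack_unit - baseline:+.2f}")
```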
 
 







Have you considered looking at the first down to third down ratio?
Maybe something like 1d/3da or 1d - 3d failure.
Those would correlate better with points.
They would include first downs made on downs 1 and 2.
Also, yards per third down attempt or yards per third down failure would correlate pretty well with points.
Nice article, apparently 3D conversion is just too noisy to be used as a predictor for anything, including itself.
Are other stats better predicted without reference to themselves?
Are you using OLS for these regressions? Adding a variable should never decrease the R-squared.
It's actually adjusted r-squared, which corrects for the number of added variables.
Yes. I'll post how various stats can predict other stats in a future post.
Very nice analysis. You've made an extremely strong case that considering 3rd down conversion percentage as part of a model based on overall game stats is a bad idea.
It would be interesting to see if it holds true if we consider individual play results as opposed to aggregated stats. It's possible that when we consider the context (distance to the marker, and field position) of a given 3rd down, we remove enough noise that the conversion becomes a useful, non-redundant piece of information.
That said, I've always been bothered by how heavily DVOA considers the third down. FO seems at least peripherally aware of how misleading this may be, since they have talked extensively about the "third down rebound", where a team that is playing over their heads on third down tends to regress, and vice versa. One possible conclusion would be that if DVOA didn't weight third down as heavily, they wouldn't regress - they'd just be rated lower. It seems like this would improve year-to-year DVOA correlation, which is a stated goal in improving the algorithm.
Sometimes it seems like DVOA is torn between being descriptive and being predictive. Weighting 3rd downs more heavily would improve correlation to past wins, surely, but it probably introduces a bias in predicting future ones.
Tarr: Factoring in yard line and down and distance is what DVOA really is. I think they try to factor in time remaining and score difference too. So it would be easy enough to test your hypothesis by repeating the longitudinal auto-correlation analysis of Week 1-8 DVOA with Week 9-17 DVOA.
I don't think that analysis would effectively support or detract from my theory. As I understand it, DVOA basically:
1) calculates the expected points of down/distance before and after a given play, and
2) normalizes this result against a league-average result at that down/distance.
The issue is that the impact on expected points is magnified on third down, simply because third down plays have a lot of leverage on the outcome of drives. So third down plays (both good and bad) form a disproportionately large part of the measure.
I have no doubt that week 1-8 DVOA is strongly correlated with week 9-17 DVOA. That's not really my point, and moreover any cobbling of stats is going to accomplish that much. I'm suggesting that the way DVOA is calculated is essentially increasing the error in the measurement, and the correlation could be made better.
My contention is that the correlation would improve if the weight of the third down plays were artificially reduced, such that they were not as much more important than first or second down. My suspicion is that third down could/should still have a slightly greater weight than first or second, simply because the skills that are valuable on third down are slightly different than first and second, and therefore are probably slightly more useful overall.
Unfortunately, I don't have access to the raw PBP data, or the down/distance expected points tables that are the foundation of DVOA. So I can't really test my hypothesis.
I guess I misunderstood your theory.
What if wk 1-8 DVOA and wk 9-17 DVOA aren't correlated very highly? The obvious teams would correlate, such as the Pats and Colts, but what about the majority of teams?
I think we're thinking along the same lines. One of my next projects is to adjust my model's coefficients in proportion to how noisy they are (after being adjusted for opponent). The other option is to discard the noisy stats altogether.
"What if wk 1-8 DVOA and wk 9-17 DVOA aren't correlated very highly?"
I would be fairly shocked if that were true, even for the lousy teams. This is because staying consistent (i.e. having good correlation with later measurements of the same stat) should be true even for stats that are actually poor measures of team strength.
Let me give an example. Let's say we believed all the cliches that the talking heads cite. With that in mind, we formulated a power ranking that was based on rushing yardage, opposing rushing yardage, and turnover differential. Virtually every clear-headed analysis of football statistics has shown that this is a crappy way to measure team strength. Nevertheless, I have a feeling that it would come out as pretty strongly self-correlated over two halves of the season. My point is that self-correlation, in and of itself, does not prove much about the quality of a stat.
For what it's worth, the preseason projections of DVOA have a .71 correlation with that season's DVOA. Given what a crapshoot preseason projections are, that's not bad. They don't list 1st-2nd half correlations anywhere, but the game-by-game DVOAs are in the premium database so I could calculate them if I wanted to.
"I think we're thinking along the same lines. One of my next projects is to adjust my model's coefficients in proportion to how noisy they are (after being adjusted for opponent)."
Yes, exactly. My point about DVOA is that they are (implicitly) weighting one source of data (3rd down performance) more heavily than two other sets of data (1st and 2nd down performance), despite what I suspect are nearly equal levels of noise and nearly equal degrees of correlation with future team success. I think they would have a more consistent and meaningful measure if they tweaked the numbers to give roughly equal weight to results from all three downs (on a per-play basis).
"The other option is to discard the noisy stats altogether."
This is, of course, a classic problem when constructing a model.