Signal vs. Noise in Football Stats

In 2007, the Detroit Lions defense racked up 13 interceptions in the first half of the season, the most in the NFL. The next-best teams had 11. It's reasonable to expect that, barring calamitous injuries, the Lions would continue to generate high numbers of interceptions through the rest of the season.

I wouldn't necessarily expect them to stay #1 in the league, but I'd expect them to be near the top. And I'd be wrong. It turns out they had only 4 interceptions in their final 8 games, ranking dead last. So halfway through the season, if I were trying to estimate how likely the Lions were to win future games, I might have been better off ignoring defensive interceptions.

Although turnovers are critical in explaining the outcomes of NFL games, defensive interceptions are nearly all noise and no signal. Over the past two years, defensive interceptions from the first half to the second half of a season correlate at only 0.08. In comparison, offensive interceptions correlate at 0.27. As important as interceptions are in winning, a prediction model should actually ignore a team's past record of defensive interceptions.

You might say that if defensive interception stats are adjusted for opponents' interceptions thrown, then the correlation would be slightly higher. I'd agree--but that's the point. Interceptions have everything to do with who is throwing, and almost nothing to do with the defense.

This may be important for a couple of reasons. First, our estimates of how good a defense is should no longer rest on how many interceptions it generates. Second, interception stats are probably overvalued when rating pass defenders, both free agents and draft prospects.

I've made this point about interceptions before when I looked at intra-season auto-correlations of various team stats. That's a fancy way of asking how consistent a stat is with itself over the course of a season. The more consistent a stat is, the more likely it reflects a repeatable skill or ability. The less consistent it is, the more likely the stat is due to unique circumstances or merely random luck.

The table below lists various team stats and their self-correlation, i.e. how well they correlate between the first half and second half of a season. The higher the correlation, the more consistent the stat and the more it is a repeatable skill useful for predicting future performance. The lower the correlation, the more it is due to randomness.

Variable        Correlation
D Int Rate      0.08
D Pass          0.29
D Run           0.44
D Sack Rate     0.24
O 3D Rate       0.43
O Fumble Rate   0.48
O Int Rate      0.27
O Pass          0.58
O Run           0.56
O Sack Rate     0.26
Penalty Rate    0.58
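For anyone who wants to reproduce this kind of split-half test, here is a minimal sketch. The file and column names (team_games.csv, game_num, def_int_rate, off_pass_eff) are hypothetical placeholders, not the data set used for the table above.

```python
import pandas as pd

# Hypothetical file: one row per team-game, with columns
# 'season', 'team', 'game_num' (1-16), and per-game stats.
games = pd.read_csv("team_games.csv")

def split_half_correlation(df, stat):
    """Correlate a team's first-half average of a stat with its second-half average."""
    first = df[df["game_num"] <= 8].groupby(["season", "team"])[stat].mean()
    second = df[df["game_num"] >= 9].groupby(["season", "team"])[stat].mean()
    halves = pd.concat([first.rename("first_half"),
                        second.rename("second_half")], axis=1).dropna()
    return halves["first_half"].corr(halves["second_half"])

print(split_half_correlation(games, "def_int_rate"))   # noisy stats land near 0
print(split_half_correlation(games, "off_pass_eff"))   # consistent stats land much higher
```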

In a related post, I made the case that although 3rd down percentage tended to be consistent during a season (0.43 auto-correlation), other stats such as offensive pass efficiency and sack rate were even more predictive of 3rd down percentage. In other words, first-half-season pass efficiency predicted second-half-season 3rd down percentage better than first-half-season 3rd down percentage itself.

But what about other stats? Are there other examples where another stat is more predictive of something than that something itself? Below is a table of various team stats from the second half of a season and how well they are predicted by other stats from the first half of a season.

For example, take offensive interception rates (O Int). Offensive sack rates (O Sack) from the first 8 games of a season actually predict offensive interception rates from the following 8 games slightly better than offensive interception rates (0.28 vs. 0.27).

Predicting    With        Correlation
D Fum         D Fum        0.33
D Fum         D Sack       0.15
D Fum         D Run        0.12
D Int         D Sack       0.08
D Int         D Int        0.08
D Int         D Pass       0.01
D Pass        D Pass       0.28
D Pass        D Sack       0.26
D Run         D Run        0.44
D Sack        D Sack       0.24
D Sack        D Pass      -0.07
O 3D Pct      O Sack      -0.53
O 3D Pct      O 3D Pct     0.43
O 3D Pct      O Int       -0.42
O 3D Pct      O Pass       0.42
O 3D Pct      O Run        0.08
O Fum         O Fum        0.48
O Fum         O Sack       0.24
O Int         O Sack       0.28
O Int         O Int        0.27
O Int         O Run        0.06
O Int         O Pass      -0.37
O Pass        O Pass       0.49
O Pass        O Sack      -0.33
O Pass        O Run       -0.10
O Run         O Run        0.56
O Run         O Pass       0.00
O Sack        O Pass      -0.40
O Sack        O Sack       0.26
O Sack        O Run        0.03
Pen           Pen          0.58
Pen           D Pass      -0.23
Pen           O Sack      -0.08


There are a thousand observations to draw from this table. I still see new and interesting implications whenever I look it over.
  • Having a potent running game does not prevent sacks.
  • The pass rush predicts defensive pass efficiency nearly as well as defensive pass efficiency predicts itself (0.26 vs. 0.28).
  • Running does not "set up" the pass, and passing does not "set up" the run. They are likely independent abilities.
  • Offensive sack rates are much better predicted by offensive passing ability than previous sack rates.
  • Defensive sack rate predicts defensive passing efficiency, but defensive passing efficiency does not predict sack rate.
We see that many stats, such as passing and running efficiency, predict themselves fairly well. But even those stats might be better predicted by using a combination of themselves and related stats. For example, in my previous post I noted how accurately offensive 3rd down percentage could be predicted using passing efficiency, sack rate, and interception rate.
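As a rough illustration of that combination idea, here is a sketch of regressing second-half 3rd down percentage on several first-half stats. The file and column names are invented placeholders; the specific variables and any coefficients are not from the original posts.

```python
import pandas as pd
import statsmodels.api as sm

# Hypothetical file: one row per team-season, first-half stats plus
# the second-half 3rd down conversion rate we want to predict.
halves = pd.read_csv("team_halves.csv")

X = sm.add_constant(halves[["off_pass_eff_1st",
                            "off_sack_rate_1st",
                            "off_int_rate_1st"]])
y = halves["off_3d_pct_2nd"]

model = sm.OLS(y, X).fit()
print(model.summary())   # which first-half stats carry real signal into the second half?
```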

The implications of these auto-correlations are numerous. Team "power" rankings and game predictions (both straight-up and against the spread) rely on a very simple premise--past performance predicts future performance. We now know that's not necessarily true for some aspects of football.

Lions head coach Rod Marinelli might be banging his head against the wall trying to understand how his defense was able to grab 13 interceptions through game 8, but only 4 more for the rest of the season. He's wasting his time. The answer is that in the first half of the season, the Lions played against QBs Josh McCown (2 Ints), Tarvaris Jackson (4 Ints), and Brian Griese twice (4, 3 Ints).

Safe Leads in NCAA Basketball


Bill James takes a look at when leads become insurmountable in college basketball. In other words, when should CBS cut away from the UNC-Mt. Saint Mary's game to show us the barn-burner between Vanderbilt and Siena?

James' formula uses the lead in points, who has the ball, and seconds remaining to tell us if the lead is completely insurmountable. Here it is in a nutshell:

  • x = (Lead - 3 + 0.5)^2 if the winning team has possession, or x = (Lead - 3 - 0.5)^2 if not
  • If x > the time remaining in seconds, the lead is insurmountable
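Here's the rule as a quick function. This is my own translation of the formula above, not James' code; the guard against very small leads is my addition.

```python
def lead_is_safe(lead, seconds_left, leader_has_ball):
    """Bill James' safe-lead rule: square the adjusted lead and
    compare it to the number of seconds remaining."""
    adjusted = lead - 3 + (0.5 if leader_has_ball else -0.5)
    if adjusted < 0:           # a lead of only a couple points is never "safe"
        return False
    return adjusted ** 2 > seconds_left

# Up 12 with the ball and 70 seconds left: (12 - 3 + 0.5)^2 = 90.25 > 70
print(lead_is_safe(12, 70, True))    # True
print(lead_is_safe(8, 300, False))   # False -- plenty of time left
```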
Pretty cool. This is the kind of thing James is really good at. Unfortunately, I think he buys into a logical fallacy later in his article. He says that if a team is deemed to be "dead," that is to say too far behind, but it is able to climb back inside the limits of "insurmountability," it doesn't matter. The losing team is still dead.

I'd agree that it is highly unlikely that such a team would win, but I think James has been taken in by the gambler's fallacy. He writes "The theory of a safe lead is that to overcome it requires a series of events so improbable as to be essentially impossible. If the "dead" team pulls back over the safety line, that just means that they got some part of the impossible sequence—not that they have a meaningful chance to run the whole thing."

It seems to me that if a team climbs back into contention, it's in contention. If the events in a sequence are independent, it doesn't matter how lucky or how improbable the previous events were. They're water under the bridge. For example (from Wikipedia), the probability of flipping 21 heads in a row with a fair coin is 1 in 2,097,152, but the probability of flipping a head after having already flipped 20 heads in a row is simply 0.5.
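The coin example in a few lines, just restating the arithmetic above:

```python
from fractions import Fraction

p_21_in_a_row = Fraction(1, 2) ** 21
print(p_21_in_a_row)        # 1/2097152

# Given that 20 heads have already happened, the next flip is still just a coin flip.
p_next_head = Fraction(1, 2)
print(float(p_next_head))   # 0.5
```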

The only thing that matters is the current situation. It's like saying, "There's no way they'll hit another 3-pointer. They just hit five in a row. They're due to miss."

What does this have to do with football? It would be interesting to look at something similar in the NFL. When is a lead so safe that a team should stop throwing? Or when is it so safe a team should only throw on 3rd down? And so on. Basically, when should a winning team stop trying to gain a bigger lead and start trying to simply prevent big mistakes?

The Office Pool 3

You might be wondering why I'm interested in NFL pick 'em pools in the middle of March Madness. Well, there are already plenty of statistical analyses on the NCAA basketball tournament. Here are a couple of sites to get your bracket filled out scientifically.

But for now, I've got the luxury of time before the NFL season starts, or even draft season, when I can think through these things. In the last post I looked at using the point spreads as a baseline for picking winners. I looked at the accuracy with which the point spread correctly favored straight-up winners. Over the past six seasons, the spread was accurate about 67% of the time, and no single week showed any statistically significant deviation from the overall average. In other words, the spread is no more or less accurate in early or late weeks than throughout the season.

The reason I analyzed spread accuracy by week was because when you're behind in a pick 'em pool, you'll probably have to gamble on some upsets in order to catch up. I wanted to know if it was to your benefit to go against spread favorites in any particular week. The answer is no.

But what about spread amounts? It certainly makes sense that games with +1 or -1 spreads will be less predictable than games with +14 or -14 spreads. But how much less predictable? Is there a point of inflection beyond which it never makes sense to go against the spread? Are there situations when it's basically a toss-up and the spread is no more accurate than the flip of a coin?

Below are the answers. The graph shows the accuracy of the spread in terms of predicting the straight-up winner for each spread amount. Data is from all regular season games from 2002-2007.


As expected, the spread predicts winners more accurately with increasing spread amounts. Note the fan-shaped dispersion of the data points. Some spread amounts are far less common than others. For example an 8-point spread is less common than a 7-point spread. At the least common spread amounts, there are fewer cases and therefore a wider range of accuracies.
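If you want to build this kind of table yourself, a minimal sketch follows. The file and column names (games_2002_2007.csv, home_spread, and so on) are placeholders, and I'm assuming the usual convention that a negative home spread means the home team is favored.

```python
import pandas as pd

games = pd.read_csv("games_2002_2007.csv")   # hypothetical file of games with closing spreads

margin = games["home_score"] - games["away_score"]
# Assumed convention: home_spread < 0 means the home team is the favorite.
games["fav_won"] = margin * games["home_spread"] < 0
games["spread_size"] = games["home_spread"].abs()

accuracy_by_spread = (
    games.groupby("spread_size")["fav_won"]
         .agg(["mean", "count"])          # accuracy and sample size per spread amount
         .rename(columns={"mean": "accuracy", "count": "games"})
)
print(accuracy_by_spread)
```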

Also notice how games with spreads less than 3 points are no better than 50% accurate. I wouldn't expect much better than 50% - 55%, but less than 50% is surprising.

The 'home underdog' phenomenon has been established in previous research. It may exist because many observers underestimate the home field advantage that comes from weather conditions late in the season. But whatever the reason, the home underdog effect clearly exists. The graph below breaks up the spreads into home underdogs and home favorites.


Notice the 40% accuracy of low-spread home underdog games. When the spread is below +2.5 points, the home underdog is not only likely to cover, but will probably win.

It makes sense to pick upsets in low-spread home underdog games. However, there have been only 70 such cases in the past 6 years, averaging about 11 games per year. If you go for the underdog in all 11, winning roughly 60% of them instead of the favorite-picker's 40%, that gives you about a 2-game edge on average (roughly 6.6 expected wins versus 4.4). It's not nearly enough of an advantage to guarantee you bragging rights around the office, but every little bit helps.

And if you need to catch up in the later weeks, games with low spreads and home favorites aren't much better than 50%. Going with upsets in such games would make sense because it's not a bad gamble and your leading opponent might be playing it safe by picking all favorites.

You'll want to be very careful picking against the favorite in games with spreads more than 3.5 points, for either home underdog or home favorite games. The odds against you climb rapidly beyond that point.

The Office Pool 2

When picking winners in an office pool, I'd guess that most people start with the point spreads, or at least look at the records of each opponent, when making their predictions. Most people have some sort of baseline even if it's not Sagarin, DVOA, or the regression-based predictions on this site.

So I thought it would be interesting to look at the spreads, and how often they're correct in identifying winners. If someone needs to correctly pick a few upsets to win a pool, it might be good to know that some weeks are less predictable than others. You'd ideally want to pick upsets in weeks where the spread is less accurate.

In this installment I'll look at how well the spread does in picking winners by week. My theory was that the spread would be relatively less accurate in the early weeks of the season, when there is less information about team performance. There may also be a high degree of bias toward teams expected to be strong in the pre-season. Week 17 may be inaccurate too, due to the uncertainty of some playoff-bound teams resting their better players. Additionally, late-season weather may also contribute to higher uncertainty and less predictability.

Using point spread data from the 2002-2007 seasons obtained here, I analyzed how often the spread was correct. Overall, the point spread favorites win 66.2% of the time. Weekly accuracy ranges from 59.0% in weeks 4 and 9 to 72.6% in week 12. The graph and table below list the weekly averages.

Week    Accuracy
1       63.8%
2       62.5%
3       69.8%
4       59.0%
5       71.3%
6       69.5%
7       61.4%
8       64.6%
9       59.0%
10      61.6%
11      76.3%
12      72.6%
13      62.8%
14      71.6%
15      69.5%
16      63.5%
17      65.3%
Total   66.2%


Although there appear to be substantial differences between some weeks, they are most likely random. The only statistically significant difference between any one week and the season average of 66.2% was week 12, with p=0.04. However, there are 17 weeks, so we should not be surprised to see one or even two weeks appear significant when there really is no systematic effect (a type I error).
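Here is a sketch of the kind of test involved. The per-week game count below is a placeholder (roughly 16 games per week over 6 seasons, ignoring byes), so the exact p-value will depend on the actual sample sizes and the test used.

```python
from scipy.stats import binomtest

baseline = 0.662                             # overall spread accuracy, 2002-2007
week_games = 96                              # placeholder sample size for one week across 6 seasons
week_correct = round(0.726 * week_games)     # e.g. week 12's observed accuracy

print(binomtest(week_correct, week_games, baseline).pvalue)

# With 17 weeks tested at the 0.05 level, about one "significant" week
# is expected by chance alone (17 * 0.05 = 0.85).
```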

The bottom line here is that no week can be viewed as particularly favorable for picking upsets. If you're behind in your office pool toward the end of the season and need to pick some upsets to make up ground, one week is as good as any other to start getting aggressive.

Next, I'll look at point spreads from a different angle and see how accurate they are at picking winners by the size of the spread. I'll also break it down into two types of games, home-underdogs and home-favorites, to see if there are any inefficiencies in how the spread accounts for home field advantage.

The Office Pool 1

Say I'm in an office pool pick 'em contest. My 10 buddies and I pick NFL winners each week, and the guy with the best record at the end of the season wins. My office mates aren't particularly good at handicapping football games, so I figure that if I pick the consensus favorite in every game (the team favored by the spread), I'll have a great chance to come out on top over the long haul.

My office buddies have access to point spreads too. They tend to look at the spread (or at least look at each team's respective record, which is just as accurate) and then pick a couple of upsets each week. Over the past five years, the spread identifies winners correctly 66.2% of the time. So, normally, their upsets would be correct on average 33.8% of the time (100% - 66.2%), but they won't be picking upsets in lopsided match-ups. (Although we can't always assume rationality, we will assume sanity.) So in their 34 chosen upsets (2 per week), my buddies will be right 45% of the time on average. I would have a 10-percentage-point accuracy advantage in those games.

After doing the math, each of my buddies would average a 63.3% accuracy rate (66.2% on the 232 games where we both take the favorite, and 45% on the 34 upset picks). And I'd average 66.2% accuracy. Man, I can't wait to collect my winnings!

But wait. Because of luck, some would be slightly more accurate, and some would be less accurate. In fact, the only thing that really matters is how well each of them does on the 34 games where they deviate from the published favorite. In the other 232 games, we'd have identical picks. Of the 34 games in question, every game one of my buddies gets right is a game I must have gotten wrong. One of my 10 friends needs to be correct more than 50% of the time in his 34 games to beat me.

The mathematical bottom line is, "How often is someone correct in at least 18 out of 34 trials with a 0.45 probability of being correct in any given trial?" The binomial distribution gives us the answer--it's 22.4% of the time. That's pretty good, right? I have a 77.6% chance of beating any one of my opponents. The only problem is that there are 10 of them.

The chance I would beat all 10 of my buddies is the conjunctive probability of beating each of them. It's 77.6% * 77.6%... and so on, for however many opponents I have. In this case, it's:

0.776^10 = 0.079

In other words, my chances of winning the office pool are just 7.9%--significantly less than a fair chance of 1 in 11 (9.1%). That's why just picking the favorites is a bad strategy. I'd actually be better off choosing the less accurate strategy of my buddies. At least then I'd have a fair chance of 1 in 11.
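The arithmetic above in one place, using the numbers straight from the post (note that the last step leans on the independence assumption Phil Birnbaum questions below):

```python
from scipy.stats import binom

p_upset_correct = 0.45   # a buddy's accuracy on his 34 upset picks
n_upsets = 34

# P(a buddy gets at least 18 of his 34 deviations right and beats me)
p_buddy_beats_me = 1 - binom.cdf(17, n_upsets, p_upset_correct)
print(p_buddy_beats_me)              # about 0.224

# Naively treating all 10 buddies as independent opponents:
p_i_win = (1 - p_buddy_beats_me) ** 10
print(p_i_win)                       # about 0.079 -- worse than a fair 1 in 11
```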

I realize that it is counter-intuitive that a strategy that is less accurate overall can be the better one. But in a contest against several opponents, the riskier strategy--the one with a greater spread of possible outcomes--may be best.

Note: Phil Birnbaum points out that the odds of the opponents are not independent of one another, and therefore the simple compound probability I calculated here is far too low. If one opponent happens to beat you, then the other opponents may be more likely to beat you as well, and vice versa. In the end, picking all favorites may be the better play. See his comments for an explanation.

Going for It on Fourth Down

It's 4th down and goal from the 2-yard line in the first quarter. What would most coaches do? Easy, they'd kick the field goal, a virtually certain 3 points.

But a 4th and goal from the 2 is successful about 3 times out of 7, which yields the same number of expected points, on average, as the field goal (3/7 of a 7-point touchdown is 3 points). Plus, if the attempt at a touchdown is unsuccessful, the opponent is left with the ball on the 2- or even the 1-yard line. And if the field goal is successful, the opponent returns a kickoff, which usually leaves them around the 28-yard line. It should be obvious that, on balance, going for the touchdown is the better decision.

That's the case made by economist David Romer, author of a 2005 paper called "Do Firms Maximize? Evidence from Professional Football." Romer's paper is an analysis of 4th down situations in the NFL. It is quite possibly the most definitive demonstration that coaches are too timid on 4th down. Romer's theory is that coaches don't try to maximize their team's chances of winning games as much as they maximize their own job security.

Coaches know that if they follow conventional wisdom and kick--oh well, the players just didn't make it happen. But if they take a risk and lose, even if it is on balance the better decision, they'll be Monday morning quarterbacked to death. Or at least their job security will be put in question.

In case anyone doubts how much coaches are concerned about Monday morning criticism, just take their word for it. Down by 3 points very late in the 4th quarter against the winless and fatigued Dolphins defense, former Ravens coach Brian Billick chose to kick a field goal on 4th and goal with the ball a foot from the end zone. The Dolphins went on to score a touchdown in overtime. Billick's explanation at his Monday press conference was, "Had we done that [gone for it] after what we had done to get down there and [not scored a touchdown], I can imagine what the critique would have been today about the play call." Billick, a nine-year veteran head coach and Super Bowl winner, was more concerned about criticism from Baltimore Sun columnists than about the actual outcome of the game. He'd rather escape criticism than give his team the best chance to win.

Romer's paper considers data from 3 years of games. To avoid the complications of particular "end-game" scenarios with time expiring in the 2nd or 4th quarters, he considers only plays from the 1st quarter of games. So his recommendations should be considered a general baseline for the typical drive, and not a prescription for every situation.

Romer's bottom line is the graph below. The x-axis is field position, and the y-axis is the yards-to-go on 4th down. The solid line represents when it is advisable for a team to attempt the first down rather than kick. According to the analysis, it's almost always worth it to go for it with less than 4 yards to go. The recommendation peaks at 4th and 10 from an opponent's 33 yard-line.



Romer basically measures the expected value of the next score. Say it's 4th and 2 from the 35-yard line. He compares the value of attempting a field goal from the 35 with the point value of a 1st and 10 from the 33, multiplied by the probability of actually converting the first down. He also recognizes that a field goal isn't always worth 3 points, and a touchdown isn't always worth at least 6: the ensuing kickoff gives an expected point value to the opponent, because there is a point value to having a 1st and 10 from one's own 25-yard line.
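A toy version of that comparison. Every probability and point value below is a round, illustrative number I've plugged in to show the structure of the calculation, not one of Romer's estimates.

```python
# Illustrative 4th-and-2 at the opponent's 35: go for it vs. kick.
p_convert = 0.60           # assumed chance of converting 4th-and-2
ev_first_at_33 = 2.9       # assumed expected points of 1st-and-10 at the opponent's 33
ev_failed_attempt = -0.6   # assumed value when the opponent takes over on downs

p_fg = 0.55                # assumed make rate for a ~52-yard field goal
ev_made_fg = 3.0 - 0.7     # three points minus an assumed value of the ensuing kickoff
ev_missed_fg = -1.0        # assumed value when the opponent takes over after a miss

ev_go = p_convert * ev_first_at_33 + (1 - p_convert) * ev_failed_attempt
ev_kick = p_fg * ev_made_fg + (1 - p_fg) * ev_missed_fg

print(f"go for it: {ev_go:+.2f} expected points, kick: {ev_kick:+.2f}")
```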

One weakness of the paper is that it dismisses the concept of risk as unimportant. Romer says that long-term point optimization should be the only goal, so coaches should always be risk neutral. But if the level of risk aversion were actually considered, we might find that coaches are more rational than he concludes.

But the paper makes a very strong case that coaches should go for it on 4th down far more often than they currently do. Job security for coaches seems to be the primary reason why they don't. At a meeting with some researchers making the case for more aggressive 4th down decision making, Bengals coach Marvin Lewis responded, "You guys might very well be right that we're calling something too conservative in that situation. But what you don’t understand is that if I make a call that's viewed to be controversial by the fans and by the owner, and I fail, I lose my job."

It would be great if a coach came along and rarely kicked. It would be a gamble, but if Romer and others are right, chances are the coach would be successful. And the rest of the NFL would have to adapt. It might only take one brave coach.

"Expert" Predictions

Gregg Easterbrook of ESPN.com writes a yearly column poking fun at all the terrible predictions from the previous NFL season. Here is his latest--it's long but highly entertaining. Unfortunately, it also makes a pretty good case that people like me, with complicated mathematical models for predicting games, are wasting our time. And the "experts" out there are doing even worse.

Predictions are Usually Terrible

His best line is "Just before the season starts, every sports page and sports-news outlet offers season predictions -- and hopes you don't copy them down." Unfortunately for them, he does.

Easterbrook's examples of horrible predictions underscore the fact that pre-season NFL predictions are completely worthless. Before the 2007 season I made the same point by showing that guessing an 8-8 record for every team is as accurate as, or more accurate than, the "best" pre-season expert predictions or even the Vegas consensus. (Pay no attention to my own attempt at predictions last June, before I realized how futile it is.)

Unlike Easterbrook, most of us don't write our predictions down. It's easy to forget how wrong we were and how overconfident we were. So many of us go on making bold predictions every year.

Proof I'm (Almost) Wasting My Time

The most interesting part of the column might be the "Isaacson-Tarbell Algorithm," a system suggested by two of Easterbrook's readers last summer for predicting individual games. Just pick the team with the better record, and if both teams have the same record, pick the home team. According to Easterbrook, the Isaacson-Tarbell system would have been correct 67% of the time, about the same as the consensus Vegas favorites. It's devilishly simple, requires no fancy computer models or expert knowledge, and it would have beaten almost every human "expert" with a newspaper column, TV show, or website.
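The whole system fits in a few lines. This is my own rendering of the rule as described; the record format is made up for the example.

```python
def isaacson_tarbell_pick(home_record, away_record):
    """Pick the team with the better record; if the records are equal, take the home team.
    Records are (wins, losses) tuples going into the game."""
    home_pct = home_record[0] / max(sum(home_record), 1)
    away_pct = away_record[0] / max(sum(away_record), 1)
    return "home" if home_pct >= away_pct else "away"

print(isaacson_tarbell_pick((6, 2), (6, 2)))   # equal records -> "home"
print(isaacson_tarbell_pick((3, 5), (6, 2)))   # better record wins -> "away"
```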

(Actually, I'm going to give credit for inventing the algorithm to my then 6-year-old son, who is an avid football fan (wonder why?). He devised that very same system during the 2006 season in a weekly pick 'em contest against my regression model and his grandfather. I'm sure many young fans have followed the same principle over the years.)

The model I built was accurate about 71% of the time last year. Is the extra 4% accuracy (about 10 games) worth all the trouble? Probably not (for a sane person), but I'll keep doing it anyway. Actually, I think 4% is better than it sounds. Why? Well, a monkey could be 50% correct, and a monkey who understood home field advantage could be 57% correct. The question is how far above 57% a prediction system can get.

And there are upsets. No system, human or computer-based, could predict 100% accurately. They can only identify the correct favorite. Sometimes the better team loses. From my own and others' research, it looks like the best model could only be right about 75-80% of the time. So the real challenge is now "how far above 57% and how close to 80% can a system get?" There's only 23 percentage points of range between zero predictive ability and perfect predictive ability. Within that range, 4% is quite significant.

Better Ways to Grade Predictions

Phil Birnbaum of the Sabermetric Research blog makes the point that experts should not be evaluated on straight-up predictions but on predictions against the spread. I'm not sure that's a good idea, and I think I have a better suggestion.

Phil's point is that there are very few games in which a true expert would have enough insight to correctly pick against the consensus. Therefore, there aren't enough games to distinguish the real experts from the pretenders. His solution is to grade experts on their picks against the spread instead.

I don't agree. The actual final point difference of a game has as much to do with the random circumstances of "trash time" as with any true difference in team ability. A better alternative may be to have experts weight each pick by their confidence in it, as a way to compare their true knowledge.

Consider a hypothetical example Phil Birnbaum cited about an .800 team facing a .300 team. A true .800 team vs. true .300 team match-up is actually fairly rare. As Phil has eloquently pointed out previously, the .800 team may just be a .600 team that's been a little lucky, and the .300 team could really be a .500 team that's been a little unlucky. There are many more "true" .500 and .600 teams than .300 and .800 teams, so an apparent .800 vs. .300 match-up is more likely than you'd expect to really be a .600 vs. .500 match-up. And if the ".500" team has home field advantage, we're really talking about a near 50/50 match-up. Although the apparent ".800" team may still be the true favorite, a good expert can recognize games like this and set his confidence levels appropriately.

Computer Models vs. "Experts"

Game predictions are especially difficult early in the season, before we really know which teams are good. Over the past 2 years of running a prediction model, I've noticed that math-based prediction models (that account for opponent strength) do better than expert predictions in about weeks 3-8. The math models are free of the pre-season bias about how good teams "should" be. Teams like the Ravens and Bears, which won 13 games in 2006, were favored in games by experts far more than their early performance in 2007 warranted. Unbiased computer models could see just how bad they really would turn out to be.

But later in the season, the human experts come around to realizing which teams are actually any good. The computer models and humans do about equally well at this point. Then when teams lose star players due to injury, the human experts can usually outdo the math models which have difficulty quantifying sudden discontinuities in performance.

And in the last couple weeks, when the best teams have sewn up playoff spots and rest their starters, or when the "prospect" 2nd string QB gets his chance to show what he can do for his 4-10 team, the human experts have a clear advantage. By the end of the season, the math models appear to do only slightly better than experts, but that's only really due to the particularities of NFL playoff seedings.

In Defense of Human Experts

Humans making predictions are often in contests with several others (like the ESPN experts). By picking the favorite in every game, you are guaranteed to come in first...over a several-year contest. But in a single-season contest, you'd probably come in 2nd or 3rd behind the guy who got a little lucky.

The best strategy is to selectively pick some upsets and hope to be that lucky guy. Plus, toward the end of the year, players that are several games behind are forced to aggressively pick more and more upsets hoping to catch up. Both of those factors have the effect of reducing the overall accuracy of the human experts. The comparison between math models and experts can often be unfair.

In Defense of Mathematical Predictions

Lastly, in defense of the computer models: the vast majority of them aren't done well, and they give the good ones a bad name. There is an enormous amount of data available on NFL teams, and people tend to take the kitchen-sink approach to prediction models. I started out doing that myself. But if you can identify what part of team performance is repeatable skill and what is due to randomness particular to non-repeating circumstances, you can build a very accurate model. I'm learning as I go along, and my model is already beating just about everything else. So I'm confident it can be even better next season.

Fumbles, Penalties, and Home Field Advantage

I had a theory that part of home field advantage may come from fumble recovery rates. Specifically, I was thinking of the kind of fumble that results in a pile of humanity fighting for the ball by doing things to each other only elsewhere done in prisons. It seems that the officials often have no better way of determining possession than by guessing which player has more control of the ball than the other guy. Sometimes it seems like they have a system--pulling the players off the pile one by one until they can see the ball. But in the end, they're still relying on their own judgment. There are complicating factors. Where was the ball when the play was whistled dead? When was the original ball carrier down? Was it a fumble or incomplete pass? In many cases, the process is analogous to basketball referees determining possession of a "jump ball" by their judgment of which player has better grip, or which player ultimately ripped the ball loose.

Perhaps the influence of the crowd has an effect on the officials by biasing their judgment. It's plausible, because there have been many academic studies documenting the psychological effect of a home crowd on officiating in several sports. Much of the research focuses on penalties and fouls called by the officials, but what about other matters of judgment? Fumble recoveries might shed some light.

If the fumble recovery rate of home teams is significantly greater than that of away teams, then we'd have evidence that NFL officials are favoring home teams. The table below lists home and visiting teams' fumbles and fumbles lost from the entire 2007 regular season, encompassing 256 games.

          Fumbles   Lost   Rate (%)
Visitor   409       189    46.2
Home      388       189    48.7


It appears that although visiting teams fumbled slightly more often, they lost possession at a slightly lower rate (46.2% vs. 48.7%). Neither difference is statistically significant, however, suggesting that officials are unbiased in that department.
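For the curious, here's a quick way to check that claim of no significance, using the counts from the table above; the choice of test is mine.

```python
from scipy.stats import fisher_exact

# [fumbles lost, fumbles kept] for the 2007 regular season
visitor = [189, 409 - 189]   # 46.2% lost
home = [189, 388 - 189]      # 48.7% lost

odds_ratio, p_value = fisher_exact([visitor, home])
print(p_value)   # well above 0.05, so no evidence of a home-team recovery bias
```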

Although my fumble theory was a bust, what about penalties? Could the difference in penalties called on home and away teams be large enough to explain most of the home field advantage in the NFL? Even if visiting teams were in fact penalized more, it wouldn't necessarily indicate officiating bias. It could be due to crowd noise or other factors.

The table below lists the visitor and home penalty and penalty-yard averages for the 2006 regular season.

          Penalties/G   Pen Yards/G
Visitor   6.2           50.1
Home      5.8           48.1


I was very surprised by how small the difference is. On average, visiting teams only have 0.4 more penalties called (and accepted) on them than home teams for a difference of only 2 yards. I would expect the difference to be greater because of false start and delay of game penalties due to crowd noise.

In 2006, home teams won 55.6% of regular season games. According to the in-game model at Football Prediction Network, the difference of 2 penalty yards can only account for about 0.9% of the 5.6% home field advantage.

It appears that neither fumble recoveries nor penalties account for much of home field advantage in the NFL. Other factors, such as travel fatigue or motivation, are likely to be much more important. So I came up empty-handed... or so I thought, until I came across some gems at Referee Chat Blog while doing background research.

The author tracks officiating data from week to week, crew by crew. One of the most interesting things he's found is that crews don't consistently favor home teams over visiting teams from season to season (correlation = -0.04). Contrary to what was found in the study of officiating in English Premier League soccer I linked to above, NFL officials do not appear susceptible to home crowd influence.

Many of the author's conclusions are based on differences in very small sample sizes (and he seems to realize this), but the data there are sound. Rex definitely knows his refs.