NFL offenses and defenses are rated more for the success/failure of each drive as a whole, rather than play-by-play sequences. For example, if you get stuffed on 1st and 2nd down, but convert on 3rd down, that is better than gaining yards on 1st and 2nd but turning the ball over on 3rd. Even more than that, if you gain 4 first downs in a row, but fumble the ball away on the 5th set of downs, that is a failure—even though you were piling up "momentum"-positive sequences beforehand. <br /><br />Likewise, a long drive which results in a touchdown has a lot more value, obviously, than a drive of the same length which results in a field goal. I'm not clear on how this analysis accounts for that difference as well. (The Patriots, for example, have for years ceded a lot of yards, in relation to other teams, but have minimized the number of points resulting from those yards... I would think an analysis of a team like that might result in some misleading conclusions vis-a-vis momentum.)<br /><br />Maybe I'm misunderstanding the methodology—I'm not a statistician—but thought I'd throw that out there.

Surprised Billick would take that position.

The difference in the means is statistically significant but not practically significant.

Hi Brian,
Methinks you need to issue a cease and desist or plagiarism notice against your namesake!<br /><br />http://www.nfl.com/news/story/0ap2000000296566/article/december-momentum-just-isnt-everything-its-cracked-up-to-be<br /><br />Best,<br />Pete

I don't think your test for whether the NFL is streaky is correct. Counting how many games fall below the .05 p-value doesn't tell you much. Isn't the correct comparison to compare the two means and determine a p-value from the standard error? If the difference in the means is really small but you have a lot of data, you can still find that the difference in the means is statistically significant.

Nathan: D'oh, you're absolutely right. The result's highly statistically significant (although the effect size is small). It's still interesting that the number of games that met the p=.05 significance threshold was actually less than predicted by the null hypothesis (unless I'm making another stupid mistake).

Anon, that's the standard error for a single game. The standard error for the sample should be s/sqrt(n). For a single game, s/sqrt(1) = 8, so s = 8, and for the full sample 8/sqrt(3875) = .1285, which makes the difference significant at far above the 95% confidence level. Of course, I had the same thought as others: when the Seahawks demolished the Cardinals 58-0, they didn't outplay them in every phase of the game for 60 minutes because of momentum. It was because they were a far superior team. That'd be tough to control for, but you could use pregame GWP and leave out week 1 games.
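
For anyone who wants to check the arithmetic above, here is a minimal sketch using only the figures quoted in this thread (mean of 64.7, expected 67, N=3875). It assumes the 8 is the per-game standard deviation, as the comment above argues, and it uses a normal approximation for the p-value. The variable names are just illustrative.

```python
import math

# Figures quoted in the thread. Reading 8 as the per-game standard
# deviation is the interpretation argued above, not a confirmed fact.
observed_mean = 64.7
expected_mean = 67.0
per_game_sd = 8.0
n_games = 3875

# Standard error of the mean shrinks with the square root of the sample size.
sem = per_game_sd / math.sqrt(n_games)

# z-score for the difference between the observed and expected means.
z = (observed_mean - expected_mean) / sem

# Two-sided p-value under a normal approximation.
p_two_sided = math.erfc(abs(z) / math.sqrt(2))

print(f"SEM = {sem:.4f}")
print(f"z = {z:.1f}")
print(f"two-sided p = {p_two_sided:.3g}")
```

With a standard error this small, even a 2.3 difference in the means sits many standard errors from zero, which is the "statistically significant but small effect" point made elsewhere in the thread.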

Does this analysis look at play-by-play differences in "success"? That is far too small a window. <br /><br />So every incomplete pass would break a "streak" of success for an offense with momentum. <br /><br />Also, it seems that success could come in long streaks, but failure could only have a maximum of 3 in a row, then it's a punt.

Great stuff. I'm not sure I agree with the conclusion that this data shows evidence of streakiness, though. Doing a one sample t-test on the data provided (mean=64.7, SEM=8.0, expected mean=67, N=3875), I get a p-value of .77 and a 95% CI of (-18.0, 13.4) for the difference. Unless I'm missing something, this would fail to falsify the null hypothesis of no streakiness.<br /><br />As for the fact that 1.2% of games were streaky at a p=.05 level, doesn't the null hypothesis predict that up to 5% of games will meet this significance threshold? If anything this result might indicate the plays are more "unstreaky" than predicted by random chance--in fact doing a chi-square test seems to suggest just that. Using an observed frequency of 49/3875 and an expected frequency of 193.75/3875 (5%), I get Chi-squared=113.8 with 1 df, which is significant at the p<.0001 level.
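
The two calculations in the comment above can be roughly reproduced with a short sketch. It takes the quoted SEM of 8.0 at face value, as the commenter did, and it substitutes a normal approximation for the t distribution (close enough with 3874 degrees of freedom). The chi-square part assumes a two-category goodness-of-fit test (streaky vs. not streaky); the helper name is made up for illustration.

```python
import math

def normal_two_sided_p(z):
    """Two-sided tail probability for a standard normal z-score."""
    return math.erfc(abs(z) / math.sqrt(2))

# One-sample test, treating 8.0 as the standard error of the mean
# (the commenter's reading; other replies in the thread dispute it).
mean, sem, expected = 64.7, 8.0, 67.0
diff = mean - expected
t_stat = diff / sem
p_value = normal_two_sided_p(t_stat)  # normal approximation to the t-test
ci = (diff - 1.96 * sem, diff + 1.96 * sem)

# Chi-square goodness of fit: streaky vs. non-streaky games,
# with 5% of games expected to be "streaky" under the null.
n = 3875
obs_streaky = 49
exp_streaky = 0.05 * n
observed = [obs_streaky, n - obs_streaky]
expected_counts = [exp_streaky, n - exp_streaky]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected_counts))

# With 1 degree of freedom, P(X > x) equals P(|Z| > sqrt(x)).
chi2_p = math.erfc(math.sqrt(chi2 / 2))

print(f"p = {p_value:.2f}, 95% CI = ({ci[0]:.1f}, {ci[1]:.1f})")
print(f"chi-squared = {chi2:.1f}, p = {chi2_p:.3g}")
```

Whether the first test is the right one depends entirely on what the 8.0 represents, which is the point of disagreement in the replies.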

It seems like you might get a little bit of streakiness in the data overall just from teams piling up 'unsuccessful' plays at the ends of halves or when they're in run-out-the-clock mode. Did you try to remove those plays, or plan to check using WP instead of EP?

Does this take into account games when fans get arrested for streaking?

Great analysis and very interesting. Something tells me you would expect to see more streakiness/fewer runs than you would expect from an independent model because NFL plays are not independent.<br /><br />There are all sorts of adjustments, both tactical and personnel, that are made that move the chance of the next play being a success or not all over the place. It might not be a case of 'success begets more success', just that I believe there would be a lag of a few plays at least for a coach to make whatever changes are needed to bring the balance back towards him.<br /><br />At a guess, the function denoting the probability of success on a given play should be able to be modelled as a stochastic process. I may have a look into this.

I like this more general approach to testing for momentum.<br /><br />It does seem like the expected success rate will vary depending on how risky the situation is. Compare, for example, the expected success rates for going for it on fourth and goal from the 6, or kicking the field goal on fourth and goal from the 15.<br /><br />If the expected success rate has lots of runs, then that could easily lead to runs in the successes without any team performance carry on effect.

Ah, very good. Thanks for the reply.

Brian A - I apologize. I updated the post earlier this morning to address exactly that point. I'm betting you got a hold of the early draft. In short, I agree.

I'm thinking some of the "streakiness" observed here can be attributed to differences in team strength.<br /><br />Take an extreme example where both teams have the best offense you can imagine and, at the same time, the worst defense you can imagine, so that every play results in a success for the offense. In that case, each possession is a run that lasts for as many plays as the drive, and you would observe a "streaky" game in this method, even though the cause is completely explainable by differences in team strength. I would think you would want to try to control for something like this in the analysis.<br /><br />Keep up the good work!
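
A quick simulation can illustrate the team-strength point above. This is only a sketch under stated assumptions: it assumes the post's test is a Wald-Wolfowitz-style runs test (the thread does not spell that out), and the success rates, drive counts, and function names are all hypothetical. Every play is independent given the offense on the field, so there is no momentum built in.

```python
import random

def count_runs(seq):
    """Number of runs (maximal blocks of identical outcomes) in a sequence."""
    return 1 + sum(1 for a, b in zip(seq, seq[1:]) if a != b)

def expected_runs(seq):
    """Expected number of runs if the same outcomes were randomly ordered."""
    n1 = sum(seq)
    n0 = len(seq) - n1
    return 2 * n1 * n0 / (n1 + n0) + 1

def simulate_game(rng, p_strong=0.7, p_weak=0.3, drives=22, plays_per_drive=6):
    """Alternate drives between a strong and a weak offense.

    Each play succeeds independently with the probability of the offense
    on the field (hypothetical rates, no carry-over between plays).
    """
    plays = []
    for d in range(drives):
        p = p_strong if d % 2 == 0 else p_weak
        plays.extend(rng.random() < p for _ in range(plays_per_drive))
    return plays

rng = random.Random(0)
games = [simulate_game(rng) for _ in range(2000)]
mean_observed = sum(count_runs(g) for g in games) / len(games)
mean_expected = sum(expected_runs(g) for g in games) / len(games)

print(f"mean observed runs: {mean_observed:.1f}")
print(f"mean expected runs: {mean_expected:.1f}")
```

In this setup the games show fewer runs than the runs test expects, so they look "streaky" even though the only thing going on is a mismatch in team strength.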