Momentum Part 5 - Series Level Analysis

This is the final part of my series on momentum in a football game. Is momentum a causative property that a team can gain or lose, or is it only something our minds project to explain streaks of outcomes that don't alternate as much as we expect? It's been a couple months since I began this series, so as a refresher, here is what I've looked at so far:

Part 1 examined the possibility that momentum exists by measuring whether teams that obtain the ball in momentum-swinging ways go on to score more frequently than teams that obtained the ball by regular means.

Part 2 looked at whether teams that gained possession following momentous plays went on to win more often than we would otherwise expect.

Part 3 focused on drive success following a turnover on downs, which is often cited by coaches and analysts as a reason not to go by the numbers when making strategic decisions.

Part 4 applied a different method of examining momentum by using the runs test to see the degree to which team performance is streakier than random, independent trials.

In this part, I'll apply the runs test at the series level, to see if teams convert first downs (or fail to convert them) more consecutively than random independence would suggest. But first, I'll tie up some loose ends left hanging from Part 4. Specifically, I'll redo the play-level runs test to eliminate potential confusion caused by a team whose offensive and defensive squads perform very differently.

Recall that the runs test indicates how streaky a string of results is compared to what would be expected by chance. (By the way, just in case anyone is confused, "runs" refers to "streaks" of an outcome, not to running vs. passing.)

The expected number of runs ('mu') is calculated from the number of 'successful' trials (N+) and the number of 'unsuccessful' trials (N-): mu = 2(N+)(N-)/N + 1, where N = (N+) + (N-). For our purposes, a trial could be a football play, a series of downs, a drive in a game, or a game in a season. One of the best features of the runs test is that it accounts for the proportion of success. In football terms that means it accounts for differences in team strength and the fact that better teams are more likely to have runs of success.
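To make the calculation concrete, here is a minimal Python sketch of the two quantities the runs test compares. The function names `count_runs` and `expected_runs` and the sample string of plays are made up for illustration, and I'm assuming the standard runs-test expectation, mu = 2(N+)(N-)/N + 1.

```python
def count_runs(outcomes):
    """Count runs (maximal streaks of identical outcomes) in a sequence."""
    runs = 0
    previous = None
    for outcome in outcomes:
        if outcome != previous:
            runs += 1
            previous = outcome
    return runs


def expected_runs(n_success, n_failure):
    """Expected number of runs for independent trials, given the counts."""
    n = n_success + n_failure
    return 2 * n_success * n_failure / n + 1


# Hypothetical string of plays: S = successful, U = unsuccessful.
plays = "SSUSUUSS"
observed = count_runs(plays)
expected = expected_runs(plays.count("S"), plays.count("U"))
print(observed, expected)
```

Because the expectation is built from the observed counts of successes and failures, a stronger team with more successes automatically gets a different baseline than a weaker one.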

Part 4 of this series looked at the play level, classifying the success of football plays as successful (S) or unsuccessful (U) based on whether the play improved or worsened a team's net scoring potential, as measured by Expected Points. (This is the basis of the ANS version of the Success Rate statistic.) If games tended to produce fewer runs (that is, longer unbroken strings of success or non-success) than the runs test tells us to expect, that suggests there may be an element of momentum in the game.

The results of the previous analysis showed that we should expect 67.0 runs in a game. But we actually observed 64.7 runs, which indicates there is slightly more streakiness than if all the plays were purely independent. The problem with my original analysis was that it treated all plays by both a team's offense and defense together, which could overstate the amount of streakiness when a team's offense is significantly better than its defense or vice versa.

For example, imagine a team with a perfectly good offense and a perfectly bad defense. Its total success rate would be 50%, suggesting a relatively un-streaky performance. But in actuality, we would see very long runs of success and non-success, and a new run would be created only when there was a change in possession. Even without any momentum present, we would only observe about 20 runs when we should expect about 80 in this extreme example.
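The extreme case can be checked with a few lines of Python. This is only a sketch under assumed numbers: 20 possessions of 8 plays each are hypothetical round figures chosen to match the "about 20" and "about 80" above, and the helper names are illustrative.

```python
def count_runs(outcomes):
    """Count runs (maximal streaks of identical outcomes) in a sequence."""
    runs = 0
    previous = None
    for outcome in outcomes:
        if outcome != previous:
            runs += 1
            previous = outcome
    return runs


def expected_runs(n_success, n_failure):
    """Expected number of runs for independent trials, given the counts."""
    n = n_success + n_failure
    return 2 * n_success * n_failure / n + 1


# Assumed game shape: possessions alternate between a perfect offense
# (every play "S") and a perfectly bad defense (every play "U").
POSSESSIONS = 20
PLAYS_PER_POSSESSION = 8

plays = "".join(
    ("S" if i % 2 == 0 else "U") * PLAYS_PER_POSSESSION
    for i in range(POSSESSIONS)
)

observed = count_runs(plays)
expected = expected_runs(plays.count("S"), plays.count("U"))
print(observed, expected)  # 20 81.0
```

A new run starts only at each change of possession, so the combined string looks far streakier than its 50% success rate would predict, even though nothing resembling momentum is involved.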

So I reran the analysis, separating offensive and defensive performance. At the play level:

-Home offenses should expect an average of 32.4 runs, and we observed 31.4.
-Home defenses should expect an average of 33.0 runs, and we observed 31.9.

This is consistent with the results of the initial analysis. We see a very slight momentum effect to the tune of about 2 fewer total runs in a game. But it often only takes 1 flip from success to non-success to create 2 additional runs. (SSS = 1 run, SUS = 3 runs.)

A shortcoming of any play-level analysis is that it ignores the format of the game. A team can alternate successful and unsuccessful plays all the way to the end zone if its successful plays gain enough yardage. So a series-level analysis might be more enlightening. I think the series level is the better way to look at things, and it's probably how people who perceive momentum experience it--in terms of moving the chains. At the series level:

-Home offenses should expect an average of 13.7 runs, and we observe 12.5.
-Home defenses should expect an average of 13.7 runs, and we observe 12.7.

We can detect a slightly stronger momentum effect in relative terms at the series level as compared to the play level for an entire game. There appears to be about 1 to 2 series' worth of streakiness beyond what would be expected if the game were composed purely of independent trials. (Remember that it only takes one "flip" from a success to a non-success, or vice versa, to create two additional runs: HHH = 1 run, HTH = 3 runs.)
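The "relative terms" comparison is simple arithmetic on the averages reported above. A short Python sketch, where the dictionary labels and the `relative_shortfall` helper are just illustrative names:

```python
# (expected, observed) average runs per game, as reported in the text.
results = {
    "play level, home offense": (32.4, 31.4),
    "play level, home defense": (33.0, 31.9),
    "series level, home offense": (13.7, 12.5),
    "series level, home defense": (13.7, 12.7),
}


def relative_shortfall(expected, observed):
    """Fraction by which observed runs fall short of expected runs."""
    return (expected - observed) / expected


for label, (expected, observed) in results.items():
    print(f"{label}: {relative_shortfall(expected, observed):.1%}")
```

The play-level shortfalls come out at roughly 3%, while the series-level shortfalls are roughly 7% to 9%, even though the absolute gap is about one run in every case.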

There's some streakiness there in the data, but it's a far cry from what the believers in momentum have in mind. I believe this degree of streakiness is imperceptibly small to even the most experienced observer. If two gamblers in the old west bet on a flipped coin and they saw HHTTHTHHHT (6 runs) instead of THTHHTTTHT (7 runs), one wouldn't suddenly draw his six-shooter accusing the other of cheating with a trick coin.
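For anyone who wants to verify the coin example, here is a small Python check. The `count_runs` and `expected_runs` helpers are illustrative names, and the expectation again assumes the standard runs-test formula.

```python
def count_runs(outcomes):
    """Count runs (maximal streaks of identical outcomes) in a sequence."""
    runs = 0
    previous = None
    for outcome in outcomes:
        if outcome != previous:
            runs += 1
            previous = outcome
    return runs


def expected_runs(n_success, n_failure):
    """Expected number of runs for independent trials, given the counts."""
    n = n_success + n_failure
    return 2 * n_success * n_failure / n + 1


for flips in ("HHTTHTHHHT", "THTHHTTTHT"):
    heads = flips.count("H")
    tails = flips.count("T")
    print(flips, count_runs(flips), round(expected_runs(heads, tails), 1))
```

Both sequences have a 6/4 split between the two outcomes, so both carry the same expectation of 5.8 runs. One lands just above it and the other just below, a difference of a single run.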

The momentum effect we observed with the runs test might be explained by natural phenomena unrelated to the common notions of momentum. Key mid-game injuries could cause an otherwise unexpected run of non-success. A fourth-down conversion failure is counted the same as a punt or FG attempt, but a fourth-down attempt also gives a team an additional bite at the success apple, prolonging a run further than expected. "Trash time" might be the biggest factor, where there is an unexpected run of outcomes inconsistent with the rhythm of the rest of the game. In retrospect, I think it would be quite shocking to find that football plays were purely independent trials.

This series of articles examined momentum in a number of ways--using different definitions of momentum and different methods of analysis. It looked at momentum at the game level, the drive level, the series level, and the play level. Although it can't be ruled out that there is some grain of truth to the role of momentum, the effect sizes we observed are probably too small to be noticed by a fan or even by a player or coach.

Notes: The data set includes all games from 1999 through the 2013 conference championships. A series success is defined as any conversion for a first down or a touchdown.

7 Responses to “Momentum Part 5 - Series Level Analysis”

1. Nate says:

Is the expected success rate actually 50%? In extreme situations - like attempts to convert 4th and goal from the 6, it seems the success rate would be very low. Similarly, a 4th and 1 conversion attempt on your own side of the field probably has a high success rate.

Sure, most of the time, the success rate will be close to 0.5, but it probably doesn't take much to throw off the streak test.

2. Brian Burke says:

Nate-It depends, obviously. The point is that a 4th down conversion attempt gives a team an additional chance to bridge together multiple "runs" into one "run" if successful. In other words, a successful conversion often turns what would otherwise be 3 or 2 runs into 1 run. I'm not sure if that's an issue or not, but I thought it was worth mentioning.

By the way, if anyone is curious, the overall series success rate is around 67% to 70%.

3. Anonymous says:

Have to say I've enjoyed reading your series on momentum, probably because it's one of those 'myths' in sport that I just refuse to believe in. I am firmly of the opinion that any evident momentum within the course of a game is a construct of the human mind, trying desperately to cope with the randomness of events and seeking some order in them.
The very fact that such huge 'momentum swings' are possible points immediately to the transient and arbitrary nature of the concept.
People being people try and make sense of it with a concept such as momentum because we just can't cope with there not being a reason (or at least there not being one better than random streaks of results).
Would we do this with coins? Would someone witnessing 7 heads in a row say that heads have 'momentum'? It's unlikely because we essentially know that a coin toss is random. Sports are different, they're supposed to be decided by skill not chance, and so when we witness randomness within them we need some other explanation that might be something to do with the players.
Side-tracking slightly, I recently read that 50% of soccer matches are decided by luck rather than skill. Whether this is a reason to despair and wonder why your team bothers acquiring talent when it's half luck anyway, or to celebrate that no matter how bad your team they still stand a chance, is open to debate. I only mention it to highlight the large element that luck plays in some sports leading to mirages like momentum. After all, if it was all down to skill the better team would simply continue to roll to inevitable victory in one constant stream of momentum.

4. Nate says:

> Nate-It depends, obviously. ...

Sorry, I guess I could have been clearer.

Let's say -for the sake of discussion- that, on average, the series success rate is 70%, but that there's a 75% success rate outside and then a 50% success rate inside the red zone. Because advancing the ball into the red zone cuts success rates, that means that the trials are not independent, even without momentum as an explanatory factor.

5. Dan says:

I'd expect the end of games to look more streaky, because of changes in strategy (prevent defense, running out the clock, etc.). What do the numbers look like if you only use data from the first 3 quarters?

Also, I don't have a clear intuitive sense of what these streakiness numbers mean (e.g., 12.5 runs instead of 13.7). It would be easier to interpret if we could run some simulations of teams with known levels of streakiness (by a more intuitive measure) and see how they came out on that measure.

For example, imagine a team that had a 75% series success rate when they were "hot" and a 60% series success rate when they were "cold", and which was cold for the first half of each game and hot for the second half. How many runs would they have, compared to a team with a constant success rate? What would their hot & cold success rates need to be (instead of 75% and 60%) in order to match the observed number of runs?
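A simulation along these lines could be sketched in Python as follows. Everything here is an assumption for illustration: 14 series per half is a made-up figure, each series is drawn independently at the stated rate, and the function names are hypothetical. The per-game expectation uses the standard runs-test formula applied to that game's own success and failure counts.

```python
import random


def count_runs(outcomes):
    """Count runs (maximal streaks of identical outcomes) in a sequence."""
    runs = 0
    previous = None
    for outcome in outcomes:
        if outcome != previous:
            runs += 1
            previous = outcome
    return runs


def expected_runs(n_success, n_failure):
    """Expected number of runs for independent trials, given the counts."""
    n = n_success + n_failure
    return 2 * n_success * n_failure / n + 1


def simulate_game(rng, first_half_rate, second_half_rate, series_per_half=14):
    """Return one game's series outcomes (True = converted) as a list."""
    rates = [first_half_rate] * series_per_half + [second_half_rate] * series_per_half
    return [rng.random() < rate for rate in rates]


def average_runs(first_half_rate, second_half_rate, games=20000, seed=0):
    """Average observed runs and average per-game expected runs."""
    rng = random.Random(seed)
    total_observed = 0
    total_expected = 0.0
    for _ in range(games):
        game = simulate_game(rng, first_half_rate, second_half_rate)
        successes = sum(game)
        total_observed += count_runs(game)
        total_expected += expected_runs(successes, len(game) - successes)
    return total_observed / games, total_expected / games


# Cold first half (60%), hot second half (75%), versus a constant 67.5%.
hot_cold = average_runs(0.60, 0.75)
steady = average_runs(0.675, 0.675)
print("hot/cold (observed, expected):", hot_cold)
print("steady   (observed, expected):", steady)
```

Changing the two rates passed to `average_runs` and comparing the observed-minus-expected gap with the gaps reported in the article is one way to search for hot and cold rates that would reproduce the observed number of runs.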

6. Brian Burke says:

Nate-That's an excellent point. But I think the runs test methodology will account for that. One overall success rate is not used in the analysis. Each particular game's successes/non-successes create the expected # of runs, so it will account for the number of visits to the red zone and other similar factors.

7. Nate says:

> How many runs would they have, compared to a team with a constant
> success rate? ...

The runs test can really only be used to reject the assumption of statistical independence. (There might be something happening that hides the impact of momentum in the test.)

> For example, imagine a team that had a 75% series success rate when
> they were "hot" and a 60% series success rate when they were "cold", ...

Let's say teams average 14 first downs per half. Then we'd expect the streaks in the hot half to be:
2*( 14*(.75) * 14 * (.25) ) / 14 +1 = 6.25
and in the cold half
2*( 14*(.6) * 14 * (.4) ) / 14 +1 = 7.72
and if the team were average (67.5%) over a whole game:
2*(28*(.675)*28*(.325))/28+1= 13.285

There's a (.75*.4)+(.25*.6)=.45 chance of an extra switch when we put the hot and cold halves together, so the expected hot/cold total is:
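6.25+7.72-1+.45=13.42
(The 1 is subtracted because each half's formula already counts its own first streak, and the joined game has only one.)

We'd expect the hot/cold team to have about 0.1 more streaks per game. Of course, with this small number of trials, the difference is of limited statistical significance.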

> That's an excellent point. But I think the runs test methodology will
> account for that. One overall success rate is not used in the analysis.

I don't see how there could be a statistical difference between being in the green zone and 'running hot', or between being in the red zone and 'running cold'.