Luck: Epilogue

Coincidentally, as I was posting the results of my look at the amount of luck in NFL games, Phil Birnbaum posted this at his site. He was sharing a paper he wrote a while back about how "truly" good an MLB team that wins 100 games really is. If some games are won on merit and some by luck, then his calculations say the average 100-game winner won 7 more games by luck than its talent level merited. In other words, 100-game winners are probably both good and lucky.

But even more interesting was another tidbit Phil linked to. If you follow the references, you land here on Tom Tango's site. He's another accomplished sabermetrician. He approached the question of luck in sports outcomes far more elegantly than I did.

There, he works through his math calculating how many games are required for a sports league to produce the "true" best team on top of the standings. For MLB he says it's 69 games, and for the NFL it's 12 games.

Along the way, Tango articulates his theory. Regarding the distribution of win-loss records, the observed variance is: variance(observed) = variance(true) + variance(luck). Since we know the variance of the observed distribution (SD^2), and we know the variance of luck from the binomial distribution (p=0.5, n=16), we can solve for variance(true), which is the variance in team records based on merit.
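
The arithmetic can be sketched in a few lines of Python. The function names are made up for illustration, and the observed SD of 0.19 is a hypothetical placeholder, not a figure from Tango's post or from my data. Variances here are in winning-percentage terms, so the binomial luck variance is p(1-p)/n.

```python
import math

def luck_variance(n_games, p=0.5):
    """Binomial variance of winning percentage over n_games coin-flip games."""
    return p * (1 - p) / n_games

def true_variance(observed_sd, n_games, p=0.5):
    """Solve variance(observed) = variance(true) + variance(luck) for variance(true)."""
    return observed_sd ** 2 - luck_variance(n_games, p)

# Hypothetical input: an observed SD of team winning percentage (assumption).
observed_sd = 0.19
var_luck = luck_variance(16)
var_true = true_variance(observed_sd, 16)

print("variance(luck):", var_luck)
print("variance(true):", var_true)
print("SD(true):      ", math.sqrt(var_true))
```

For a 16-game season the luck variance is 0.25/16 = 0.015625, an SD of 0.125 in winning percentage, or 2 games. Whatever observed variance is left over after subtracting that is attributed to talent.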

I'm not sure what to think about his method. His theory assumes the "true" distribution (what I call pure-skill) is narrower than the observed distribution. Then luck acts on the true distribution to widen it.

But my simulations show that the distribution of a pure merit league is much wider than either the observed or the luck distributions. So I'm not sure how to interpret his theory.
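
A minimal Python sketch of this kind of simulation follows. It is not the original code behind the histograms. It assumes a simplified model: a 32-team single round-robin rather than the real NFL schedule, teams indexed by true rank, and a `luck` parameter giving the fraction of games decided by a coin flip (otherwise the better team always wins). All names are hypothetical.

```python
import random
import statistics

def simulate_win_pcts(n_teams, luck, rng):
    """Play one round-robin season; team 0 is the best, team n_teams-1 the worst."""
    wins = [0] * n_teams
    for a in range(n_teams):
        for b in range(a + 1, n_teams):
            if rng.random() < luck:
                # Luck decides this game: a fair coin flip.
                winner = a if rng.random() < 0.5 else b
            else:
                # Merit decides this game: the better-ranked team wins.
                winner = a
            wins[winner] += 1
    games = n_teams - 1
    return [w / games for w in wins]

def spread(n_teams, luck, seasons=200, seed=0):
    """SD of winning percentage, pooled over many simulated seasons."""
    rng = random.Random(seed)
    pcts = []
    for _ in range(seasons):
        pcts.extend(simulate_win_pcts(n_teams, luck, rng))
    return statistics.pstdev(pcts)

for luck in (0.0, 0.5, 1.0):
    print(f"luck={luck:.1f}  SD of win pct = {spread(32, luck):.3f}")
```

Under these assumptions the pure-skill league (luck = 0) has records spread evenly from 0-31 to 31-0, an SD of about 0.30, while the all-luck league sits near the binomial value of roughly 0.09. Each run here uses a different luck level, so the pure-skill histogram being the widest is a comparison across models rather than within one.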

On another note, I reran my simulation against the 96-01 NFL seasons. The scheduling system was a little different then, but the effect should be minimal. The simulation maximized its goodness-of-fit at 51% luck, which is pretty much what I found for the 02-06 seasons.

Upcoming posts include a look at YAC stats and how they affect QB stats, and a look at what really produces rushing TDs--something for the fantasy football fans out there.

6 Responses to “Luck: Epilogue”

  1. Derek says:

    I'm not sure how to interpret the stat about the number of games to produce the "true" best team. If it takes 12 games, that's still within the limits of a season. Does it just mean that the identity of the "best" team shifts more in other sports because it takes a smaller portion of the season?

  2. Phil Birnbaum says:

    Hi, Brian,

    The "true" distribution is always narrower than the observed distribution because luck creates more extremes. For instance, suppose no teams are of true 1-15 quality in the NFL, but some teams are 2-14. There is a pretty high probability that one of the 2-14 teams will get unlucky and go 1-15.

    A more extreme way to see it is with coins. Consider a head to be a win, and a tail to be a loss. Almost every coin ever made is 8-8 talent. But by luck, some coins wind up 2-14, or 4-12, or 14-6.

  3. Brian Burke says:


    Thanks for the comment. I enjoy your site and look forward to each By The Numbers.

    I think I understand the extra dispersion that luck adds to a distribution. But that understanding is in conflict with how I conceptualized a "zero-luck/pure-skill" distribution.

    If you look at the histograms in my previous post, the distribution of the pure-skill league (the better team always wins) is the widest. Injecting luck into the simulation actually results in a narrower distribution, at least to a point.

    Perhaps Tango's method assumes the distribution of wins(true) to be normal, which it apparently is not.

  4. Phil Birnbaum says:

    Hey, Brian,

    Thanks for the kind words here.

    If you have a league where the better team always wins, the talent distribution will be exactly the same as the observed distribution.

    For instance, suppose you have four teams in the league and a balanced schedule. A is better than B is better than C is better than D. In theory, A is 1.000. B is .667. C is .333, and D is .000. If you actually play a season, that's what you get. The two distributions are equal.

    That's the boundary case, when there's zero luck at all. If you start adding luck, the talent distribution will NARROW (D is no longer theoretically a .000 team, because it might occasionally beat C), but the distribution of observed results will spread out more.

    The other boundary condition, where it's *all* luck, is like the coin: the talent distribution is every team at .500, but the observed is very spread out.

    I'll read your posts again to clarify, but I think the difference is this: for a GIVEN amount of luck, the observed is always wider than (or equal to) the talent. But if the luck changes, that's not necessarily true. The observed from a model with X% luck is not necessarily wider than the talent from a model with Y% luck, if Y is smaller than X.

  5. Tarr says:


    I think a great way to improve the accuracy of your luck model would be to allow the relative quality of the teams to influence how likely luck is to play into the result. In other words: your current model suggests that the better team has roughly a 76% chance of winning, whether the game is between the 16th and 17th best teams, or the 1st and 32nd best teams. My suspicion is that the 16th vs. 17th game would be pretty darn close to a coin flip, while the #1 vs. #32 game would be a win for the #1 roughly 90% of the time.

    You could make the probability of winning be, say, 50% + (teamrank1 - teamrank2)*k%, where k is chosen to make the fit the best.

  6. Brian Burke says:

    Phil, thanks, that helps clear up my confusion.

    Tarr, I think that's a great idea. In a way, however, that's been done. Anyone with a prediction model basically estimates who the better team is and then sees how often the model is correct.

    But I think your suggestion would be very interesting. It's just a question of assigning reasonable probabilities to each pairing (16 vs 17 would be 52/48, 1 vs 32 would be 95/5, etc.).
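
The rank-based rule suggested in comment 5 can be sketched in Python as follows. This is only an illustration: the function name is hypothetical, the sign convention (rank 1 is the best team, and the result is the chance that the first team wins) is an assumption, and the clipping to the range 0 to 1 is an addition so that large values of k cannot produce impossible probabilities.

```python
def win_probability(rank_a, rank_b, k):
    """Chance that team a beats team b under a linear rank-gap rule.

    rank 1 is the best team; k is the percentage points gained per rank of separation.
    """
    p = 0.5 + (rank_b - rank_a) * k / 100.0
    # Clip so that a large k cannot push the probability outside [0, 1].
    return min(1.0, max(0.0, p))

# k = 1.45 is a hypothetical value chosen only for illustration, not a fitted one.
for a, b in ((16, 17), (1, 32), (32, 1)):
    print(f"#{a} vs #{b}: {win_probability(a, b, 1.45):.4f}")
```

One thing the sketch makes visible: a single linear k cannot hit both 52/48 for 16 vs 17 and 95/5 for 1 vs 32. A k of 2 gives 52/48 for adjacent teams but exceeds 100% at a 31-rank gap, while a k near 1.45 gives about 95/5 for 1 vs 32 but only about 51.5/48.5 for adjacent teams.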
