Comments on Advanced Football Analytics (formerly Advanced NFL Stats): Entropy, Let's Make a Deal, and the Verducci Effect

I think you can't talk about entropy here ; in...

2010-04-09T14:16:40.846-04:00

I think you can't talk about entropy here; in fact, entropy is not matter and energy in the universe going into deterioration.

The 2nd law of thermodynamics states that the entropy of the universe cannot decrease, meaning that the amount of energy is always the same, and if a system loses energy (as when two hydrogen atoms and one oxygen atom bond to create a more stable, more organized, and therefore less energetic water molecule), this energy goes on to increase the outside energy: the entropy of the universe increases or stays the same, yet in each case the energy is equal.

Entropy is a way to link the organisational state of matter with energy, and therefore to predict the evolution of the universe or of ice on a beach in LA. The more entropy is created, the more a system elsewhere is evolving towards less entropy, i.e. a less energetic state, i.e. giving energy to the universe.

Of course, what becomes complicated is that a system doesn't always evolve to the most stable state, as certain states are not spontaneous, need activation energy, depend on the environment, etc. For example, molecules are more stable, i.e. in an energetically favorable state, just because the atoms were at one point so close that at this moment they are more stable.

So I think it is more relevant to say that entropy never decreases globally.

Please see the follow up article and its comments ...

2009-08-03T12:35:44.991-04:00

Please see the follow up article and its comments for further discussion. I now am confident that the paradox/illusion I describe is real. However, I am no longer confident in any way that it applies specifically to the Verducci analysis. It would depend on the particular methodology of the analysis.

Follow up

Yeah, that was immature. Will doesn't understa...

2009-08-02T13:26:16.993-04:00

Yeah, that was immature. Will doesn't understand my point, and it's natural for him to be defensive.

His Bob Barker analogy makes my point. The story he tells is about someone trying to make what is effectively a future prediction. The very problem with the Verducci analysis is that it is looking backward already knowing many of their wrong predictions. Their "future" predictions within the past data would therefore be more accurate than a random guess.

I seriously spit out my drink when Will Carroll ac...

2009-08-02T13:13:08.693-04:00

I seriously spit out my drink when Will Carroll accused Brian of copying Leonard Mlodinow's pop statistics book. Getting the name wrong was the icing on the cake.

I think in much the same way that it is said old b...

2009-08-02T01:42:03.993-04:00

I think, in much the same way that it is said old boxers have one great fight left in them, it may be the case that most good pitchers can have one high-workload, high-quality season.

Consider the following:
20% of pitchers can have multiple back to back to back solid high performing seasons
80% of pitchers don't have the durability and quality to produce those back to backs, and can have one good season, but it's all their body can take

If you assume the 20% have 5 good seasons, and the 80% just the 1, then of all good seasons, a bit more than 1/2 (5/9) come from the "durable" pitchers, and a bit less than 1/2 (4/9) from the "1 hit wonder" pitchers.

Thus, of good pitched seasons, there is nearly a 50/50 shot of another. However, of a good season by a given pitcher in his first year, there is an 80% chance the next year is a dud.
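
The arithmetic in this comment can be checked with a short sketch. The function name and parameters below are hypothetical, and the sketch assumes (the comment does not say so) that a durable pitcher's 5 good seasons are consecutive, so that only his last good season is not followed by another one.

```python
from fractions import Fraction

def good_season_shares(durable_frac=Fraction(1, 5), durable_seasons=5, other_seasons=1):
    """Return (share of good seasons from durable pitchers,
    chance that a good season is followed by another good one).

    Assumption: each pitcher's good seasons are consecutive.
    """
    durable_total = durable_frac * durable_seasons      # good seasons per pitcher in the pool, durable group
    other_total = (1 - durable_frac) * other_seasons    # same, for the one-season group
    total = durable_total + other_total
    share_durable = durable_total / total
    # Only a durable pitcher's non-final good seasons are followed by another good one.
    followed = durable_frac * (durable_seasons - 1)
    return share_durable, followed / total

share, repeat = good_season_shares()
print(share, repeat)  # 5/9 4/9
```

Under these assumptions the split is 5/9 versus 4/9, and a randomly chosen good season is followed by another about 44% of the time, which is in line with the "nearly 50/50" figure.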

I think it is this discrepancy between the actual sustainable high-end pitchers and the 1-year pitchers that distorts the numbers, to piggyback on Vince.

Here's my guess at the explanation. Some pitc...

2009-08-01T15:27:53.712-04:00

Here's my guess at the explanation. Some pitchers are less durable (more injury prone) than others. The leaguewide injury rate stays relatively low because most pitchers eventually settle into roles that they can handle without getting injured too often - which means that high workload pitchers tend to be among the more durable pitchers. But pitchers who are just getting their first chance or two at a high workload are not especially durable, because the more injury prone pitchers haven't been weeded out yet, so their injury rates will be higher than those of the established high workload pitchers. So, consistent with the Verducci effect, pitchers in their second year with a large workload will get injured more than the typical high workload pitcher.

Another way to put it is in terms of regression to the mean for healthiness. Consider the pitchers who are given a chance at a high workload in one season. Those who are coming off of a very healthy year are likely to be somewhat less healthy this year (because of regression). If the pitcher has already had several healthy high workload years, then we have a lot of evidence that the pitcher is skilled at remaining healthy, so we wouldn't expect that much regression. But if the pitcher is coming off of his first healthy high workload year, then we don't have much evidence of his skill at remaining healthy so we'd expect more regression (higher injury rates).

Okay, I just want to spell this out again. Let...

2009-08-01T12:35:06.012-04:00

Okay, I just want to spell this out again. Let's say that leaguewide, one in three seasons is an 'injury season' for a pitcher.

So three seasons: 2008, 2009, 2010. Each of these has a 33.3% chance of being an injury season.

If this was a Monty Hall problem, we'd see that 2008 was a great season, and then guess 2009 would be an injury season. 33% chance that's true, 66% chance that the injury season is 2008 or 2010. Monty would give us that stat sheet for 2008, showing a great season, and we'd mathematically want to switch to 2010, since there'd be a 66% chance that would be the injury season, compared to 33% that it would be 2009.

Then the season would end, and let's say there's no injury. Montywise, there's now a 100% chance that 2010 is an injury season, so we'd've won our bet before the games are even played. That obviously doesn't make sense, so we toss it out.

I think the key factors are that (a) due to age and physical stress, almost every player in every sport is more likely to get injured in year N+1 than year N, and (b) due to regression to the mean, any season that is unusually healthy is likely to be followed by a year that's less healthy than it (but not less healthy than average). If the average player misses 10 games a season, and your guy misses 2 in 2008, he'll probably miss more than 2 in 2009, but is no more likely to miss more than 10.

Now that Will Carroll has posted on his effect, le...

2009-08-01T11:47:47.010-04:00

Now that Will Carroll has posted on his effect, let me make it clear I'm a different Will, and not an injury guru.

I don't think the analogy is completely worthless, because both situations are examples of intuition leading to the wrong answer, which is a common thing - our ape ancestors conquered the world in part because of their ability to find patterns in data, but this also gives us a knack for finding patterns that aren't there.

I see the Monty Hall problem as a failure of data assimilation, while the Verducci Effect just seems like selection bias and/or regression to the mean. But here's another idea: what if pitchers tend to be called up to the majors when healthy, so they tend to get injured as time passes? This would be another form of selection bias, because we start the analysis when they enter the majors, which is a healthy point on their random walk through relative wellness.

I'm sure that pitchers tend to get more injured with work, because every pitch is a chance for an injury. I'd even grant that the chance of injury on a given pitch may increase with increased pitches, if I saw some data on that. The Verducci Effect may capture these factors, but it also captures biases, like the rule of 370. One reason we don't see the problems is that most players follow the same basic career arc, so we don't have a control group to compare with. We have precious few Ricky Williamses to help us tease apart the effects of age from overwork, etc.

I believe many of the anecdotes about the pitchers...

2009-08-01T11:23:46.558-04:00

I believe many of the anecdotes about the pitchers describe something much worse than just a down year the year after their big year. Mark Prior and Kerry Wood, for instance, had their big year (2003), and then were both completely broken for the rest of their lives. (see http://en.wikipedia.org/wiki/Mark_Prior).

Also, separately, pitching is well-known to be really damaging to your arm in general. Pitchers have to ice their arm afterwards *to stop the bleeding.* It seems highly plausible that pitching ridiculously much the year after not pitching that much would murder your arm.

The Monty Hall problem isn't meant to be a per...

2009-07-31T22:04:28.067-04:00

The Monty Hall problem isn't meant to be a perfect analogy, rather an example of an illusion created by selectively narrowing the data set after the fact.

Will, I'd like to know what you think of the points I made in my comment above. I think those are fairly strong points, game shows aside.

Plus, the simple fact that players tend to deteriorate instead of get healthier from one year to the next may be enough to explain much if not all of what you're seeing.

I don't understand the reference to Mlodinow. The Monty Hall problem has been a common exercise in undergrad probability classes for a generation. That someone wrote about it in a book does not make it his. I'm sure it's in lots of books.

This isn't even remotely close. Your Monty Hal...

2009-07-31T21:07:28.491-04:00

This isn't even remotely close. Your Monty Hall problem is certainly a complete copy from Leonard Mlodinoff's book, then you misapply it as pointed out above. Not only are the three years independent, every data point is independent. Yes, large numbers will average out, but that doesn't tell us anything about any individual pitcher.

I'll use a Bob Barker analogy: There's a game on Price Is Right where there are various amounts behind a punch board. There are varying amounts of money but only one big prize. What if we made this even simpler? There's a prize in 50 of 100 holes. The trick is you have to hit all fifty to win. Knowing that one hole has a prize doesn't gain you any information. The odds are only changing slightly overall, but each individual chance is 50%. The odds against hitting it perfectly are astronomical. There's no additional information gain.

Why the Verducci Effect works is that it uses large numbers to predict something that can be brought down to individual players. 100% accurate? No. Good general guideline? Yes.

There are a number of ways to argue that the True ...

2009-07-31T17:40:55.703-04:00

There are a number of ways to argue that the True Injury Rate is higher than the accepted version (a variety of ways in which healthy players might be counted more often than injured players). If that's the case, then the Verducci Effect relies on multiple types of selection bias.

For both pitches thrown and RB carries, it will be interesting to find out as more and more info comes in whether or not the likelihood of injury rises with each subsequent pitch/carry or stays the same. It seems obvious that if you throw 200 innings you are more likely to be injured than if you throw 100 innings even if the injury rate per pitch actually dropped from inning 100 to inning 200.

Here's an alternative explanation that might b...

2009-07-31T15:43:28.896-04:00

Here's an alternative explanation that might be clearer. Say there is a league-wide MLB pitcher injury rate. Assume it's 1 injury per 3 years. Verducci and Carroll observe years after high/increased workloads with injury rates significantly higher than .33. OK, fine.

Now compute the MLB-wide injury rate for pitchers not including each pitcher's high workload and therefore probably healthiest year, and you'll get something significantly higher than 0.33. It might be 0.50 or whatever.

That's what Verducci/Carroll are seeing. They see the 0.50 injury rate in the year after a high workload year, compare that to the baseline 0.33, and infer causation.

So no, I'm not a victim of the gambler's fallacy. The gambler's fallacy is prospective. The fallacy in Verducci's analysis is retrospective selection bias.

The independent coin flip analogy is tempting, but it doesn't quite hold. Because of the entropy effect, the initial coin flips will all tend to be "heads" (or non-injury) more than subsequent flips. The coin flips are quasi-dependent in the following way. Because of entropy--the reality that pitchers start healthier and progressively deteriorate--the coin flips will increasingly favor the injury (tails). So the first flip might be 60/40 heads/tails, and the second flip 50/50, and the third flip 40/60. The outcome of subsequent flips depends on their sequence. This creates the illusion of dependence and causation, which is part of what Verducci and Carroll observe.

As I honestly admitted, I am not 100% on this, so please save the rude backhanded comments for some other site. I'm keeping an open mind, which is quite a strange experience for me!

Analysis FAIL. Deeply disappointed to see an enti...

2009-07-31T11:32:56.187-04:00

Analysis FAIL.

Deeply disappointed to see an entire article devoted to affirming the gambler's fallacy on such a respected website. Billingham and Dan R have it exactly right: you're taking independent tests and declaring them dependent.

Billingham nailed it

2009-07-31T11:10:01.645-04:00

Billingham nailed it

Your analysis is accurate if the pitcher is guaran...

2009-07-31T11:07:14.970-04:00

Your analysis is accurate if the pitcher is guaranteed to be injured in one of the three years. Are you saying that after two injury free years there must be an injury in year three? If that's the case then yes after an injury free year the odds for the next two years go up to 50%.

Injuries aren't guaranteed. There likely is a bit of regression to the mean when considering pitchers who had a heavy workload in year n as it's not an unbiased sample. The hard question is how much of the increased rate (over 0%) is regression to the mean and how much is caused by the heavy workload.
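
The difference between the two readings can be made concrete with a small sketch. Both models are illustrative assumptions: model A places exactly one injury season, equally likely, in any of three years; model B makes each year an independent 1-in-3 chance. The helper name is hypothetical.

```python
from fractions import Fraction
from itertools import product

def p_injury_year2_given_healthy_year1(outcomes_with_probs):
    """P(year 2 is an injury season | year 1 was healthy).
    Each outcome is a tuple of booleans, True meaning an injury season."""
    both = sum(p for o, p in outcomes_with_probs if not o[0] and o[1])
    healthy_first = sum(p for o, p in outcomes_with_probs if not o[0])
    return both / healthy_first

# Model A: exactly one injury season in three, each position equally likely.
guaranteed = [((i == 0, i == 1, i == 2), Fraction(1, 3)) for i in range(3)]

# Model B: each season is independently an injury season with probability 1/3.
p = Fraction(1, 3)
independent = []
for outcome in product([True, False], repeat=3):
    prob = Fraction(1)
    for injured in outcome:
        prob *= p if injured else 1 - p
    independent.append((outcome, prob))

print(p_injury_year2_given_healthy_year1(guaranteed))   # 1/2
print(p_injury_year2_given_healthy_year1(independent))  # 1/3
```

Only the guaranteed-injury model raises the odds to 50% after a healthy year; with independent seasons a healthy year tells you nothing about the next one.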

I'm not sure this effect really exists, though...

2009-07-31T10:58:11.842-04:00

I'm not sure this effect really exists, though I loves me a good Monty Hall problem.

The key to the Monty Hall problem is that the doors are all dependent events. There's a 100% chance of the prize being behind exactly one door. If there's a 1/3 chance of any given season being an injury season, that doesn't mean that one in every set of three seasons will be an injury season, it just means that on average, one in three seasons will be an injury season.

It's more analogous here to a coin that's been flipped 5 times and been heads each time: you intuitively think the next one MUST be tails, but really, it's still 50:50.

Brian, I don't know if the analogy is quite r...

2009-07-31T10:51:48.804-04:00

Brian,

I don't know if the analogy is quite right - I think the Verducci Effect is probably simple regression to the mean, like the "rule of 370" with running backs. A long, healthy, and effective pitching season is an outlier, so if you look at the next season you will find it to be more average, or in other words, shorter, less healthy and less productive.

By the way, you may already know this, but the Monty Hall problem is one of the funniest examples of public "innumeracy": every time it comes up in a major newspaper article it generates a flood of angry feedback from nonbelievers, including a fair number of mathematicians. I don't know if the readers here will respond the same way, but be prepared ;)