Let’s Make a Deal was a 1970s game show, entropy is the quantity at the heart of the second law of thermodynamics, and the Verducci Effect is an injury phenomenon named for a Sports Illustrated reporter. What do they have in common?

In Let's Make a Deal, host Monty Hall would walk through the costumed audience, picking contestants on the spot to play various challenges for prizes. The central challenge was a simple game in which the contestant had to choose one of three doors. Behind one of the doors was a big prize, such as a brand new Plymouth sedan. But behind the two other doors were gag prizes, such as a donkey.

Sounds simple, right? The contestant starts with a 1 in 3 chance of picking the correct door. But then Monty would open one of the two doors she hadn't chosen (never the one hiding the real prize) and, with two closed doors remaining, ask the contestant if she wanted to switch her choice. She would waffle as the audience screamed “switch!...stay!...switch.”

The answer is intuitively obvious. It doesn’t matter. She has a 1 in 3 chance when she first picked the door, and we already know one of the other two doors doesn’t have the real prize. So whether she switches or not is irrelevant. It’s still 1 in 3.

...And that would be completely wrong.

The real answer is she should always switch. If she stays, she has a 1 in 3 chance of winning, but if she switches she has a 2 in 3 chance of winning. I know, I know. This doesn’t make any sense.

Don’t fight it. It’s true. If the contestant originally picks a gag door, which will happen 2 out of 3 times, Monty has to open the only remaining gag door. In this case, switching always wins. And because this is the case 2/3 of the time, always switching wins 2/3 of the time.

(If you don’t believe me, visit this site featuring a simulation of the game. It will tally how many times you win by switching and staying. It’s the only thing that ultimately convinced me. But don’t forget to come back and find out what this has to do with the Verducci Effect.)
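A simulation like the one linked above is easy to sketch yourself. Here's a minimal Python version (my own sketch, not the linked site's code) that tallies wins for both strategies:

```python
import random

def play(switch, trials=100_000):
    """Simulate Monty Hall and return the fraction of games won."""
    wins = 0
    for _ in range(trials):
        prize = random.randrange(3)   # door hiding the Plymouth
        pick = random.randrange(3)    # contestant's first choice
        # Monty opens a gag door: neither the pick nor the prize
        opened = next(d for d in range(3) if d != pick and d != prize)
        if switch:
            # switch to the one remaining closed door
            pick = next(d for d in range(3) if d != pick and d != opened)
        wins += pick == prize
    return wins / trials

print(f"stay:   {play(switch=False):.3f}")   # ~0.333
print(f"switch: {play(switch=True):.3f}")    # ~0.667
```

Run it a few times: staying hovers around 1/3 and switching around 2/3, exactly as claimed.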

Baseball Prospectus defines the Verducci Effect as the phenomenon where young pitchers who have a large increase in workload compared to a previous year tend to get injured or have a decline in subsequent year performance. The concept was first noted by reporter Tom Verducci and further developed by injury guru Will Carroll.

But I'm not sure there really is an effect. First, consider why a young pitcher would have a large increase in workload. He’s probably pitching very well, and by definition he’s certainly healthy all year. Bad or injured pitchers don’t often pitch large numbers of innings.

Now, consider a 3-year span of any pitcher’s career. He’s going to have an up year, a down year, and a year in between. Pitchers also get injured fairly often. There’s a good chance he’ll suffer an injury at some point in that span.

Injuries in sports are like entropy, the inevitable reality that all matter and energy in the Universe trend toward deterioration. Players always start out healthy and then are progressively more likely to get injured. Pitchers don’t enter the Major Leagues hurt and gradually get healthier throughout their careers. It just doesn’t work that way. Injuries tend to be more probable in a subsequent year than in any prior year. The second year in a 3-year span will have a greater chance of injury than the first, and the third a greater chance than the second.

Back to Let’s Make a Deal. Think of that three-year span as the three doors. Without a Verducci Effect, each year would have an equal chance of being an injury year. For the sake of analogy, say it’s a 1 in 3 chance. Now Monty opens one of the doors and shows you a non-injury year. Each remaining door now has a significantly increased chance of being the injury year. In this case, it’s a 1 in 2 chance.

I think that’s essentially what Verducci and Carroll did in their analysis. We already know a high workload season can’t be an injury season, therefore subsequent years will retrospectively appear to have higher injury rates. We would normally expect to see injuries in 1 out of 3 years, but we would actually see them 1 out of 2. It’s an illusion.
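To make the illusion concrete, here's a quick Python sketch (my own, under the simplifying assumption that exactly one year in each 3-year span is the injury year, mirroring the one-prize-behind-three-doors setup):

```python
import random

def conditional_injury_rate(trials=100_000):
    """One injury year hides in each 3-year span, like one prize behind
    three doors. Exclude spans whose first year was the injury year (a
    high-workload year must be healthy), then measure how often year 2
    turns out to be the injury year."""
    spans = year2_injuries = 0
    for _ in range(trials):
        injury_year = random.randrange(3)
        if injury_year == 0:
            continue              # injured in year 1: no high workload, excluded
        spans += 1
        year2_injuries += injury_year == 1
    return year2_injuries / spans

print(round(conditional_injury_rate(), 2))  # ~0.5, up from the baseline 1/3
```

Nothing about any year's underlying injury risk changed; the jump from 1/3 to 1/2 comes entirely from throwing out the spans that start with an injury.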

The analogy isn't perfect. Door one is always the open door without the prize, and there's no switching. Also, unlike a single prize behind one door, injuries can be somewhat independent (or more properly described in probability theory as "with replacement"). That is, a pitcher could be injured in more than just one year. But the Verducci Effect only considers two-year spans, and since one year is always a non-injury year, the analogy holds in this respect.

Ultimately, just like in Monty Hall’s game, the underlying probabilities don’t change at all. Only the chance of finding what we're seeking changes. There was always a 1 in 3 chance that one particular door would contain the prize. That never changes throughout the course of the game. But after identifying a non-prize door, we’ve increased our chances of finding the injury…err…I mean Plymouth.

I hereby name this phenomenon the Monty Hall Effect.

(PS Quite frankly, I’m not entirely confident in this. It’s hard to wrap my head around, and I keep second-guessing my logic. If someone out there, like a quantum physicist maybe, understands this stuff well, please add your two cents.)

Edit: See my comment for an alternate explanation of how the Verducci Effect may be an illusion.

## Entropy, Let's Make a Deal, and the Verducci Effect

By Brian Burke, published on 7/31/2009 in fallacies, other sports, research


Brian,

I don't know if the analogy is quite right - I think the Verducci Effect is probably simple regression to the mean, like the "rule of 370" with running backs. A long, healthy, and effective pitching season is an outlier, so if you look at the next season you will find it to be more average, or in other words, shorter, less healthy and less productive.

By the way, you may already know this, but the Monty Hall problem is one of the funniest examples of public "innumeracy": every time it comes up in a major newspaper article it generates a flood of angry feedback from nonbelievers, including a fair number of mathematicians. I don't know if the readers here will respond the same way, but be prepared ;)

I'm not sure this effect really exists, though I loves me a good Monty Hall problem.

The key to the Monty Hall problem is that the doors are all dependent events. There's a 100% chance of the prize being behind one door only. If there's a 1/3 chance of any given season being an injury season, that doesn't mean that one in every set of three seasons will be an injury season, it just means that on average, one in three seasons will be an injury season.

It's more analogous here to a coin that's been flipped 5 times and been heads each time: you intuitively think the next one MUST be tails, but really, it's still 50:50.

Your analysis is accurate if the pitcher is guaranteed to be injured in one of the three years. Are you saying that after two injury-free years there must be an injury in year three? If that's the case then yes, after an injury-free year the odds for the next two years go up to 50%.

Injuries aren't guaranteed. There likely is a bit of regression to the mean when considering pitchers who had a heavy workload in year n, as it's not an unbiased sample. The hard question is how much of the increased rate (over 0%) is regression to the mean and how much is caused by the heavy workload.

Billingham nailed it

Analysis FAIL.

Deeply disappointed to see an entire article devoted to affirming the gambler's fallacy on such a respected website. Billingham and Dan R have it exactly right: you're taking independent tests and declaring them dependent.

Here's an alternative explanation that might be clearer. Say there is a league-wide MLB pitcher injury rate. Assume it's 1 injury per 3 years. Verducci and Carroll observe years after high/increased workloads with injury rates significantly higher than .33. OK, fine.

Now compute the MLB-wide injury rate for pitchers, not including each pitcher's high workload (and therefore probably healthiest) year, and you'll get something significantly higher than 0.33. It might be 0.50 or whatever. That's what Verducci/Carroll are seeing. They see the 0.50 injury rate in the year after a high workload year, compare that to the baseline 0.33, and infer causation.

So no, I'm not a victim of the gambler's fallacy. The gambler's fallacy is prospective. The fallacy in Verducci's analysis is retrospective selection bias.

The independent coin flip analogy is tempting, but it doesn't quite hold. Because of the entropy effect, the initial coin flips will all tend to be "heads" (or non-injury) more than subsequent flips. The coin flips are quasi-dependent in the following way. Because of entropy--the reality that pitchers start healthier and progressively deteriorate--the coin flips will increasingly favor the injury (tails). So the first flip might be 60/40 heads/tails, and the second flip 50/50, and the third flip 40/60. The outcome of subsequent flips depends on their sequence. This creates the illusion of dependence and causation, which is part of what Verducci and Carroll observe.
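The 60/40, 50/50, 40/60 progression can be sketched in a few lines of Python (the rates below are the hypothetical ones above, not real injury data):

```python
import random

# Hypothetical per-year injury probabilities, rising with "entropy"
rates = [0.4, 0.5, 0.6]

def rate_after_healthy_year(trials=100_000):
    """Among careers whose year 1 was injury-free (the only kind that
    could have been a high-workload year), measure the year-2 injury rate."""
    healthy = injured_next = 0
    for _ in range(trials):
        if random.random() < rates[0]:
            continue                      # injured in year 1: excluded
        healthy += 1
        injured_next += random.random() < rates[1]
    return injured_next / healthy

print(round(rate_after_healthy_year(), 2))  # ~0.5, vs. the 0.4 year-1 rate
```

Note the flips here are fully independent, so conditioning on a healthy year 1 doesn't change year 2's rate at all; the apparent jump from 0.4 to 0.5 is entirely the entropy trend, not the workload.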

As I honestly admitted, I am not 100% on this, so please save the rude backhanded comments for some other site. I'm keeping an open mind, which is quite a strange experience for me!

There are a number of ways to argue that the True Injury Rate is higher than the accepted version (a variety of ways in which healthy players might be counted more often than injured players). If that's the case, then the Verducci Effect relies on multiple types of selection bias.

For both pitches thrown and RB carries, it will be interesting to find out as more and more info comes in whether or not the likelihood of injury rises with each subsequent pitch/carry or stays the same. It seems obvious that if you throw 200 innings you are more likely to be injured than if you throw 100 innings even if the injury rate per pitch actually dropped from inning 100 to inning 200.

This isn't even remotely close. Your Monty Hall problem is certainly a complete copy from Leonard Mlodinoff's book, then you misapply it as pointed out above. Not only are the three years independent, every data point is independent. Yes, large numbers will average out, but that doesn't tell us anything about any individual pitcher.

I'll use a Bob Barker analogy: there's a game on The Price Is Right with varying amounts of money behind a punch board, but only one big prize. What if we make it even simpler: there's a prize in 50 of 100 holes, and the trick is you have to hit all fifty to win. Knowing that one hole has a prize doesn't gain you any information. The odds change only slightly overall, but each individual chance is 50%, and the odds of hitting it perfectly are astronomical. There's no additional information gain.

Why the Verducci Effect works is that it uses large numbers to predict something that can be brought down to individual players. 100% accurate? No. Good general guideline? Yes.

The Monty Hall problem isn't meant to be a perfect analogy, but rather an example of an illusion created by selectively narrowing the data set after the fact.

Will-I'd like to know what you think of the points I made in my comment above. I think those are fairly strong points, game shows aside.

Plus, the simple fact that players tend to deteriorate rather than get healthier from one year to the next may be enough to explain much if not all of what you're seeing.

I don't understand the reference to Mlodinow. The Monty Hall problem has been a common exercise in undergrad probability classes for a generation. That someone wrote about it in a book does not make it his. I'm sure it's in lots of books.

I believe many of the anecdotes about the pitchers are much worse than they just had a down year the year after their big year. Mark Prior and Kerry Wood, for instance, had their big year (2003), and then were both completely broken for the rest of their lives. (see http://en.wikipedia.org/wiki/Mark_Prior).

Also, separately, pitching is well-known to be really damaging to your arm in general. Pitchers have to ice their arm afterwards *to stop the bleeding.* It seems highly plausible that pitching ridiculously much the year after not pitching that much would murder your arm.

Now that Will Carroll has posted on his effect, let me make it clear I'm a different Will, and not an injury guru.

I don't think the analogy is completely worthless, because both situations are examples of intuition leading to the wrong answer, which is a common thing - our ape ancestors conquered the world in part because of their ability to find patterns in data, but this also gives us a knack for finding patterns that aren't there.

I see the Monty Hall problem as a failure of data assimilation, while the Verducci Effect just seems like selection bias and/or regression to the mean. But here's another idea: what if pitchers tend to be called up to the majors when healthy, so they tend to get injured as time passes? This would be another form of selection bias, because we start the analysis when they enter the majors, which is a healthy point on their random walk through relative wellness.

I'm sure that pitchers tend to get more injured with work, because every pitch is a chance for an injury. I'd even grant that the chance of injury on a given pitch may increase with increased pitches, if I saw some data on that. The Verducci Effect may capture these factors, but it also captures biases, like the rule of 370. One reason we don't see the problems is that most players follow the same basic career arc, so we don't have a control group to compare with. We have precious few Ricky Williamses to help us tease apart the effects of age from overwork, etc.

Okay, I just want to spell this out again. Let's say that leaguewide, one in three seasons is an 'injury season' for a pitcher.

So three seasons: 2008, 2009, 2010. Each of these has a 33.3% chance of being an injury season.

If this was a Monty Hall problem, we'd see that 2008 was a great season, and then guess 2009 would be an injury season. 33% chance that's true, 66% chance that the injury season is 2008 or 2010. Monty would give us that stat sheet for 2008, showing a great season, and we'd mathematically want to switch to 2010, since there'd be a 66% chance that would be the injury season, compared to 33% that it would be 2009.

Then the season would end, and let's say there's no injury. Montywise, there's now a 100% chance that 2010 is an injury season, so we'd've won our bet before the games are even played. That obviously doesn't make sense, so we toss it out.

I think the key factors are that (a) due to age and physical stress, almost every player in every sport is more likely to get injured in year N+1 than year N, and (b) due to regression to the mean, any season that is unusually healthy is likely to be followed by a year that's less healthy than it (but not less healthy than average). If the average player misses 10 games a season, and your guy misses 2 in 2008, he'll probably miss more than 2 in 2009, but is no more likely to miss more than 10.
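That regression claim can be sketched with made-up numbers (the 10-games-missed average and the spreads below are assumptions for illustration only):

```python
import random

def regression_demo(trials=200_000):
    """Games missed = durability (varies by player) + season luck.
    Keep only players with a very healthy year (<= 2 games missed)
    and see what the following year looks like."""
    n = more_than_2 = more_than_10 = 0
    for _ in range(trials):
        durability = random.gauss(10, 3)             # true mean games missed
        if max(0, random.gauss(durability, 4)) > 2:  # this season's total
            continue                                 # keep only very healthy seasons
        n += 1
        next_year = max(0, random.gauss(durability, 4))
        more_than_2 += next_year > 2
        more_than_10 += next_year > 10
    return more_than_2 / n, more_than_10 / n

p2, p10 = regression_demo()
print(round(p2, 2))    # well above 0.5: he'll probably miss more than 2
print(round(p10, 2))   # below 0.5: still no more likely to miss more than 10
```

The healthy-in-2008 group regresses toward the league average in 2009 without ever becoming worse than average, which is exactly the claim.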

Here's my guess at the explanation. Some pitchers are less durable (more injury prone) than others. The leaguewide injury rate stays relatively low because most pitchers eventually settle into roles that they can handle without getting injured too often - which means that high workload pitchers tend to be among the more durable pitchers. But pitchers who are just getting their first chance or two at a high workload are not especially durable, because the more injury prone pitchers haven't been weeded out yet, so their injury rates will be higher than those of the established high workload pitchers. So, consistent with the Verducci effect, pitchers in their second year with a large workload will get injured more than the typical high workload pitcher.

Another way to put it is that it is in terms of regression to the mean for healthiness. Consider the pitchers who are given a chance at a high workload in one season. Those who are coming off of a very healthy year are likely to be somewhat less healthy this year (because of regression). If the pitcher has already had several healthy high workload years, then we have a lot of evidence that the pitcher is skilled at remaining healthy, so we wouldn't expect that much regression. But if the pitcher is coming off of his first healthy high workload year, then we don't have much evidence of his skill at remaining healthy so we'd expect more regression (higher injury rates).

I think in much the same way that it is said old boxers have one great fight left in them, it may be the case most good pitchers can have a high workload high quality season.

Consider the following:

20% of pitchers can have multiple back to back to back solid high performing seasons

80% of pitchers don't have the durability and quality to produce those back to backs, and can have one good season, but it's all their body can take

If you assume the 20% have 5 good seasons, and the 80% just the 1, then of all good seasons, 1/2 come from the "durable" pitchers, and 1/2 from the "1 hit wonder" pitchers.

Thus, of good pitched seasons, there is nearly a 50/50 shot of another. However, of a good season by a given pitcher in his first year, there is an 80% chance the next year is a dud.

I think it is this discrepancy between the actual sustainable high-end pitchers and the 1-year pitchers that distorts the numbers, to piggyback on Vince.
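Spelling out that arithmetic (a 100-pitcher cohort is my own illustration of the 20/80 split above):

```python
# Hypothetical cohort of 100 pitchers, per the 20/80 split above
durable, fragile = 20, 80
good_durable = durable * 5       # durable pitchers: 5 good seasons each
good_fragile = fragile * 1       # one-hit wonders: 1 good season each
total_good = good_durable + good_fragile

# Pick a good season at random: chance it belongs to a durable pitcher
print(good_durable / total_good)        # ~0.56: "nearly a 50/50 shot"

# But every pitcher in the cohort has a good FIRST season, and 80 of
# the 100 are one-hit wonders, so after a first good season:
print(fragile / (durable + fragile))    # 0.8 chance the next year is a dud
```

So a randomly sampled good season is followed by another good season roughly half the time, while a pitcher's first good season is followed by a dud 80% of the time, just as the comment argues.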

I seriously spit out my drink when Will Carroll accused Brian of copying Leonard Mlodinow's pop statistics book. Getting the name wrong was the icing on the cake.

Yeah, that was immature. Will doesn't understand my point, and it's natural for him to be defensive.

His Bob Barker analogy makes my point. The story he tells is about someone trying to make what is effectively a future prediction. The very problem with the Verducci analysis is that it is looking backward already knowing many of their wrong predictions. Their "future" predictions within the past data would therefore be more accurate than a random guess.

Please see the follow up article and its comments for further discussion. I now am confident that the paradox/illusion I describe is real. However, I am no longer confident in any way that it applies specifically to the Verducci analysis. It would depend on the particular methodology of the analysis.

Follow up

I think you can't talk about entropy here; entropy is not matter and energy in the universe going into deterioration.

The 2nd law of thermodynamics states that the entropy of the universe cannot decrease, meaning that the total amount of energy is always the same. If a system loses energy (as when two hydrogen atoms and one oxygen atom bond to form a more stable, more organized, and therefore less energetic water molecule), that energy goes on to increase the energy of the surroundings: the entropy of the universe increases or stays the same, yet in either case the total energy is equal.

Entropy is a way to link the organizational state of matter with energy, and therefore to predict the evolution of the universe, or of ice on a beach in LA. The more entropy is created in one place, the more a system elsewhere can evolve toward lower entropy, i.e., a less energetic state, i.e., giving energy to the universe.

Of course, what becomes complicated is that a system doesn't always evolve to the most stable state, as certain states are not spontaneous, need activation energy, depend on the environment, etc. For example, some molecules are in a favorable energetic state just because their atoms were at one point so close together that they became more stable.

So I think it is more accurate to say that entropy never decreases globally.