In response to a few questions on my last post regarding my World Cup win probability (WP) model, here are some actual numbers to chew on. An anonymous commenter pointed us to actual win rates at WhoWins.com (a fun site by the way). I've graphed the actual rates below.
The actual win rates are for 708 games stretching all the way back to 1930. The theoretical WPs based on a Poisson distribution are the solid lines, and the actual rates are the little triangles and squares. Keep in mind these are the WPs for the trailing team.
There are a couple of factors that explain the differences. First, we would normally expect the actual WP for the trailing team to be lower than the theoretical WP. This is because the better team tends to be the one that scores first. But we see just the opposite: trailing teams win more often than we'd expect.
That backwards result could be due to strategy effects. Trailing teams would be expected to become more aggressive, increasing their chances of winning or tying, but also risking falling further behind. On the other hand, the team ahead would be expected to hunker down and adopt a defensive strategy, largely neutralizing the trailing team's increase in aggressiveness. But the differences are too large to be explained by strategy effects. If strategy effects are stronger than team-strength effects, we'd be seeing a much more aggressive style of play than we do. Instead, we are witnessing a very safe style with low scoring rates (even for soccer).
And I think that's what's really going on. Scoring rates in World Cup play are at historically low levels. My theoretical model is based on 2006's scoring rate of 2.4 total goals per game (1.2 goals per side). According to Stephen Dubner in the link above, so far this year the rates are even lower.
The lower the overall scoring rate, the harder it is to overcome a deficit. This is easy to understand intuitively. Just compare basketball, where being 2 scores down at halftime can be easily overcome, to soccer, where being 2 scores down at halftime is almost insurmountable. The effect is the same within a single sport for varying scoring rates.
If I change my model's parameter for scoring rate from 2.4 to 3.0 total goals per game, the theoretical and actual win rates match up very well. I suspect 3 goals per game is pretty close to the typical World Cup scoring rate averaged out over the years.
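For anyone who wants to play with the scoring-rate parameter, here is a minimal sketch of this kind of calculation. It is not the exact model behind the graph. It assumes two evenly matched teams whose goals arrive independently at a constant Poisson rate for the rest of the game, and the function name and its arguments are made up for illustration.

```python
import math


def poisson_pmf(k, mu):
    """Probability of exactly k goals when the expected number is mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


def trailing_team_outcomes(deficit, minutes_left, total_goals_per_game=2.4,
                           game_minutes=90, max_goals=15):
    """Return (win, tie, loss) probabilities for the team trailing by `deficit`."""
    # Assumption: evenly matched teams, so each side gets half the total rate,
    # scaled by the fraction of the game that remains.
    mu = (total_goals_per_game / 2) * minutes_left / game_minutes
    pmf = [poisson_pmf(k, mu) for k in range(max_goals + 1)]

    win = tie = 0.0
    for ours, p_ours in enumerate(pmf):          # goals by the trailing team
        for theirs, p_theirs in enumerate(pmf):  # goals by the leading team
            margin = ours - theirs - deficit
            if margin > 0:
                win += p_ours * p_theirs
            elif margin == 0:
                tie += p_ours * p_theirs
    # Everything else (including the negligible truncated tail) counts as a loss.
    return win, tie, 1.0 - win - tie


if __name__ == "__main__":
    # Down one goal at halftime, under the two scoring rates discussed above.
    for rate in (2.4, 3.0):
        win, tie, loss = trailing_team_outcomes(1, 45, rate)
        print(f"{rate} goals/game: win={win:.3f} tie={tie:.3f} loss={loss:.3f}")
```

Raising `total_goals_per_game` from 2.4 to 3.0 raises the trailing team's win and tie probabilities, which is the direction of the adjustment described above.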
However, the actual and theoretical rates still don't match exactly. In a game's early minutes, trailing teams' actual rates still exceed the theoretical WP. This can be understood as a result of selection bias in the numbers. The higher the scoring rate in a given era, the more likely a game is to feature an early lead. So the earlier in the game a team held a lead, the higher the scoring rate of that era was likely to be. For example, in 1954, when the rate was 5.4 goals per game, there were likely far more examples of early-game 2-goal leads, and that era will dominate the early-minute data.
Ultimately, I suspect the WP numbers estimated with the 2.4 goal-per-game parameter are reasonably close to what we can expect this year.
Lastly, I suspect the low scoring rates may also explain the unusually high number of upsets (and "upset-ties") so far. The lower the overall scoring rate, the more likely upsets are. If an underdog team scores 50% less often than its favored opponent, it would almost never win in a game of basketball. But in a sport where there is typically a total of 1 score all game long, it will win about one third of the time, ties notwithstanding.
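The one-third figure comes from the simplest case: if exactly one goal is scored and the favorite is twice as likely to score it, the underdog gets it one time in three. A rough way to see how that changes with the overall scoring level is sketched below. It assumes independent Poisson scoring for each side, with the underdog scoring at half the favorite's rate, and the function name and parameters are hypothetical.

```python
import math


def poisson_pmf(k, mu):
    """Poisson probability of k scores, computed in log space so that
    large, basketball-like score counts do not overflow."""
    return math.exp(k * math.log(mu) - mu - math.lgamma(k + 1))


def upset_probabilities(total_scores, underdog_share=1 / 3):
    """Return (underdog_win, tie) for a game with `total_scores` expected scores.

    Assumption: underdog_share=1/3 means the underdog scores half as often
    as the favorite (rates in a 1:2 ratio).
    """
    mu_dog = total_scores * underdog_share
    mu_fav = total_scores * (1 - underdog_share)
    # Sum far enough into the tail that the truncation error is negligible.
    upper = int(total_scores + 10 * math.sqrt(total_scores) + 20)
    p_dog = [poisson_pmf(k, mu_dog) for k in range(upper + 1)]
    p_fav = [poisson_pmf(k, mu_fav) for k in range(upper + 1)]

    win = sum(p_dog[a] * p_fav[b] for a in range(upper + 1) for b in range(a))
    tie = sum(p_dog[k] * p_fav[k] for k in range(upper + 1))
    return win, tie


if __name__ == "__main__":
    # 1 and 2.4 are soccer-like totals; 100 is a stand-in for basketball.
    for total in (1.0, 2.4, 100.0):
        win, tie = upset_probabilities(total)
        print(f"{total:>5} expected scores: underdog win={win:.4f} tie={tie:.4f}")
```

Under these assumptions the underdog's chances shrink toward zero as the expected number of scores grows, while in very low-scoring games a large share of results are ties.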
There are several theories about the low scores in South Africa this year. Dubner suggests it may be the ball, widely disliked by players. I think there are several factors, the most important being the overall historical trend. I also agree with one of the comments that in the early group play, the better teams play conservatively, thinking, "Hey, let's just make it out of the group round. Then we'll crank it up." I suppose you could consider it a "meta-" strategy effect.
http://footballpolemics.wordpress.com/2010/06/16/why-are-there-so-few-goals-in-world-cup-2010/
So far in the 2nd set of games in group play, goals per game has reverted back towards the mean for the modern era.
The ball isn't the problem. http://www.vm-guide.dk/video-referat-cameroun-danmark/ Look at the videos tagged "Målet til 1-1 – Nicklas Bendtner" and "Målet til 1-2 – Dennis Rommedahl". Kjær places the ball perfectly 30-40 meters ahead of him and Rommedahl takes it perfectly. They could have used a normal ball and nobody would have noticed. Show me some videos where the ball does something it normally wouldn't, and then we can talk about whether the ball is bad or not.
Hey Brian, since you're my favorite NFL quant, I was wondering if you could analyze the penalty kick effect according to pressure situation. My gut feeling is that penalty kick success rate is depressed in clutch situations, but I have never seen any research on this.
Link-Check out these posts by Phil Birnbaum:
http://sabermetricresearch.blogspot.com/2010/06/huge-choke-effect-reported-in-soccer.html
http://sabermetricresearch.blogspot.com/2010/06/huge-choke-effect-reported-in-soccer_15.html
The bottom line is that the research is inadequate at this point. My own gut feeling is that the choke effect is very small. Top players get to where they are because they have been successful all their lives in pressure situations. Besides, the pressure is on both the kicker and goalkeeper. In the NFL, field goal kicking shows no signs of choking.
Herein lies the problem with focusing analysis on something as limited as the World Cup. Each tournament has a very limited spate of games. Even worse, tournaments only take place every four years. So just at the point that you start to put enough games in the spreadsheet that it feels like a reasonable sample size, you realize that you've gone so far back in time that any insights gained are specious at best.
This isn't to say that there is NO value in very old historical data, but the game as it was played 50, 30, or even 20 years ago is very different than it is today. To illustrate my point, just think about the statistical value (or lack thereof) of NFL data from 50 years ago. Passing stats from, say, 1970 have almost no bearing whatsoever on passing stats from today's NFL game. And yes, I know that soccer is simpler and has not changed as radically as the NFL has over the last 50 years, but it HAS changed.
"But the differences are too large to be explained by strategy effects. If strategy effects are stronger than team-strength effects, we'd be seeing a much more aggressive style of play than we do. Instead, we are witnessing a very safe style with low scoring rates"
Doesn't that assume rational, risk-neutral strategy? We know from long experience analyzing the NFL that that's a bad assumption. In reality, teams often play to not lose until they are actually losing, despite incentives to behave differently.
It would be interesting to pull data from other high level soccer (Confed cup, UEFA championship, EPL, Serie A, Bundesliga, La Liga, Champions League) to get a larger data set, and to see if World Cup soccer really is different than the others.
I think Tarr makes a very good point. We've seen evidence of overconservatism in nearly every sport (e.g. bunting, fourth down, etc.). I'm not a soccer expert, but it did appear to me that there were multiple times when teams were playing for a tie when it wasn't really in their interest.
Among other things, the point structure appears to encourage aggression. If you win a game, you get 2 more points than when you tie. If you lose a game, you get 1 less point than if you tie. Yet, it seems like the strategy is almost always "accept a tie instead of gambling on a win".
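To put a number on that, here is a small sketch of the break-even point. It assumes a pure all-or-nothing gamble (win or lose, no tie) judged only by expected points in that one game, which ignores group standings and goal difference. The function name is made up for illustration.

```python
def break_even_win_probability(win_pts=3, tie_pts=1, loss_pts=0):
    """Win probability at which an all-or-nothing gamble yields the same
    expected points as settling for a certain tie."""
    # Solve p * win_pts + (1 - p) * loss_pts = tie_pts for p.
    return (tie_pts - loss_pts) / (win_pts - loss_pts)


if __name__ == "__main__":
    # 3-1-0 scoring, as described above.
    print(break_even_win_probability(3, 1, 0))
    # A 2-1-0 scheme for comparison (hypothetical here).
    print(break_even_win_probability(2, 1, 0))
```

With 3-1-0 scoring the gamble pays off in expected points whenever the chance of winning it is better than one in three, so a team does not even need to be the favorite in the gamble for aggression to make sense.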
I also think this might somewhat explain why trailing teams have come back more than expected. "The best defense is a good offense" is a trite cliche, but there is some truth to it. It appears to me like soccer teams that have single goal leads keep way too many people in the defensive zone. They dramatically scale back their efforts to score more. Obviously, this makes it a little more difficult for the trailing team to get a shot in the net, but it also means that the vast majority of the rest of the game will be played on that side of the field. Even with a lower rate on each opportunity, the sheer number of opportunities translates into more comebacks.
Again, I'm not a soccer expert, so what do I know? I would be curious to read advancedsoccerstats.com if anyone came up with it.
I think the history of the World Cup is definitely a factor in the high WP of teams trailing in the first half. The WC has only had 32 teams since 1998. Before that it was 24 starting in 1982, and the field gets smaller in earlier years, down to 13 for the first one (according to the always correct wikipedia). So many of the WCs prior to 1982 were mostly played out among soccer's perennial powers. That would suggest that going down one goal in the first half is not an insurmountable challenge for the trailing team, because they are also a talented side and could make a comeback. Add that to changes in the style of game and I think it goes a good way toward explaining the high WP data.
What this theory goes nowhere in explaining is why goal scoring is so weak these days. If we assume the inclusion of more teams to the WC would inevitably bring in some sorry teams *cough New Zealand cough* shouldn't goal scoring dramatically increase when a team like... say Portugal runs up 7 goals on North Korea? Maybe playing hard core defense and opportunistic scoring is a much more effective tactic than previously thought for underdogs (Greece in Euro 2004). I dunno. Any suggestions?
Sorry if this is a bad place to put this, I'm not sure where else to write it, but I couldn't help but notice that the play for play data got taken down today. I didn't have a chance to download all of the files. If anyone has them and would be willing to share or if they could get reposted, I'd greatly appreciate it. Thanks.
John-I'm re-writing the post to include 2009 data. It will be back up shortly.
USA!!!!!!!!!!!!!!!!
Has anyone looked to see if scoring is constant throughout the world cup stages? I wonder if scoring is usually depressed in the first set of group games and then moves back up. I could see a lot of scoring when teams know they need goals to advance, which also opens them up defensively.
Also take into consideration how more weaker teams may decrease scoring - if a team knows it's outclassed it might play defensively the whole game only going on counter attacks (Greece, New Zealand) which would lower scoring. Contrast with a slightly weaker team being worn down by a superior opponent - North Korea vs Portugal, with 6 goals in the 2nd half after a close fight in the 1st, and you could make a case for the last two US games where they ran Slovenia and Algeria ragged, creating lots of opportunities in the 2nd half.
Brian - Thanks so much. I had a mini heart attack when I realized the page was gone and that I nearly missed a chance at seven years of data arranged for coding.
The Jabulani ball is definitely a factor and FIFA even admitted as much on 27 June. During the match for third place on 10 July, the goal scored by Germany's Mueller is a textbook example of its awkward flight.