Last year I started my stint at the NY Times by calling attention to just how bad NFL preseason predictions are. I compared the “advanced” projections for team win totals compiled by a fellow stats website called *Football Outsiders* to two benchmarks. They had predicted doom and gloom for the Jets last year, and my article was intended to relieve Jets fans of needless worry. As it happened, the Jets made the playoffs and went all the way to the AFC Championship game.

The first benchmark was a mindless 8-win prediction for every team. Let's call this the Constant Median Approximation system, or CoMA for short. This benchmark represents zero knowledge. It’s what you would guess if you had no information at all about any of the NFL teams except that they each play 16 games. Certainly anyone can out-predict a coma patient, right?

The second benchmark is a very simple formula based on a regression from the previous season win totals for each team. The formula is 6 plus a quarter of last year’s wins. For example, a 12-win team from 2009 would be predicted to have 6 + 12/4 = 9 wins in 2010. I call simple methods like this ‘Koko the Monkey,’ named after George Costanza’s unflattering nickname at Kruger Industrial Smoothing.
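Here is a minimal Python sketch of the two benchmarks. The function names are illustrative and do not come from any published code.

```python
def coma_prediction(last_year_wins: float) -> float:
    """CoMA: ignore every piece of information and predict the 8-win median."""
    return 8.0


def koko_prediction(last_year_wins: float) -> float:
    """Koko the Monkey: 6 plus a quarter of last season's wins."""
    return 6.0 + last_year_wins / 4.0


# A 12-win team from 2009 is projected to win 6 + 12/4 = 9 games in 2010.
example = koko_prediction(12)  # 9.0
```

The slope is only a quarter, so Koko pulls every team most of the way back toward 8 wins. Even a 0-16 or a 16-0 team is projected to finish between 6 and 10 wins.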

There are several ways of comparing predictions to actual results. One of the simplest and most practical is called Mean Absolute Error (MAE), which is simply the average of the absolute differences between the predicted and actual numbers. A more common method among statisticians is to use something called Root Mean Squared Error (RMSE). RMSE is the square root of the average of the squares of the differences between predicted and actual values. The errors are squared before they are averaged, so this method penalizes predictions that are very far off more heavily than MAE does.
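Both error measures can be sketched in Python as below. The four-team sample is hypothetical and is not the 2009 data.

```python
import math
from typing import Sequence


def _check(predicted: Sequence[float], actual: Sequence[float]) -> None:
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("need two non-empty sequences of equal length")


def mean_absolute_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Average size of the miss, ignoring its direction."""
    _check(predicted, actual)
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def root_mean_squared_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Square root of the average squared miss, so big misses count extra."""
    _check(predicted, actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted))


# Hypothetical four-team example: three small misses and one big one.
predicted = [8, 8, 8, 8]
actual = [7, 9, 8, 12]
mae = mean_absolute_error(predicted, actual)       # 1.5
rmse = root_mean_squared_error(predicted, actual)  # sqrt(4.5), about 2.12
```

The single 4-win miss pushes RMSE well above MAE. That is exactly the extra penalty for far-off predictions.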

So how did the coma patient and Koko the Monkey compare to FO's predictions for 2009? Keep in mind: the lower the number, the better the accuracy.

| System | MAE | RMSE |
|---|---|---|
| FO | 2.6 | 3.0 |
| CoMA | 2.5 | 3.1 |
| Koko | 2.2 | 2.8 |

Football Outsiders’ predictions don’t fare very well. In fact, the CoMA predictions are no worse in general and slightly more accurate in terms of MAE. Koko the Monkey is easily the most accurate of the three systems. This is a notable result because it suggests that Football Outsiders’ predictions are so bad, they are literally worse than having no football knowledge at all. It's like negative information, draining us of insight.

I’m reminded of the classic line from the movie *Billy Madison*. After the main character gives a nonsensical answer to a quiz-show question, the moderator replies, “What you've just said... is one of the most insanely idiotic things I have ever heard. At no point in your rambling, incoherent response were you even close to anything that could be considered a rational thought. **Everyone in this room is now dumber for having listened to it.** I award you no points, and may God have mercy on your soul.”

I realize we're having a little fun at FO's expense, and I know it’s not completely fair to pick on only them. Most predictions from self-declared experts are probably just as inaccurate. But FO sells their predictions, and that makes their products fair game. After all, customers should know what they’re getting for their money. People rarely go back to verify predictions (except Gregg Easterbrook), and those that make the predictions usually only mention the cherry-picked few that were somewhat accurate.

2009 was one of the worst years for FO predictions on record, which Aaron Schatz has admitted to.

http://www.footballoutsiders.com/stat-analysis/2010/oddities-2009

How does Koko fare against FO over a longer sample size of several seasons?

Yeah, FO is not the one cherry-picking here.

Can we get some standard errors attached to these values? Are the MAEs significantly different from each other?

To be a bit more fair, since this article seems a bit biased toward bashing FO for the sake of it (as an underlying theme, even though the arguments are solid), let's take this line from Easterbrook:

"Brian Burke, whose Advanced NFL Stats is worth your perusal, sometimes delves too far into hyper-specificity. Burke forecast a "94 percent chance" that Pittsburgh would beat Oakland in Week 13; Oakland won."

Ha!

Preseason predictions are a case study in cognitive bias. There are many biases involved in every flawed prediction, but I think the most prevalent is the tendency of all fans (and all those fancy prognosticators are really just FANS) to simply look at last year's results and assume that this year will be more of the same.

It's been pretty well documented now that over the last decade or so there has been a ~50% turnover in playoff teams every year. But when you pick up those preseason prediction rags, they consistently predict 9-11 of last year's 12 playoff teams will return to the postseason.

13% underdogs will win 13% of the time. Easterbrook cherry-picked one of those occasions. Ha yourself!

By the way, I agree with him on the hyperspecificity point. But trust me, if I don't add all those extra decimal places, the probabilities don't add up to 1.00, and I get angry emails and comments for days.

Win for the Billy Madison reference.

2nd the question about previous years.

Regarding previous years:

http://www.advancednflstats.com/2007/09/pre-season-predictions-are-worthless.html?showComment=1188933960000#c3451626887957532322

and

http://fifthdown.blogs.nytimes.com/2009/07/22/an-antidote-to-pessimism-on-the-jets/

I feel dumber for having read Gregg Easterbrook's review of *The Black Swan* in the NYT.

http://www.fooledbyrandomness.com/easterbrook.pdf

Speaking of Koko the monkey, will he be making his Fantasy Football rankings this year?

Yes, will they? And will you include IDP rankings this time? You didn't last year (I did email them to you, Brian, but you never posted those articles).

Yes, they will. I've got them but need to put the posts together. Sorry for not putting up the IDP rankings. I had to give up on the community site. It just took too much time to format all the tables, etc.

But I would caution everyone that the Koko fantasy rankings are intended as a benchmark for comparison and not necessarily as bona fide predictions themselves.

Alchemist,

Just because 50% of playoff teams tend to not earn repeat trips doesn't mean there's something wrong with projecting 10 of last year's playoff teams to once again make the playoffs.

"Just because 50% of playoff teams tend to not earn repeat trips doesn't mean there's something wrong with projecting 10 of last year's playoff teams to once again make the playoffs."

There's something wrong with it if you are consistently predicting, year-after-year, that 9-11 of the same teams will make the playoffs. If you honestly believe (or if your numbers honestly tell you) that this year is somehow different and that there will be an unusual number of repeat entrants in the postseason, then by all means, stick with your prediction. But I believe this is rarely the case.

Most of these wannabe prognosticators just look at last year's results and assume that this season will pick up right where last one ended.

The Saints in the Super Bowl? You must be crazy! They finished 8-8 last year (in last place) and their defense was awful. And obviously the Steelers will be back in the playoffs. They're the defending Super Bowl champs!

It's an odds thing, Alchemist: the top teams have the best odds to make the playoffs, but the odds are that only half of them will. So the best projection is to project the top 12 teams (most of which will be the same as last year), even though the odds are that not all of the top 12 teams will actually make it. Does that make sense?

It's like in baseball: none of the top hitters are projected to hit more than 45 home runs (their best projections are that low), but you can project that the top hitter will hit over 50. You just don't know who it is.
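The baseball point can be checked with a small Monte Carlo sketch. It assumes ten hypothetical hitters whose home-run totals are independent and normally distributed around their projections, with an assumed spread of 7 HR. All the numbers are invented for illustration.

```python
import random
import statistics

# Hypothetical projections: nobody is projected above 45 home runs.
PROJECTIONS = [45, 43, 42, 40, 40, 38, 38, 37, 36, 35]
SPREAD = 7.0  # assumed season-to-season noise (standard deviation), in HR


def average_league_leader(projections, spread, seasons=20_000, seed=2010):
    """Simulate many seasons and return the mean of the league-leading HR total."""
    rng = random.Random(seed)
    leaders = [
        max(rng.gauss(mean, spread) for mean in projections)
        for _ in range(seasons)
    ]
    return statistics.fmean(leaders)


leader = average_league_leader(PROJECTIONS, SPREAD)
```

The expected maximum of several noisy totals is never below the largest individual mean. So the simulated league leader lands above 45 even though every single projection is an honest average.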

Couldn't random chance explain the unpredictability of the NFL? Maybe the Steelers were a 10-win team last year, but an unlucky bounce and some bad tiebreakers kept them out of the playoffs. Maybe the Saints were only a 10-win team that happened to get lucky in a couple of close games.
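The sketch below puts a rough number on how much luck alone moves a record. It assumes every game is an independent event won with the same fixed probability, which is a big simplification that ignores opponents, injuries, and home field. A team whose true strength is 10 wins out of 16 has p = 10/16.

```python
from math import comb, sqrt


def win_distribution(p, games=16):
    """P(exactly k wins) for k = 0..games, with independent games won at rate p."""
    return [comb(games, k) * p**k * (1 - p) ** (games - k) for k in range(games + 1)]


def prob_at_most(p, max_wins, games=16):
    """Chance of finishing with max_wins or fewer wins."""
    return sum(win_distribution(p, games)[: max_wins + 1])


p_true = 10 / 16
luck_sd = sqrt(16 * p_true * (1 - p_true))  # binomial SD, about 1.94 wins
p_eight_or_fewer = prob_at_most(p_true, 8)
```

Under this model, even a perfect estimate of true strength leaves a standard deviation of roughly two wins. That is one reason no prediction system's error gets anywhere near zero.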

Also, can we really judge FO for predicting the Pats would do great in 2008 when Brady went down in game #1. You can't predict that.

Here's what I think. Most playoff teams that don't lose a ton of players to FA or have harder schedules are favorites to make the playoffs again. However, some of those teams will have injuries or bad luck. We don't know which teams beforehand, so we can only make our best guess based on available information.

Let me mediate a bit. Chase is correct that, in attempting to be as right as possible, there is nothing wrong with picking mostly the same teams to do well again. It will minimize the huge mistakes that are made in trying to predict the new 50% of teams in the playoffs. On the other hand, if you are seeking perfection or trying to win a pool with thousands of players, it may be crazy not to have greater turnover from year to year. It all depends on your goal: is it to be as correct as possible, or to have a realistic shot at being the best in a large group?

Chase is hedging his bets. Which, in the case of a tumultuous NFL might be the best thing to do.

Twelve teams make the playoffs each year. Twenty teams do not. Over the last 14 years, we've run just under 50% return trips to the playoffs, so we know that the average number of returning playoff teams will be right around 6. The standard deviation over the last 14 years is almost exactly 1. So we can be 95.4% confident that between 4 and 8 of last year's playoff teams will return. If you pick ALL of last year's playoff teams to return, you can be 97.7% confident that you'll get AT LEAST 4 correct, and 84% confident of getting at least 5 correct.

On the other hand, only about 30% of a past year's playoff outcasts (roughly 6 open spots shared among 20 teams) will make it to the playoffs the following year.

So for each playoff team, you have a 50% chance of getting it right if you pick them to return. For each non-playoff team, you have about a 30% chance of getting it right if you pick them to make the playoffs the following year. The odds clearly favor last year's playoff teams. If you had to make this bet every year for the rest of your life, you'd be best off picking the previous year's playoff teams.
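Here is a sketch of both calculations. It uses the commenter's figures of about 6 returning teams with a standard deviation of about 1, a normal approximation with no continuity correction (matching the comment's reasoning), and an assumed 32-team league, so 20 teams miss the playoffs.

```python
from fractions import Fraction
from math import erf, sqrt

MEAN_RETURNING = 6.0  # commenter's figure
SD_RETURNING = 1.0    # commenter's figure


def normal_cdf(x, mean, sd):
    """P(X <= x) for X ~ Normal(mean, sd)."""
    return 0.5 * (1.0 + erf((x - mean) / (sd * sqrt(2.0))))


p_4_to_8 = normal_cdf(8, MEAN_RETURNING, SD_RETURNING) - normal_cdf(4, MEAN_RETURNING, SD_RETURNING)  # ~0.954
p_at_least_4 = 1.0 - normal_cdf(4, MEAN_RETURNING, SD_RETURNING)  # ~0.977
p_at_least_5 = 1.0 - normal_cdf(5, MEAN_RETURNING, SD_RETURNING)  # ~0.841

PLAYOFF_SPOTS = 12
NON_PLAYOFF_TEAMS = 20  # assumes a 32-team league
RETURN_RATE = Fraction(1, 2)
# The ~6 spots that turn over are shared among the teams that missed: 6/20.
NEWCOMER_RATE = PLAYOFF_SPOTS * (1 - RETURN_RATE) / NON_PLAYOFF_TEAMS


def expected_correct(newcomers_picked):
    """Expected hits after swapping `newcomers_picked` of last year's playoff
    teams for teams that missed, treating teams within each group as alike."""
    if not 0 <= newcomers_picked <= PLAYOFF_SPOTS:
        raise ValueError("pick between 0 and 12 newcomers")
    repeats = PLAYOFF_SPOTS - newcomers_picked
    return repeats * RETURN_RATE + newcomers_picked * NEWCOMER_RATE
```

Each swap trades a 1/2 chance for a 3/10 chance, so it costs 1/5 of an expected correct pick. Picking all 12 repeaters gives 6 expected hits. Swapping in 6 newcomers drops that to 24/5 = 4.8. This ignores team-specific information such as injuries or coaching changes.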

Now, of course you could modify based on other factors: retirements, injuries, free agents, coaching changes. But, the odds are still better when you favor last year's playoff teams over last year's non-playoff teams.

P.S. This has nothing to do with the records of the actual teams, only with their likelihood of returning to the playoffs.

This is the same argument used in March Madness brackets every year: "No year has had all 4 #1 seeds make the Final Four." Of course, it just happened, but beyond that in most years the 4 #1 seeds do have the best chance of winning their respective regions. The combined likelihood of all 4 of them doing it is very low, but since you can only pick 4 teams, it might as well be those with the best odds.
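The bracket logic can be sketched with made-up region-winning probabilities; none of these numbers come from real tournament data. Picking the most likely winner in each region maximizes the expected number of correct Final Four picks. The chance that all four of those picks hit is still just the product of four modest numbers, assuming the regions are independent.

```python
from math import prod

# Hypothetical chances of winning each region (illustrative only).
REGION_ODDS = {
    "East":    {"#1": 0.40, "#2": 0.22, "#3": 0.12},
    "West":    {"#1": 0.35, "#2": 0.25, "#3": 0.15},
    "South":   {"#1": 0.45, "#2": 0.20, "#3": 0.10},
    "Midwest": {"#1": 0.30, "#2": 0.24, "#3": 0.16},
}


def best_bracket(region_odds):
    """Pick the single most likely winner in each region."""
    return {region: max(odds, key=odds.get) for region, odds in region_odds.items()}


picks = best_bracket(REGION_ODDS)
p_perfect = prod(REGION_ODDS[r][team] for r, team in picks.items())      # 0.0189
expected_hits = sum(REGION_ODDS[r][team] for r, team in picks.items())  # 1.5
```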

"Also, can we really judge FO for predicting the Pats would do great in 2008 when Brady went down in game #1. You can't predict that."

Anon - exactly, and that's what makes pre-season predictions worthless.

This is, frankly, a dumb way to look at it. It's akin to looking at how a site does at predicting games against the spread and looking at every single game rather than the games that it highlighted. If you single in on teams where there is a significant difference between popular perception (as evidenced by the betting line) and the FO projection, say a gap of two or more wins, I can tell you from experience that there's money to be made.

And whatever else you want to say about FO's predictions, you can hardly say that they are guilty of simply picking the same playoff teams to repeat over and over. If I'm not mistaken, they pick Kansas City and Washington to win their divisions this year, and teams like Dallas, New Orleans, Minnesota and Houston to slide way off.

I think you are all missing my point. If you want to have the best statistical chance of being correct on every pick, then just take last year's playoff teams and "predict" them to all return to the postseason. But that is intellectually boring and journalistically lazy. Any idiot can do that.

We see the same thing in the stock market. So many idiots look at a chart for a stock that has gone up over the last 12 months and they say, "It's going to go up!" and they see an opposite chart for a stock that's gone down over the last 12 months and they say, "It's going to go down!" They're just drawing a mental line that continues whatever obvious trend they see on the chart. It takes absolutely no knowledge of the company or of the stock market in general to do this. My mentally-challenged 12-year-old son could do this.

If we agree that the preseason predictions are generally worthless and that sports in general is one big source of entertainment (from the fans' perspective), then by all means, ENTERTAIN ME. Make some bold predictions. Back them up with some insight/analysis. Sure, they may all be wrong. But if you're just going to spit out the same 12 teams that made it to the playoffs last year, then why do I even want to bother reading your stupid magazine? There's no entertainment value in that.

I don't see much entertainment value in watching journalists tell us that the Detroit Lions are winning the Super Bowl. Honestly, I think predictions are worthless from an entertainment standpoint as well, simply because predictions don't generally correlate with reality.

My name is Ben Riley and I used to write for Football Outsiders. Brian, in one of your older posts on this subject you wrote in the comments that "it took dozens of FO 'game charters' and ultimately thousands of man-hours of analysis to come up with their predictions."

I'm going to try to blow your mind now by telling you that this isn't true. The game predictions at FO were a complete black box, even to those of us on staff. I think Aaron Schatz allegedly works with some statistician somewhere to come up with the numbers, fed through a computer or mysterious game-predicting machine, or something. Now, it's *possible* the game-charting numbers are fed into the formula, but unless Aaron's said that this is what happens, we don't know that to be true (and I doubt it).

There are a lot of talented, funny writers at FO. But they are writers first, and stats-geeks second (at best).

lol love it

I think Alchemist is really on to something. One of the most important things about predictions is that they are supposed to be "entertaining". They are designed to sell magazines and subscriptions right now not to win a five year statistical study five years in the future. That means there is a definite need to go out on a limb. Making a prediction like Koko does that says "I predict everything the same but with regression to the mean thrown in" isn't going to sell anything.

So how do we do it better?

I would contend scheduling has more to do with the season a team has than who is on the roster, provided you have a pretty good roster.

Regression is engineered in the NFL, they schedule to ensure parity.. and then 2 teams accounted for 5 of the last 10 championships..

Their engineering has produced huge turnover in the teams that make the playoffs (about 50%), with no real change in who the ELITE teams are.

NYG, IND, and PIT were each in 2 Super Bowls in the last ten years. NE was in 4. That's 4 teams accounting for 10 of the 20 Super Bowl spots. Is that parity?

"Regression is engineered in the NFL, they schedule to ensure parity.. and then 2 teams accounted for 5 of the last 10 championships.. "

This is a myth that has become repeated by so many lazy reporters that it is now taken by most people as fact. Out of 16 games, there are a grand total of TWO of them that have anything to do with the record of your opponents.

Each team doubles up against division opponents - that gets us to 6 out of the 16 games. Then they play against all the teams from another division within their conference - that gets us to 10 of the 16. Then they play against all the teams from another division in the opposite conference - that leaves only 2 of the 16 games that can be scheduled based on the previous year's record.

Now if I'm a division winner, those two "parity games" will be against other division winners, and I should, on average, split those games. This means that I will, on average, have one more loss than I would have expected if I had played two doormats. So on average, the "parity scheduling" hangs one extra loss on each of last year's division winners.
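The schedule arithmetic can be written out as a short sketch. It follows the comment's simplification that a division winner splits games against other division winners and sweeps doormats. The 80% alternative is a hypothetical, less extreme assumption.

```python
GAMES = 16
DIVISION_GAMES = 6            # home and away against 3 division rivals
CONFERENCE_ROTATION = 4       # every team in one other division of the same conference
INTERCONFERENCE_ROTATION = 4  # every team in one division of the other conference
PLACE_BASED_GAMES = GAMES - (DIVISION_GAMES + CONFERENCE_ROTATION + INTERCONFERENCE_ROTATION)


def expected_losses(win_probs):
    """Expected losses across games with the given per-game win probabilities."""
    return sum(1.0 - p for p in win_probs)


vs_division_winners = expected_losses([0.5] * PLACE_BASED_GAMES)
vs_doormats = expected_losses([1.0] * PLACE_BASED_GAMES)
extra_losses = vs_division_winners - vs_doormats  # 1.0
# Hypothetical: winning 80% against doormats shrinks the "parity" penalty.
extra_losses_if_80_pct = vs_division_winners - expected_losses([0.8] * PLACE_BASED_GAMES)  # ~0.6
```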

Does that extra loss make a difference in the standings? In a short 16-game season, sure it does. But it's a far cry from saying that the NFL "schedules to ensure parity". For any team in the league, I can tell you right now who 14 of their opponents will be in 2020 - and it has absolutely nothing to do with the relative strength of the teams involved.

Please don't take this to mean that I am downplaying the importance of scheduling in determining a team's win-loss record. One of the largest factors in a team's fortunes is whether it gets lucky or unlucky by being paired against relatively-weak or relatively-strong divisions within its conference and the other conference. For example, it would be really lucky for a team to be paired against all the teams from the NFC West and the AFC West, as these teams look to be relatively weak. But getting such a schedule would be based on luck - not on any parity scheduling.

"I think Alchemist is really on to something. One of the most important things about predictions is that they are supposed to be "entertaining". They are designed to sell magazines and subscriptions right now not to win a five year statistical study five years in the future."

Yes, but these predictions lose much of their entertainment (or persuasive) value when you already understand how meaningless and gimmicky such predictions are to begin with.

What about Paul the Octopus? Explain that one! :)

Will Koko be comparing his fantasy rankings from 2009 to the actuals?

http://www.advancednflstats.com/2009/08/koko-fantasy-football-monkey.html

I'm not even a stats-geek second, much less a differently-gruntled former FO employee...but isn't this a sample-size problem? Wouldn't one always expect significant variance with any prediction, even if one ran a great simulation 10000 times?

If all FO did was make predictions, I think this would be more relevant. Instead, you are picking on a site that offers great insight into teams and then offers probabilities.

Were the Jets really very good last year? They caught a massive break the last two weeks of the season or they well could have finished 7-9 and the FO prediction wouldn't have looked so bad.

A few thoughts:

#1 This is not cherry-picking one errant year. I've seen that comment here and on a few other sites which linked to this article. First of all, 2009 is the most recent year, so of course that's relevant and not cherry-picked. Besides, FO's track record is very poor going back five seasons, essentially tying the far simpler methods. See the link I posted above for previous years.

#2 It's fine if predictions are intended for entertainment purposes only. But that's not the case here. Is the point that FO is nothing better than the football equivalent of the psychic hotline?

#3 If you sell analysis and predictions, be prepared to be held accountable, especially if the analysis is a statistical black box.

#4 I agree, and have been on record previously, that FO produces very fine non-statistical analysis. It's no worse than 99% of the superficial stuff we read everyday on espn.com or anywhere else.

#5 However, if their statistical projections are informed by their qualitative analysis, or their qualitative analysis is informed by their stats, what does that say about the quality of their "other stuff?" Further, one big reason why people buy into predictions is the level of detail in the basis of the projections. People are far more likely to believe fake stories with rich detail.

#6 I'm picking on FO because they claim to have a very technically advanced and complex prediction system and they sell their products based on their claims of technical analysis, not for any other reason.

The point is, unless you have an open, documented, proven method for predictions, all you're doing is palm-reading. Let's face it, all the pre-season analysis, projections, and predictions are nothing better than tarot card/Ouija board/crystal ball/fortune cookie nonsense. But people still eat it up, despite knowing better. Let's just not pretend it's anything more than it is.

Also, it's remarkable how some FO defenders react emotionally. I think there is some deeper stuff going on, namely cognitive dissonance. If a customer regularly forks over money for a product, which later turns out to be bunk, then the customer has a decision to make. He can either admit he's been a fool, or he has to somehow preserve the illusion that the product is worthwhile. No one likes to admit folly, even (or perhaps especially) to themselves, so people do all sorts of psychological contortions to preserve their self-image of competence.

According to the Vegas Watch blog, FO projections went 55-41 (.573) against the Vegas Over/Unders from 2006-2008. That seems pretty good to me.
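Here is a rough check of how far 55-41 sits from coin-flipping. It assumes the 96 picks are independent and ignores pushes, and it computes an exact one-sided binomial tail. The break-even rate assumes standard -110 pricing, which the Vegas Watch post does not state.

```python
from math import comb


def binomial_tail(successes, trials, p=0.5):
    """P(X >= successes) for X ~ Binomial(trials, p)."""
    return sum(
        comb(trials, k) * p**k * (1 - p) ** (trials - k)
        for k in range(successes, trials + 1)
    )


wins, losses = 55, 41
picks = wins + losses                 # 96 graded picks
win_rate = wins / picks               # about .573
p_value = binomial_tail(wins, picks)  # one-sided, vs. a 50% coin-flipper
BREAK_EVEN_AT_MINUS_110 = 110 / 210   # about .524, assuming -110 pricing
```

The tail probability comes out a bit under 0.1. That is suggestive, but 96 picks is a small sample for a firm conclusion either way.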

http://vegaswatch.net/2009/07/football-outsiders-vs-vegas-2007-2009.html

2009 was an outlying season in several ways, as FO has readily admitted and documented. Their projection system regresses offense and defense separately and for some reason the coefficients measured from the last decade didn't fit last year at all. Instead of blindly bashing FO, I'd rather see an intelligent discussion about a) whether you've confirmed this with your own numbers, b) why you think this might be so, and c) what it might mean for the future of the NFL (if anything).

I'm also curious why you're so sure that all preseason predictions are worthless, yet at the same time you're convinced that your own game prediction system can beat Vegas. Have you attempted to create your own season projection model and failed?

You've accused me of "blindly bashing FO," but one thing we can agree on is we're all blind when it comes to their numbers, yourself included. It's a magical black box, impervious to criticism. It can only be judged on the results it produces, which is what I've provided here. If that's not interesting or useful to you, feel free to go back to reading your Ouija board.

True, the RMSE for Vegas over those particular 4 seasons was 3.1 while FO's was 3.0 (about the same difference as between a coma patient and FO). However, with 2009 added, it becomes a wash at best. Additionally, the Vegas Watch link does not account for the odds given for each team. For example, the over/under on the Ravens might be, say, 10, but to bet the under (9 or fewer wins) you have to pay 120 to win 100, meaning the real over/under is not 10 but 9 or so.
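The "pay 120 to win 100" point can be made concrete by converting American odds into break-even probabilities. The even-money price on the over is hypothetical, because the comment only quotes the under. Proportional normalization is one simple way to strip out the bookmaker's margin, not the only one.

```python
def implied_probability(american_odds: int) -> float:
    """Break-even win probability for a bet at the given American odds.

    -120 means risk 120 to win 100; +150 means risk 100 to win 150.
    """
    if abs(american_odds) < 100:
        raise ValueError("American odds are at least 100 in magnitude")
    if american_odds < 0:
        risk = -american_odds
        return risk / (risk + 100)
    return 100 / (american_odds + 100)


def remove_vig(p_side: float, p_other_side: float) -> float:
    """Rescale two implied probabilities so they sum to 1."""
    return p_side / (p_side + p_other_side)


p_under = implied_probability(-120)  # about 0.545
p_over = implied_probability(100)    # hypothetical even-money over
fair_under = remove_vig(p_under, p_over)  # about 0.522
```

A fair probability above 50% for the under means the market's effective median sits below the posted 10.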

To answer your question, I wouldn't buy that explanation without seeing the data. Just looking at their list of predicted team win totals for 2009, it appears they predicted some teams to improve drastically, well outside what a normal year-to-year improvement would suggest. In other words, FO was not burned by relying on year-to-year correlations in performance, but because they deviated too far from them.

It appears you are referring to the FO "2009 Oddities" post linked in comment #1, where they showed that offensive year-to-year correlations went through the roof and defensive year-to-year correlations fell off the table. It has nothing to do with DVOA or a "magical black box", as they showed the same effect using points scored/allowed.

Obviously the key to projecting future performance based on past results is knowing how much to regress toward the mean. Such a seismic shift in y-t-y numbers would throw any model off track regardless of what other bells and whistles are added into the mix. This finding is fascinating to me and should be to anyone interested in performance projections or team building. Will the trend continue? Should GMs now allocate more of their resources to offense? It's truly unfortunate that any researcher would avoid the opportunity to explore the issue further out of some apparent vendetta against another researcher.

This really isn't the right way to test predictions.

RMSE only means something statistical if the distribution's Gaussian. It's not. It's given in the Almanacs, and it's clearly non-Gaussian.

If you have a flat distribution from 6-10 wins, you'll have a mean of 8 wins. If the team then wins 6, you'll say "ah ha, your error was 2 wins"... but the result was actually perfectly expected. You started off by saying "the team will win between 6 and 10 games, and they all seem perfectly likely" and in the end, that's what happened. Looking at mean error and RMSE is a terrible way of evaluating a prediction that gives a non-Gaussian distribution.
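The flat 6-10 example works out like this. Even if the forecast is exactly right, its own outcomes produce sizable "errors" against its mean. The forecast itself is hypothetical.

```python
from statistics import fmean, pstdev

# Hypothetical flat forecast: 6, 7, 8, 9 or 10 wins, each with probability 1/5.
outcomes = [6, 7, 8, 9, 10]

forecast_mean = fmean(outcomes)  # 8.0
forecast_sd = pstdev(outcomes)   # sqrt(2), about 1.41
# Average absolute miss expected if the forecast itself is exactly right.
expected_abs_error = fmean(abs(w - forecast_mean) for w in outcomes)  # 1.2
# Share of the forecast's own outcomes that "miss" the mean by 2 wins.
share_off_by_two = sum(abs(w - forecast_mean) >= 2 for w in outcomes) / len(outcomes)  # 0.4
```

A perfectly honest flat forecast still misses its mean by 1.2 wins on average, and by a full 2 wins 40% of the time.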

You want to do something like a Kolmogorov-Smirnov test. We already know that a K-S test based on Koko the Monkey is crap - the win distribution for teams doesn't look like a random draw around 8 wins.

Although it should be said that doing a K-S test by drawing randomly from each team's given probability distribution isn't quite right either, because the probability distributions aren't independent and you'd be drawing them independently. You really want to compare the *overall* season results with the likelihood from the simulated season results.

For that you'd have to get the simulated season results, which you could probably get by asking.

For the more technical-minded: the reason mean and RMS are used is that the mean is the first moment of a Gaussian distribution, and the RMS is (the square root of) the second moment. So you're essentially doing a poor man's K-S test, assuming the underlying distributions are Gaussian. The moments of the win predictions given by FO are clearly not well described by mean/RMS.
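Here is a minimal two-sample Kolmogorov-Smirnov statistic in plain Python. It could compare actual win totals against win totals from simulated seasons. Those simulations are hypothetical here, since FO's simulated seasons are not public. SciPy's `scipy.stats.ks_2samp` computes the same statistic along with a p-value. With discrete win totals and many ties, the standard K-S p-values tend to be conservative.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap
    between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    gap = 0.0
    # Both empirical CDFs only jump at observed values, so checking those suffices.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap


# Hypothetical example: a point-mass forecast at 8 wins vs. a spread of results.
gap = ks_statistic([8, 8, 8, 8], [6, 7, 8, 9])  # 0.5
```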

Keep it up, Brian! Not everyone is missing the point of your article!

barawn: Hey, you're right! I ran a K-S test and it turns out the Rams won 9 games and Washington won the NFC East. Who knew?!

The distribution of NFL wins is reasonably Gaussian, and RMSE is a fair and well-suited way of evaluating the error.

Is there some reason you've just started becoming snide in most of your responses? I'm not criticizing just for criticism's sake. Certain predictions from FO are very uncertain (any which involve coaching changes) and several are very certain (those which involve the Colts and Patriots, for instance).

Lumping them all together, when they tell you flat out what the uncertainty *is*, is just bad statistics. Stating that "it's so bad, it's like negative football information" is worse than that: it's a silly analogy that doesn't do anything but lower the bar of the conversation.

"The distribution of NFL wins is reasonably Gaussian."

Not from your own research.

http://www.advancednflstats.com/2007/08/luck-and-nfl-outcomes.html

"...and RMSE is a fair and well-suited way of evaluating the error."

No, it's not. You're just looking at the error of the mean prediction. That's just ridiculously simplistic.

Comparing a prediction by *just* its mean value when it has a distribution is *of course* going to make an "idiot prediction" (i.e., one static value close to the mean, or close to an existing distribution) look better when you've got a distribution this wide (see here).

FO gives a likelihood distribution for each prediction. How do you test that prediction? You lump together all teams that are predicted to win 8 games, say, 40-50% of the time. Then you compare that to how often they *do* win 8 games. Then you lump together all teams that are predicted to win 8 games 30-40% of the time, and compare that to how often they *do* win.

Or, if you want to be really simple: compare the RMSE to the *expected* RMSE based on the width of the distributions given by FO. It's not a *good* test, because they're not Gaussian, but hey, if you insist they're close enough, sure.

Koko the Monkey's predictions and the CoMA predictions will be *awful*, because they've got a ridiculously low standard error (0.3 games, the Gaussian equivalent width of a uniform distribution). If memory serves, the FO predictions have a standard error of about 2 wins. That means the FO predictions have an RMS error of about 1.5 sigma, whereas the CoMA/Monkey predictions have RMS errors of about 10 and 9 sigma, respectively.
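barawn describes two checks, and both can be sketched in Python. The first is a calibration table that bins probability forecasts and compares each bin's average prediction with how often the event actually happened. The second is the simpler ratio of RMSE to the forecast's own claimed standard error. The input format of (predicted probability, outcome) pairs is an assumption. Reading the "0.3 games" figure as the standard deviation of a uniform spread one win wide (1/√12 ≈ 0.29) is an interpretation that the comment does not spell out.

```python
import math
from collections import defaultdict


def calibration_table(forecasts, n_bins=10):
    """Group (predicted_probability, event_happened) pairs into equal-width
    probability bins and compare average prediction with observed frequency.

    Returns (bin_lower_edge, mean_predicted, observed_rate, count) per non-empty bin.
    """
    bins = defaultdict(list)
    for prob, happened in forecasts:
        if not 0.0 <= prob <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
        index = min(int(prob * n_bins), n_bins - 1)  # p = 1.0 goes in the top bin
        bins[index].append((prob, bool(happened)))
    table = []
    for index in sorted(bins):
        group = bins[index]
        mean_predicted = sum(p for p, _ in group) / len(group)
        observed_rate = sum(h for _, h in group) / len(group)
        table.append((index / n_bins, mean_predicted, observed_rate, len(group)))
    return table


# Assumed reading of the "0.3 games": SD of a uniform spread one win wide.
POINT_FORECAST_SD = 1 / math.sqrt(12)  # about 0.29


def rmse_in_sigmas(rmse, forecast_sd):
    """Express an RMSE in units of the forecast's own claimed standard error."""
    return rmse / forecast_sd
```

A well-calibrated forecaster's bins should show observed rates close to the mean predictions. An RMSE of about 1 sigma means the forecast's stated uncertainty matches its actual misses.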

Fuzz out the CoMA/Monkey predictions by about 2 wins and their RMSE will be *larger* than FO's, adding in quadrature.

In any case, this isn't an incredibly complicated thing. FO gives a distribution of expected wins, you've got a distribution of *resulting* wins, and you're only comparing the first moment of each distribution.

I am only snide to people who use the words "clearly" and "obviously" in their comments. It betrays their agenda.

No, it's not perfectly normal. Nothing really is in statistics.

If you want to credit FO for their distribution prediction, then we'd have to give the same credit to the naive predictions too. But that raises the question: what is the point of a prediction whose distribution is so wide that it is meaningless?

Ya know.. If you close your eyes and walk across a busy freeway in rush hour you have a pretty good shot at making it to the other side because traffic is likely to be at a complete stand still.. On the other hand...it might not be and your ears might be plugged up and you won't hear the whoosh of cars speeding by at 90 mph.

barawn isn't fooling anyone. You don't use the prediction's standard deviation (sigma); you would use the actual distribution's standard deviation! But that's not the point in the first place. No one cares how many standard deviations a prediction is off. What matters is the absolute error.

He either doesn't understand or is being deceitful.

And here, in a nutshell, is why passively managed investments are better than actively managed ones.

PS: Shouldn't you compare the predictions to teams' Pythagorean winning percentage?
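Here is a minimal sketch of the Pythagorean comparison the commenter suggests. It converts points for and against into an expected winning percentage and an expected win total. The default exponent of 2.37 is an assumption: it is a commonly cited value for the NFL, not something established in this thread. Any points totals you plug in are hypothetical.

```python
def pythagorean_win_pct(points_for, points_against, exponent=2.37):
    """Pythagorean expectation: PF^x / (PF^x + PA^x).

    The default exponent 2.37 is an assumed, commonly cited NFL value.
    """
    if points_for < 0 or points_against < 0:
        raise ValueError("points cannot be negative")
    if points_for == 0 and points_against == 0:
        raise ValueError("need at least some points scored")
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)


def pythagorean_wins(points_for, points_against, games=16, exponent=2.37):
    """Expected wins over a season at the Pythagorean winning percentage."""
    return games * pythagorean_win_pct(points_for, points_against, exponent)
```

Scoring as many points as it allows gives a team exactly .500. Swapping points for and against mirrors the result around 8 wins.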

The "experts'" picks are as useless as the teats on a boar hog.

Injuries, suspensions, weather, and many other factors have to be considered.

Vegas oddsmakers are far better suited to know what will happen than retired "jocks" in neckties sitting behind desks pretending to know what will happen.

Dallas is 0-2. Ask those who picked them to be Super Bowl bound (it could happen, BUT).

Tampa Bay is 2-0 (and they were supposed to be hopeless). Yada, yada, yada. Watch out.

There are surprises to come, but don't bet on it.