Since the early 1970s, Jeff Sagarin has been publishing sports team ratings. For most sports, including the NFL, his ratings are calculated so that the difference between two opponents' ratings, plus a home field adjustment, forecasts a game's point spread. His ratings are widely recognized as some of the best around. They can be found every week of the NFL season on the USA Today site. Sagarin has never published his exact algorithms, but we can easily build a very good facsimile.
Excel has a powerful tool called "Solver." It's one of those thousand or so features that Microsoft packs into its Office products that no one ever knows about. In fact, you don't even see it on the Tools menu until you enable it from the "Tools | Add-Ins..." command.
If you go to Microsoft's online help site for Solver, the example problem provided is an exercise in estimating point spreads for NFL games. The sample spreadsheet contains all the game scores from the 2002 season.
Basically, all you do is create a table of ratings for each team. The ratings don't have to mean anything yet; for now they can be your best guess, or all ones, or anything. Solver will calculate them later. Then for each game, you calculate what the ratings suggest the point spread should be. The ratings are intended to work just like Jeff Sagarin's: if team A's rating is 5 and team B's rating is 8, then when team A plays team B the point spread should be 3 in favor of team B. Factor in a league-wide value for home field advantage, say 3 points, and the spread becomes 6 if team B is at home.
Next, using the LOOKUP function to grab the ratings from the table, you calculate the error between the expected spread and the actual result for each game. Square the error (as every good statistician would). In a cell, sum all the squared errors for all the games in the season. In another cell, enter a point value for home field advantage--3 points is a good initial guess. Solver takes over from here.
In the Solver dialog box, you tell it to minimize the value in the cell for the sum of squared errors. Then you tell it to do so by varying the values in the table for the team ratings and the cell for the home field advantage. (You can also add in a constraint that says the average for all the teams' ratings should be zero, so that good teams will have positive ratings and poor teams will have negative ratings.)
Solver will compute the team ratings necessary to best fit the actual point spreads. And now you have your very own homemade Sagarin ratings.
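If you'd rather not use Excel, the same fit is easy to reproduce in a few lines of code. Here's a minimal Python sketch, assuming a list of one season's game results; numpy's least-squares routine plays the role of Solver, and the two games shown are just illustrative entries.

```python
# A rough Python equivalent of the Solver setup described above.
# `games` is assumed to be a list of (home_team, away_team, home_score,
# away_score) tuples for one season; only two entries are shown here.
import numpy as np

games = [
    ("NE", "NYJ", 38, 14),
    ("DAL", "NYG", 45, 35),
    # ... the rest of the season's games ...
]

teams = sorted({t for g in games for t in g[:2]})
idx = {t: i for i, t in enumerate(teams)}
n = len(teams)

# One row per game: +1 for the home team, -1 for the visitor, plus a final
# column for the league-wide home field advantage.
A = np.zeros((len(games), n + 1))
b = np.zeros(len(games))
for row, (home, away, home_pts, away_pts) in enumerate(games):
    A[row, idx[home]] = 1.0
    A[row, idx[away]] = -1.0
    A[row, n] = 1.0                # home field advantage term
    b[row] = home_pts - away_pts   # actual margin of victory

# Least squares minimizes the sum of squared errors, which is exactly what
# Solver is asked to do. The ratings are only defined up to a constant, so
# center them afterward (on 0 here, or on ~20 to match Sagarin's scale).
x, *_ = np.linalg.lstsq(A, b, rcond=None)
ratings = dict(zip(teams, x[:n]))
hfa = x[n]
mean = sum(ratings.values()) / len(ratings)
ratings = {t: round(r - mean, 1) for t, r in ratings.items()}

print(ratings)
print("home field advantage:", round(hfa, 1))
```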
I noticed that Sagarin's average rating is 20 instead of 0, which makes sense because the average NFL score is about 20 points. So I altered the Solver constraint accordingly. For the 2007 season, including the playoffs, the homemade ratings were nearly identical to Sagarin's.
Comparison of Sagarin and MS Solver Team Ratings for 2007
Team | Sagarin | MS Solver |
NE | 36.4 | 37.7 |
IND | 30.2 | 30.6 |
SD | 30.0 | 28.8 |
GB | 29.4 | 28.9 |
NYG | 28.3 | 25.2 |
DAL | 28.0 | 28.2 |
JAX | 27.9 | 26.1 |
PHI | 24.9 | 24.8 |
PIT | 23.6 | 24.6 |
MIN | 22.8 | 23.6 |
WAS | 22.2 | 22.7 |
SEA | 22.0 | 22.0 |
TEN | 21.8 | 20.2 |
CHI | 20.5 | 21.2 |
HOU | 20.3 | 19.7 |
TB | 19.8 | 20.5 |
CLE | 19.4 | 18.8 |
DEN | 17.5 | 15.8 |
DET | 17.2 | 16.7 |
BUF | 16.4 | 20.0 |
CIN | 16.2 | 17.6 |
NO | 16.1 | 17.4 |
ARI | 16.0 | 16.2 |
OAK | 15.7 | 14.3 |
NYJ | 15.5 | 16.1 |
KC | 15.2 | 15.1 |
CAR | 15.0 | 14.3 |
MIA | 13.2 | 11.7 |
BAL | 13.0 | 14.6 |
ATL | 10.2 | 9.9 |
SF | 8.3 | 8.9 |
STL | 7.1 | 7.8 |
The differences between Sagarin's ratings and our homemade ratings may come from the method of solving. Solver uses a "brute force" numerical iteration method, while Sagarin's method is unknown. Sagarin may also weight recent games more heavily. Notice that the difference in the Giants' rating is one of the largest, and the Giants finished the 2007 season on quite a win streak.
Doug Drinen of Pro-Football-Reference.com discusses a very similar method for ranking teams based on margin of victory which he calls the Simple Rating System (SRS). His post includes a good discussion on the advantages and disadvantages of a pure margin of victory ranking system.
Sagarin actually uses two different systems. One is called Pure Points, which is based solely on point differential. This is the system which the method discussed above mimics. His other method is called Elo Chess, which considers only wins and losses, and ignores points. This system is based on a method devised to rate chess players by Arpad Elo, a physics professor and master chess player. In the next post I'll demonstrate how to mimic the Elo ratings.
I used to use EloSTAT or Bayesian Elo (which allows you to offset home field advantage) for Elo calculations. Create a PGN file with game results (1-0 or 0-1), and run it through the program.
Alternatively I had an idea about using strength of schedule and win % to do something similar.
Win % x Strength of Schedule x 2
50% x 50% (average opp) x 2 = Rating of 50%
60% x 60% (average opp) x 2 = 72%
What do you think of that method?
It looks like you have SF and SEA mixed up on your ratings.
Fixed. Thanks.
Not familiar with EloSTAT, but sounds interesting. I'm finishing putting together my post on Elo now.
I think the method you propose would be a great approximation, but what does the final rating mean?
I'd suggest this:
Take the odds ratio of each team's win% and each team's SoS (avg opp win%). So a .500 team would be 1:1 and a .600 team would be 3:2.
Then take the natural log of the odds ratios and add them together. Now you have the natural log of the odds ratio of a team's "true" win% accounting for opponent strength.
Solve for the odds ratio by e^(log odds ratio). Then solve for the winning percentage for the resulting odds ratio which would be =1/(1+(1/x)), where x is the odds ratio.
Example:
A team with a .600 win% and a .400 avg opponent win%. The odds ratios are 3:2 and 2:3.
3:2=1.5 and 2:3=.67
ln(1.5) = .405
ln(.67) = -.405
.405 + (-.405) = 0
e^0 = 1
1/(1+1/1) = 0.500
The .600 team with a .400 strength of schedule is really a .500 team.
Your method would give .6 * .4 * 2 = .48, which is a really good approximation.
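To make those steps concrete, here is the same arithmetic as a small Python sketch (the function name and its inputs are mine):

```python
# The log-odds adjustment described above: combine a team's own odds with
# its opponents' odds to get a schedule-adjusted "true" winning percentage.
import math

def true_win_pct(win_pct, sos):
    odds_team = win_pct / (1 - win_pct)   # .600 -> 3:2 -> 1.5
    odds_opp = sos / (1 - sos)            # .400 -> 2:3 -> 0.667
    log_odds = math.log(odds_team) + math.log(odds_opp)
    odds = math.exp(log_odds)             # equivalently, odds_team * odds_opp
    return 1 / (1 + 1 / odds)

print(true_win_pct(0.600, 0.400))   # ~0.5: the example above
print(true_win_pct(0.600, 0.500))   # ~0.6: a .500 schedule changes nothing
```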
Yeah, Elo is supposed to give you the probability that A beats B. The whole premise behind it is the assumption that if A > B and B > C, then A > C, which means teams don't have to play each other directly to get an inference of relative strength.
Elo programs automate those calculations for performances in chess tournaments.
As far as my method goes, the thought was that it would show expected performance against a .500 opponent. However, I did not do any testing to see if that was the case.
Expected Result = 1/(1 + 10^(Elo Difference/400)), where the Elo difference is the opponent's rating minus yours - http://remi.coulom.free.fr/Bayesian-Elo/
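Written as code, with the sign convention made explicit (this is just the standard Elo expectation, not anything specific to Bayesian Elo):

```python
# Standard Elo expected score: a higher-rated side gets an expectation above 0.5.
def elo_expected(my_rating, opp_rating):
    return 1 / (1 + 10 ** ((opp_rating - my_rating) / 400))

print(elo_expected(1600, 1500))   # ~0.64 for a 100-point favorite
print(elo_expected(1500, 1500))   # 0.50 for evenly matched opponents
```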
On the calculations, I don't think I understand something.
.6 record against .5 SOS.
.6/.5 = 1.2 and .5/.6 = 0.833
ln(1.2) + ln(0.833) = 0.1823 + (-0.1823) = 0
e^(0) = 1
1/(1+1/1) = 0.500
Did I make a mistake in my calculations? I increased the SOS, but the final result was still 0.5
Sorry, I wasn't very clear and picked a bad example to illustrate. Here's how it would work for your example above.
A .6 record would be 3:2 or 1.5 odds ratio. A .5 SoS would be a 1:1 or 1.0 odds ratio.
ln(1.5) + ln(1.0) = 0.405 + 0 = 0.405
e^(.405) = 1.5
1/(1+(1/1.5)) = .6
So a .6 win% against a .5 SoS yields a .6 "true" win%.
Elo's algorithm works in much the same way. It uses a base 10 log instead of base e, and Elo is based on an (arbitrary) average rating of 1500 with a SD of 200.
(By the way, I just realized that instead of ln(x) + ln(y), you can just do ln(x*y). They're the same. 8th grade is slowly coming back to me!)
I'm kind of wondering if something like this would be a slightly more accurate estimate:
X = Record % - .5
Y = SOS % - .5
Z = X + Y + .5
Playing with the numbers it seems to be a decent estimation as well.
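Out of curiosity, here's a quick side-by-side of the three estimates on the .600 win% / .400 SoS example from earlier in the thread (function names are mine):

```python
# Comparing the three schedule adjustments discussed in this thread.
def log_odds_est(win_pct, sos):      # the odds-ratio method above
    odds = (win_pct / (1 - win_pct)) * (sos / (1 - sos))
    return 1 / (1 + 1 / odds)

def product_est(win_pct, sos):       # win% x SoS x 2
    return win_pct * sos * 2

def additive_est(win_pct, sos):      # (win% - .5) + (SoS - .5) + .5
    return (win_pct - 0.5) + (sos - 0.5) + 0.5

for f in (log_odds_est, product_est, additive_est):
    print(f.__name__, round(f(0.600, 0.400), 3))
# log_odds_est 0.5, product_est 0.48, additive_est 0.5
```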
When you did the correlation analysis of the homemade ratings against the Sagarin ratings, did you use Sagarin's combined rating (Pure Points & EloChess) or just one of them? In picking games against my friend, I've been using Pure Points for the last few years.
For this, I compared with pure points. For the follow-on Elo post, I compared with EloChess.
I noticed that the ratings Sagarin has at the end of a season are not the same ones he has starting the next season. Any idea why they change and how he gets the numbers to start the new season?
I plan to use a random number generator and Sagarin ratings and/or RPI ratings for a March Madness Bracket. My only issue is this: How can I use the ratings to come up with an estimated winning percentage to plug into the random number generator? I am statistically inclined but this element has always escaped me.
Sagarin's ratings should estimate the point spread pretty well. You can use the Pythagorean formula with an exponent of 10 to estimate a win probability. You also need an estimate of an average score, though I'm not sure where to find one.
Win probability = NORMSDIST(Spread/12). The Spread/12 generates a z-score, and the NORMSDIST function measures the area under the normal curve. A 1-point favorite should win about 53.3% of the time. For the favorite, Spread is a positive number; it's negative for an underdog. NORMSDIST(Spread/12) + NORMSDIST(-Spread/12) = 1.
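Outside Excel, the same calculation looks like this; scipy's norm.cdf is the equivalent of NORMSDIST, and the 12-point standard deviation is the figure from the comment above:

```python
# Spread-to-win-probability: NORMSDIST(spread/12) translated to Python.
# 12 is the assumed standard deviation of the final margin around the spread.
from scipy.stats import norm

def win_prob(spread, sd=12.0):
    # spread > 0 for the favorite, < 0 for the underdog
    return norm.cdf(spread / sd)

print(win_prob(1))                  # ~0.533: a 1-point favorite
print(win_prob(-7))                 # ~0.280: a 7-point underdog
print(win_prob(3) + win_prob(-3))   # 1.0, as noted above
```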
If you look at Sagarin's Dec 19, 2010 pure points ratings, you will see that TEN is rated below CHI. Yet TEN had a +26 point differential while CHI had a +25 differential, and TEN had a harder schedule. Therefore it appears that Sagarin does weight more recent games more heavily (TEN was on a big losing streak without Vince Young).
Just implemented this with the ols function of PDL::Stats (a Perl module). Not the same as Microsoft Excel, I know, but linear least squares is pretty stable. However, occasionally this method just blows up. SRS has its moments as well (the matrices that result are often singular).
According to Wayne Winston's "Mathletics", Sagarin uses a "proprietary weighted least squares algorithm." That would probably explain a decent amount of the difference between your results and his.
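If anyone wants to experiment with that, one simple guess at a weighted fit is to add recency weights to the least-squares setup from the post: scale each game's row and margin by the square root of its weight. This is just one way to weight recent games, not Sagarin's actual scheme.

```python
# A hedged sketch of recency-weighted least squares, building on the earlier
# sketch: A is the design matrix, b the margins, and `weeks` each game's week.
import numpy as np

def weighted_fit(A, b, weeks, decay=0.95):
    latest = max(weeks)
    w = np.array([decay ** (latest - wk) for wk in weeks])  # newer games count more
    sw = np.sqrt(w)
    # Scaling rows by sqrt(weight) turns ordinary least squares into
    # weighted least squares.
    x, *_ = np.linalg.lstsq(A * sw[:, None], b * sw, rcond=None)
    return x
```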
One cautionary tidbit for Solver ... I was trying to use Solver to analyze college offenses and defenses, and I gave Solver a lot of leeway with my ratings. I ended up with Wisconsin having the most prolific passing offense in the Big Ten (and, conversely, one of the worst rushing offenses).
Has anyone come up with a better way to use Solver to come up with better ratings?
Has anybody run past performances through a regression model? Is Elo for sports anywhere near as accurate as it is for chess?
That Excel model came from the work of Wayne Winston, who is very good friends with Sagarin. I would say that model is very close to what Sagarin uses.
It would be interesting to pit Excel vs. Sagarin weekly, to see how much improvement you get from Sagarin's proprietary tweaks.