Comments on Advanced Football Analytics (formerly Advanced NFL Stats): Signal vs. Noise in Football Stats

Roger, 2009-12-29 06:10:
Do you think correlating the first 8 games with the last 8 games is the best way? Changes in weather and playoff considerations (among other things) could affect the various efficiency stats. To truly test the correlation of the variables with each other, you should randomize your two sets by choosing 8 random games for the first variable and the remaining 8 games for the second variable. Ideally, do that over all of the 16 choose 8 possible groupings (or a statistically large enough sample of those 12870 combinations).

Brian Burke, 2008-11-23 23:01:
In logit regression, the units are weird. It's "change in the logarithm of the odds per unit of independent variable." But effectively, it's the same. The regression is measuring normalized variance, so units are transparent.

No, there is no official cut-off for correlation. I'd use it as a guide, though. I'd start with everything that you believe has a logical cause-effect relationship with your dependent variable, as long as everything is reasonably independent. For example, don't use both penalty minutes and penalties in your model, because they're both measuring the same thing.

Run the regression and note how well it predicts results. You can then remove or add variables and see how much each change improves or hurts the model.

Mr.Ceraldi, 2008-11-23 22:47:
Brian, thanks for getting back so quick! This helps. A couple of follow-ups...

"The units aren't important. In a linear regression the generated coefficients are always in "units of dep variable per unit of indep variable." So the ultimate prediction from the model will always be simply in terms of the dependent variable."

Is the same true for logit regression? Does the specific rate you used for each of your stats matter? I.e., opass is yards per play; penalty rate, I'm assuming, is penalty yards per game?

A theory question: if we were to find another stat that fit your criteria above (independent, correlated to winning, etc.) at, say, .25 or .35, would it necessarily improve your model? Are correlations cumulative in this way? In other words, if we hypothetically found another magical stat that fit your criteria above (was independent, etc.), would it necessarily improve your predictions?

One last thing. I am building a model for another sport (ice hockey) and wanted to know your suggestion for cutoff values for inclusion in a model. What would you consider a strong enough correlation value for a stat to be considered skill vs. random luck? I know .08 is random, but is .25 strong enough to be considered repeatable and skill? As you know, ice hockey has much more randomness in it, which I'm struggling to work through.

Dan

Brian Burke, 2008-11-23 22:13:
I picked them based on 3 considerations:

1. Each variable had to be independent from the others (or "orthogonal," as statisticians say). That is, one variable couldn't be correlated with the others.
2. Each variable had to be predictive of wins. That means it had to (a) correlate well with winning.
3. And (b) the direction of causation had to be clear. Total rushing yards, for example, correlate highly with winning, but it's teams that are already ahead that rack up extra rushing yards in the 4th quarter.

Special teams stats were not significant.

This post (http://www.advancednflstats.com/2007/10/game-model-coefficients.html) shows the logit coefficients with defensive interceptions left in.

Mr.Ceraldi, 2008-11-23 21:30:
Brian, one further question. Your model is based on these rates:

Offensive pass efficiency, including sack yardage
Defensive pass efficiency, including sack yardage
Offensive run efficiency
Defensive run efficiency
Offensive interception rate
Defensive interception rate
Offensive fumble rate
Penalty rate (penalty yards per play)

How did you select these? Did you base it on the correlation of each stat with team wins? Or are these the stats with the highest % from the non-linear logit regression? Just curious: def int rate was left out due to its randomness, so what did the logit regression show for it?

Thx again,
Dan

Brian Burke, 2008-11-23 21:25:
Dan, I don't have the exact method handy, but what I did was run a linear regression with 3rd down % from the 2nd half of the season as the dependent variable, and pass efficiency, sack rate, and interception rate from the 1st half of the season as independent variables. This regression actually predicts 2nd-half-season 3rd down % better than 1st-half-season 3rd down % does.

The units aren't important. In a linear regression the generated coefficients are always in "units of dep variable per unit of indep variable." So the ultimate prediction from the model will always be simply in terms of the dependent variable.

Mr.Ceraldi, 2008-11-23 21:08:
Hi Brian, this is a great post (even when looking at it for a second time).

I have a stat calculation question: how do you combine two stats that seem to have different units and keep the same ratios? When I try it, it messes up.

Example: "offensive 3rd down percentage could be predicted using passing efficiency, sack rate, and interception rate." Would you mind walking through one example for a novice statistician?

Thanks,
Dan

Brian Burke, 2008-03-31 11:48:
2006 and 2007 (n=64).

SportsGuy, 2008-03-31 11:38:
Over how many seasons were these correlations run?

re: Lions ints

The drop in the 2nd half of 2007 was likely caused by the collapse of Big Baby, Shaun Rogers.