tag:blogger.com,1999:blog-38600807.post8140845454016303517..comments2018-06-02T14:19:34.554-04:00Comments on Advanced Football Analytics (formerly Advanced NFL Stats): The Learning CurveUnknownnoreply@blogger.comBlogger11125tag:blogger.com,1999:blog-38600807.post-86779914304994637152008-12-11T16:13:00.000-05:002008-12-11T16:13:00.000-05:00Yeah, it seems to me that the sample size is the biggest drawback to the whole thing. I think there are only 20-30 QBs in the data set, so I would guess that it only takes a couple of guys to skew the data.Philhttps://www.blogger.com/profile/15837333926742421707noreply@blogger.comtag:blogger.com,1999:blog-38600807.post-39101258636291918902008-12-11T14:43:00.000-05:002008-12-11T14:43:00.000-05:00I've got the 2007 PFP, which has a follow-up article in it. That's where I got my understanding of the methodology in the research. The 2006 PFP has the original article.<BR/><BR/>Bootstrapping, loosely speaking, is re-sampling the data, usually randomly, and then testing the relationships again on each sample. (Strictly, a bootstrap draws random samples with replacement; splitting the data into subsets, as in the examples that follow, is closer to cross-validation, but the goal is the same.) It's a way of gaining confidence in statistical inferences.<BR/><BR/>For example, in this study, you might divide up the data by draft years--'96, '97, etc.--and repeat the methodology for each year.<BR/><BR/>Say one of your variables was "QB's college conference" and you are testing if there is a connection between conference and pro performance. Naturally '98 will give the SEC a huge boost in significance because P. Manning went to Tenn. But any other year might prove to be insignificant for the SEC. We could conclude that the significant relationship observed between conference and pro performance is caused by a single individual. So the connection is really with the individual and conference is not that important.<BR/><BR/>What I'd do if I were FO is divide the data set in half. 
I'd use half the data to produce the regression equation they use as their forecast system. Then I'd test that forecast system against the 2nd half of the data. If it still predicts pro performance reasonably well, then I’d have confidence in the system.<BR/><BR/>If you just use the entire data set to produce the regression output, you’re likely to get an equation over-fit to non-repeating circumstances of the past, or to simple random luck. It might be retrodictively well-fit to past data (it forecasts the past), but not predictive at all. In the example, we’d know that just because Manning was from the SEC doesn’t mean future SEC QBs will be any better than those from the ACC or PAC 10.<BR/><BR/>The only reason I know this stuff is because I was guilty of perpetrating these errors myself!Brian Burkehttps://www.blogger.com/profile/12371470711365236987noreply@blogger.comtag:blogger.com,1999:blog-38600807.post-55491602534914924562008-12-11T14:11:00.000-05:002008-12-11T14:11:00.000-05:00what goes into 'bootstrapping subsets of the data'?<BR/><BR/>maybe the rest of your audience knows all this (I took a statistics class and an econometrics class back in college), but I can't say I do (and for what it's worth, IMO you do a pretty good job of explaining a lot of this stuff w/o being patronizing)<BR/><BR/>I think the article is in the Prospectus book that you have to buy (I don't know that there is an online version of it)Philhttps://www.blogger.com/profile/15837333926742421707noreply@blogger.comtag:blogger.com,1999:blog-38600807.post-44438814971108114372008-12-10T16:37:00.000-05:002008-12-10T16:37:00.000-05:00The difference isn't in the stats but in the <I>inference</I> we can make about the stats. It's about the confidence we can have in the conclusions we draw. 
<BR/><BR/>When you test a single variable at the usual 5% level, there is a 1 in 20 chance it turns out significant even though there really isn't a connection between the independent and dependent variables. (Forgive me if you know all this and I sound patronizing.) This would be "type I error." <BR/><BR/>When I data-mine a basket of 20 different variables for a correlation, and 1 of them appears significant, I really haven't learned much because I'd expect about 1 type I error anyway. Say 2 variables turn up significant. Which of them is truly significant? The answer could be none, one, or both. We don't know.<BR/><BR/>With a basket of 20 variables, you actually have as high as 1-(1-0.05)^20 = 0.64 probability of at least 1 type I error. In other words, instead of the normal 95% chance of avoiding a false positive, there is only about a 36% chance that none of the 20 tests throws up a false positive.<BR/><BR/>I don't have strong feelings either way about the theory connecting college starts with NFL performance, but the methodology used in that article doesn't really establish the connection. It's worth a closer look, though.<BR/><BR/>(I can't find the article if anyone has a link.)<BR/><BR/>There's nothing wrong with testing a bunch of variables for a correlation, but it should be only a start. We'd need to do a lot more leg work before we draw conclusions. 
Bootstrapping subsets of the data might be where I'd begin.Brian Burkehttps://www.blogger.com/profile/12371470711365236987noreply@blogger.comtag:blogger.com,1999:blog-38600807.post-39957478230333886282008-12-10T14:24:00.000-05:002008-12-10T14:24:00.000-05:00?????<BR/><BR/>Isn't that the whole point of testing for statistical significance?<BR/><BR/>why does it make any difference whether you do it twenty at a time or one at a time?Philhttps://www.blogger.com/profile/15837333926742421707noreply@blogger.comtag:blogger.com,1999:blog-38600807.post-69701055282234835952008-12-09T13:21:00.000-05:002008-12-09T13:21:00.000-05:00The <A HREF="http://www.pro-football-reference.com/blog/?p=799" REL="nofollow">PFR blog</A> picks up the discussion of break-out rookie QB seasons.Brian Burkehttps://www.blogger.com/profile/12371470711365236987noreply@blogger.comtag:blogger.com,1999:blog-38600807.post-72766763566689053472008-12-09T13:13:00.000-05:002008-12-09T13:13:00.000-05:00Phil-Yes, I'm familiar with the theory. I think it was a guest article on FO from a while back. <BR/><BR/>Intuitively, it would make a lot of sense and I'd buy it. QBs with many college starts are less likely to be one-year-wonder flash-in-the-pan types like Cal's Kyle Boller.<BR/><BR/>However, the methodology from that article was severely flawed. It basically took a couple dozen QB variables and threw them all into a blender to see which ones turned up significant as predictors for NFL performance.<BR/><BR/>The problem with that methodology is that, by definition, about 1 out of 20 variables with no real connection is going to appear significant purely by chance, which is a type I error. So when you test a truckload of variables you'd expect to see a couple "false-positive" significant correlations. And there's no way to tell the true correlations from the false-positives. 
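To put rough numbers on the false-positive problem described above, here is a minimal Python sketch. It assumes 20 independent tests of variables that have no real connection to NFL performance, each run at the 5% level. The function names, the count of 20, and the simulation settings are hypothetical choices for illustration and are not taken from the article under discussion.

```python
import random


def familywise_error_rate(num_tests, alpha=0.05):
    """Chance of at least one false positive among independent tests
    when none of the tested variables has a real effect."""
    return 1.0 - (1.0 - alpha) ** num_tests


def simulate_false_positives(num_tests, alpha=0.05, trials=20000, seed=0):
    """Monte Carlo check of the formula above.

    When there is no real effect, a p-value is uniformly distributed on
    [0, 1], so each test is modeled as one uniform random draw.
    Returns (share of trials with at least one false positive,
             average number of false positives per trial).
    """
    rng = random.Random(seed)  # seeded so the run is repeatable
    trials_with_hit = 0
    total_hits = 0
    for _ in range(trials):
        hits = sum(1 for _ in range(num_tests) if rng.random() < alpha)
        total_hits += hits
        if hits > 0:
            trials_with_hit += 1
    return trials_with_hit / trials, total_hits / trials
```

Calling `familywise_error_rate(20)` gives roughly 0.64, and `simulate_false_positives(20)` should land close to that share, with about one false positive per batch of 20 tests on average. Real QB variables are correlated with one another, so the independence assumption makes this an approximation, not an exact figure.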
<BR/><BR/>Further, there's a good chance that one of the variables really is significant but isn't detected due to a type II error. <BR/><BR/>The methodology is valid as a first-look analysis, but you can't really draw any conclusions from it. I should probably do a post on this.Brian Burkehttps://www.blogger.com/profile/12371470711365236987noreply@blogger.comtag:blogger.com,1999:blog-38600807.post-30692318417072558232008-12-09T13:02:00.000-05:002008-12-09T13:02:00.000-05:00Kiran-I'm guessing Rodgers isn't too far below Favre so far this year, but his season can't compare to Favre's from last year.<BR/><BR/>I think the Dolphins lucked out the most. My system has shown them as the strongest team in the AFC East for a while now. Although the Jets have a better passing game this year, the bulk of their improvement has come through the running game and defense.Brian Burkehttps://www.blogger.com/profile/12371470711365236987noreply@blogger.comtag:blogger.com,1999:blog-38600807.post-60130781858468664512008-12-09T00:33:00.000-05:002008-12-09T00:33:00.000-05:00Brian,<BR/><BR/>Your excellent post prompted me to look at another 3 quarterbacks, and compare their seasons to each other. Remember the Brett Favre soap opera in the offseason? <BR/><BR/>The result - Green Bay goes with untested Aaron Rodgers, the Jets trade a conditional draft pick (currently a third-round pick, but maybe a second-round pick if the Jets make the playoffs) and give up on Chad Pennington for Favre, and the Dolphins pick up Pennington.<BR/><BR/>I looked at the seasons (through 13 games) for all three. Who do you think made the right decisions, the debatable (wrong?) 
decisions, and who got awfully lucky?<BR/><BR/>Your thoughts are appreciated.<BR/><BR/>KiranKiranRhttps://www.blogger.com/profile/16197332726638986050noreply@blogger.comtag:blogger.com,1999:blog-38600807.post-19545083358104601402008-12-08T13:57:00.000-05:002008-12-08T13:57:00.000-05:00are you familiar with the Football Outsiders article a couple of years ago stating that the best two predictors of NFL success in college QBs are games started and completion percentage?<BR/><BR/>With that in mind I find myself wondering what the current slate of Big XII QBs are going to look like at the next level, especially when you consider that not only have Daniel and Harrell logged 3 years of starting in college, but both logged multiple years in high school playing in spread offenses. McCoy and Bradford, if they stay in school, will be able to log 4 years as college starters,<BR/><BR/>all have logged completion percentages that put just about every current NFL starter's college completion percentage to shame<BR/><BR/>those four will have seen more live action attempts than just about anyone we have seen to datePhilhttps://www.blogger.com/profile/15837333926742421707noreply@blogger.comtag:blogger.com,1999:blog-38600807.post-70260618485306791052008-12-07T18:46:00.000-05:002008-12-07T18:46:00.000-05:00Brian,<BR/><BR/>Very good analysis as usual. I don't know how to attach tables and/or graphs to comments, so I have posted my comments to your post on my own blog - newqbrating.blogspot.com. I think you (and your readers) will find those comments interesting (if I do say so myself). Keep up the great work, and I hope to make future contributions to your thoughts.<BR/><BR/>Cheers,<BR/>KiranKiranRhttps://www.blogger.com/profile/16197332726638986050noreply@blogger.com