First, some terminology. P(A) means the "probability of event A," as in the probability it rains in Seattle tomorrow. Event A is 'it rains in Seattle tomorrow'. Likewise, we can define P(B) as the probability that it rains in Seattle today.
P(A|B) means "the probability of event A given event B occurs," as in the probability that it rains in Seattle tomorrow given that it rained there today. This is known as a conditional probability.
The probability it rains in Seattle today and tomorrow can be calculated by P(A|B) * P(B), which should be fairly intuitive. I hope I haven't lost anyone.
It's also intuitive that "raining in Seattle today and tomorrow" is equivalent to "raining in Seattle tomorrow and today." There's no difference at all between those two things, and so there's no difference in their probabilities.
We can write out that equivalence, like this:
P(A|B) * P(B) = P(B|A) * P(A)
Congratulations! You now understand Bayes' Theorem. Because that's all there is to it. We can do the tiniest bit of algebra, dividing each side by P(B), and rewrite the above as:
P(A|B) = P(B|A) * P(A) / P(B)
which is how Bayes' Theorem is commonly expressed.
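If you'd rather see the identity hold on actual numbers, here is a minimal Python sketch. The four joint probabilities for rain today and tomorrow are invented purely for illustration (they are not real Seattle weather data); only the relationships between them matter.

```python
# Hypothetical joint probabilities for (rain today, rain tomorrow).
# These four numbers are made up for illustration and sum to 1.
joint = {
    (True, True): 0.40,    # rain today and tomorrow
    (True, False): 0.20,   # rain today, dry tomorrow
    (False, True): 0.15,   # dry today, rain tomorrow
    (False, False): 0.25,  # dry both days
}

# Marginals: B = rain today, A = rain tomorrow.
p_b = sum(p for (today, _), p in joint.items() if today)
p_a = sum(p for (_, tomorrow), p in joint.items() if tomorrow)

# Conditional probabilities, straight from the definition.
p_a_given_b = joint[(True, True)] / p_b
p_b_given_a = joint[(True, True)] / p_a

# Both orderings recover the same "today and tomorrow" probability.
lhs = p_a_given_b * p_b
rhs = p_b_given_a * p_a

# Dividing through by P(B) gives Bayes' Theorem.
bayes = p_b_given_a * p_a / p_b

print(f"P(A|B) * P(B) = {lhs:.4f}")
print(f"P(B|A) * P(A) = {rhs:.4f}")
print(f"P(A|B) directly = {p_a_given_b:.4f}, via Bayes = {bayes:.4f}")
```

Any valid set of joint probabilities would do here; the two products always agree because they are just two ways of writing the same joint probability.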
Back to the draft. Let's say that fresh off his outstanding performance in the East-West Game, draft prospect Ozamataz Buckshank (RB-Stanford) is ranked on average as the 10th best player available, without respect to his position or team needs.
Prospects who are ranked 10th overall have historically been taken with exactly the 10th pick only 40% of the time. (It's actually much lower than that. All numbers in this post are for explanatory purposes only. The actual model uses the true numbers.) We'll call this the probability he is actually picked 10th, or P(Act). In Bayesian terms, this is a 'prior' probability, that is, the probability before we know anything else.
Now let's say a reputable draft analyst, who makes a living studying prospects and team needs, predicts Buckshank will be taken with the 11th pick. That's new information, and we'd like to incorporate it to know the probability Buckshank will be taken 10th given that he's been projected to be taken 11th. This can be written P(Act|Proj), or more specifically P(Act=10|Proj=11).
Applying Bayes' Theorem, we have:
P(Act|Proj) = P(Proj|Act) * P(Act) / P(Proj)
Bayesian inference is clever because it forces you to reason inductively (think backwards). Usually in statistics we'd be thinking in terms of predictor -> response. Bayes makes you think response -> predictor. In this case, we need to think in terms of P(Proj|Act), or the probability a prospect was projected to be taken 11th given that he was actually taken 10th. In other words, for all the recent #10 picks, in what proportion of cases were they projected by an expert to be taken 11th?
Empirical analysis shows that historically the 10th pick in the draft was projected by the expert to be taken 11th 20% of the time. In Bayesian terms, this probability is known as the 'likelihood'.
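As a rough illustration of what that kind of empirical estimate could look like, here is a small Python sketch. The list of past projections is invented and sized so the proportion comes out to the 20% used above; it is not the actual data or necessarily the method behind the model.

```python
from collections import Counter

# Hypothetical history: the expert's projected pick number for ten past
# players who were actually taken 10th. Invented data for illustration.
projections_for_actual_10 = [10, 11, 10, 9, 10, 12, 10, 11, 10, 10]

counts = Counter(projections_for_actual_10)
total = len(projections_for_actual_10)

# Estimate P(Proj=k | Act=10) as a simple relative frequency.
likelihood = {proj: n / total for proj, n in sorted(counts.items())}

for proj, p in likelihood.items():
    print(f"P(Proj={proj} | Act=10) = {p:.2f}")
```

Here two of the ten made-up #10 picks were projected 11th, so the estimated likelihood P(Proj=11|Act=10) is 0.20.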
As a recap, we want to know P(Act|Proj). We knew our prior P(Act), and we now know the likelihood P(Proj|Act). All that's left to find is P(Proj), and we will have realized the magic of Bayesian inference.
Ignoring P(Proj) for a moment, we can say P(Act|Proj) is proportional to P(Proj|Act) * P(Act). Multiplying the likelihood and the prior gives us 0.20 * 0.40 = 0.08.
Because we're not only interested in the chance Buckshank is chosen 10th, but also the chance he is chosen across the full range of possible pick numbers, we'll repeat the process for the 9th pick, for the 11th pick, and so on. For the sake of this exercise, we'll say that our expert prognosticator is never off by more than 1 pick early or 2 picks late. Once we've calculated P(Proj|Act) * P(Act) for each possible pick number, we get the following result.
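In code, this step is a single multiplication. The sketch below uses the two explanatory numbers from the text; the function name is just a label chosen for this example.

```python
def unnormalized_posterior(likelihood, prior):
    """Return likelihood * prior, the posterior before dividing by P(Proj)."""
    return likelihood * prior

prior_act_10 = 0.40          # P(Act=10): a 10th-ranked prospect goes 10th
likelihood_proj_11 = 0.20    # P(Proj=11 | Act=10)

score_10 = unnormalized_posterior(likelihood_proj_11, prior_act_10)
print(f"P(Proj=11|Act=10) * P(Act=10) = {score_10:.2f}")
```

On its own this number isn't a probability yet. It only becomes one after it's compared against the same product for every other pick.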
Notice that the expert distribution is taller and narrower than the prior, which only considered overall "best-player ranking." The expert pick is more confident because it's been more accurate in the past. It includes more information, like position, team need, and perhaps other things like team visits or workouts.
Also notice that our resulting distribution (gray) is well below the others. That's because we ignored P(Proj) earlier. Here's a neat trick: We still don't even need to calculate P(Proj) thanks to a convenient property of probability distributions: that they all must sum to 1. (Why? Because some outcome must happen--the dice of the universe don't rest on their edges.)
We can sum up all the probabilities for each pick number for our result (gray line), which comes to just about 0.25. Knowing that our posterior distribution must sum to 1, 0.25 must be P(Proj). We divide the probability for each pick by 0.25 and now the posterior distribution P(Act|Proj) nicely sums to 1. Here's what the final result looks like.
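Here is a minimal Python sketch of that normalization step. Only two inputs come from the text (the 0.40 prior and the 0.20 likelihood at pick 10); every other number is a hypothetical stand-in, chosen so the unnormalized values sum to 0.25 like the example.

```python
# Picks the player could go at, given a projection of 11 and an expert who
# is never off by more than 1 pick early or 2 picks late.
picks = [9, 10, 11, 12]

# P(Act=k): prior from overall ranking. Only the 0.40 at pick 10 is from
# the text; the others are hypothetical. The remaining 0.10 of prior mass
# sits on picks where the likelihood is zero, so it drops out.
prior = {9: 0.20, 10: 0.40, 11: 0.20, 12: 0.10}

# P(Proj=11 | Act=k): likelihood. Only the 0.20 at pick 10 is from the
# text; the others are hypothetical.
likelihood = {9: 0.05, 10: 0.20, 11: 0.70, 12: 0.20}

# Step 1: likelihood * prior for each pick (the low gray line).
unnormalized = {k: likelihood[k] * prior[k] for k in picks}

# Step 2: the sum of those products is P(Proj).
p_proj = sum(unnormalized.values())

# Step 3: divide by P(Proj) so the posterior sums to 1.
posterior = {k: v / p_proj for k, v in unnormalized.items()}

print(f"P(Proj=11) = {p_proj:.2f}")
for k in picks:
    print(f"P(Act={k} | Proj=11) = {posterior[k]:.2f}")
```

With these made-up inputs, the chance he goes 10th drops from the 0.40 prior to 0.32, and pick 11 becomes the most likely landing spot at 0.56.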
The yellow line represents the final result--P(Act|Proj)--the probability Ozamataz Buckshank will be selected at each pick number given that he was projected by the expert to be chosen 11th.
Further, Bayesian inference allows us to use a posterior distribution as a new prior distribution for subsequent addition of new information. If we have multiple expert projections (which we do), we can repeat this process to produce a posterior that reflects all the information available. If the experts all concur, the final distribution will be tall and narrow, converging on a consensus pick number with high confidence. If the experts all disagree, the final distribution will be low and wide, spread across multiple pick numbers and indicating low confidence.
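A sketch of that chaining in Python might look like the following. All of the numbers are hypothetical, and the code assumes each expert's projection is independent of the others once the actual pick is known, which is a simplification and may not be how the actual model treats them.

```python
def update(prior, likelihood):
    """One Bayesian update: multiply by the likelihood, then renormalize."""
    unnormalized = {k: likelihood.get(k, 0.0) * p for k, p in prior.items()}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("this projection is impossible under the prior")
    return {k: v / total for k, v in unnormalized.items()}

# Hypothetical prior P(Act=k) over a handful of picks.
prior = {9: 0.20, 10: 0.40, 11: 0.20, 12: 0.20}

# Hypothetical likelihoods P(Proj=11 | Act=k) for two experts who both
# projected the player to go 11th.
expert_1 = {9: 0.05, 10: 0.20, 11: 0.70, 12: 0.20}
expert_2 = {9: 0.10, 10: 0.30, 11: 0.50, 12: 0.10}

# The posterior after the first expert becomes the prior for the second.
after_first = update(prior, expert_1)
after_second = update(after_first, expert_2)

for k in sorted(prior):
    print(f"pick {k}: prior {prior[k]:.2f} -> "
          f"{after_first[k]:.2f} -> {after_second[k]:.2f}")
```

Because both made-up experts point at pick 11, the distribution gets taller and narrower around 11 with each update. Swapping in an expert who projected a different pick would flatten it back out.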
At this point, you might be wondering where the numbers for the prior and likelihood come from. I'll cover that in the next post. And the full results for all players in this year's draft will be available soon.