Bayes' Theorem, also known as Bayes' Rule, is used in statistics and probability theory to relate marginal probabilities and conditional probabilities. In Bayesian probability theory, it is used to update degrees of belief (probabilities) in light of new information. The theorem is attributed to the Reverend Thomas Bayes (1702-1761), an English nonconformist minister, but it was later rediscovered and popularized by Pierre-Simon Laplace.
Bayes' theorem is stated mathematically as

P(X | Y,I) = P(Y | X,I) P(X | I) / P(Y | I)

where X and Y are statements (propositions), I is available background information, and
P(X | Y,I) is the posterior probability for X given Y and I,
P(Y | X,I) is the likelihood for Y given X and I,
P(X | I) is the prior probability for X given only I, and
P(Y | I) is sometimes called the evidence or probability for Y given only I.
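The statement of the theorem above can be sketched as a small function; the names prior, likelihood, and evidence are illustrative labels chosen here, not notation from the text.

```python
# Minimal sketch of Bayes' rule: posterior = likelihood * prior / evidence.
def bayes_posterior(prior, likelihood, evidence):
    """Return P(X | Y, I) = P(Y | X, I) * P(X | I) / P(Y | I)."""
    return likelihood * prior / evidence

# Example with made-up numbers: prior 0.5, likelihood 0.8, evidence 0.6
print(bayes_posterior(0.5, 0.8, 0.6))  # 2/3
```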
Applications and Examples
An example of how Bayes' theorem can be used is the following:
Suppose a particular disease afflicts 1% of the population. Suppose that a test for the disease is 95% accurate. Suppose that someone tests positive for the disease but there is no other evidence that they have the disease. What is the probability that they have the disease?
Let X be the event that the test result is positive. Let Y be the event that the person actually has the disease.
Before the test result is known, our probability that the person has the disease is p(Y) = 1% = 0.01. The probability that the person tests positive, given that they have the disease, is p(X | Y) = 95% = 0.95. The denominator term p(X) is a little more complex, since X can occur in two different ways: if the person has the disease, they test positive with probability 0.95; if the person does not have the disease, they test positive with probability 5% = 0.05. Denoting the event that Y is not true by Y', the laws of probability require that p(Y') = 1 - p(Y). Thus

p(Y | X) = p(X | Y) p(Y) / [p(X | Y) p(Y) + p(X | Y') p(Y')]
         = (0.95 × 0.01) / (0.95 × 0.01 + 0.05 × 0.99)
         = 0.0095 / 0.059
         ≈ 0.16
In other words, there is only a 16% chance that a person testing positive actually has the disease.
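The calculation in this example can be reproduced directly; the variable names below are descriptive choices, not part of the original text.

```python
# Disease-test example: prior, likelihood, and false-positive rate
p_disease = 0.01              # prior p(Y): 1% of the population is afflicted
p_pos_given_disease = 0.95    # likelihood p(X | Y): true-positive rate
p_pos_given_healthy = 0.05    # p(X | Y'): false-positive rate

# Total probability of a positive test, p(X), summed over both ways X can occur
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: p(Y | X) = p(X | Y) p(Y) / p(X)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # 0.161, i.e. about a 16% chance
```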
An extended form of Bayes' theorem is obtained by noting that it applies to probability distributions as well as to events. Let y be a (vector valued) observable quantity that we want to use to estimate some unknown, unobservable (vector valued) quantity θ. Prior to seeing the data y, we summarize our knowledge about θ by a probability distribution p(θ). Assume that we have a model of the relationship between y and θ. Call this p(y | θ). We can use Bayes' theorem to update our knowledge of θ by incorporating the information contained in the observed data y:

p(θ | y) = p(y | θ) p(θ) / p(y)

where p(y), the normalizing constant, is obtained by summing or integrating p(y | θ) p(θ) over all possible values of θ.
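The distribution form of the theorem can be illustrated with a simple coin-bias example, which is not taken from the text: θ is the unknown probability of heads, y is an observed count of heads in n flips, and the continuous range of θ is approximated by a discrete grid.

```python
# Grid approximation of p(theta | y) ∝ p(y | theta) p(theta)
# for a binomial model with a flat prior (an illustrative choice).
from math import comb

n, heads = 10, 7                        # observed data y
thetas = [i / 100 for i in range(101)]  # discrete grid over theta

prior = [1.0 / len(thetas)] * len(thetas)   # flat prior p(theta)
# Binomial likelihood p(y | theta) at each grid point
likelihood = [comb(n, heads) * t**heads * (1 - t)**(n - heads)
              for t in thetas]

# Unnormalized posterior, then normalize by the evidence p(y)
unnorm = [l * p for l, p in zip(likelihood, prior)]
evidence = sum(unnorm)
posterior = [u / evidence for u in unnorm]

# The posterior concentrates near the observed frequency 7/10
print(thetas[posterior.index(max(posterior))])  # 0.7
```

With a flat prior the posterior mode coincides with the observed frequency; a more informative prior p(θ) would pull the posterior toward the prior's mass.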