Ch1 - Basic Probability
Outcomes:
- Introduction
- Discrete Probability
- Statistical independence
- Conditional independence
- Example: Signature verification
- Probability Densities
- Expectations and Covariances
Discrete Probabilities
In order to describe a probability distribution we need to specify a triple $(X, \mathcal{A}_X, \mathcal{P}_X)$, where $X$ is the discrete random variable. The set of possible values, outcomes or realizations of $X$ is denoted by $\mathcal{A}_X = \{a_1, a_2, \ldots, a_I\}$, and the relative frequencies with which we observe the different outcomes of $X$ are given by $\mathcal{P}_X = \{p_1, p_2, \ldots, p_I\}$. Here $p_i$ is the frequency or probability with which the outcome $a_i$ is observed, and we write $P(X = a_i) = p_i$. This says that the probability that $X$ takes on the value $a_i$ is equal to $p_i$. Assuming that $X$ takes on one and only one of the values in $\mathcal{A}_X$, it is necessary that \sum_{a_i\in\mathcal{A}_X}P(X = a_i) = 1. \tag{1.1}
The two conditions, $p_i \geq 0$ and $\sum_i p_i = 1$, ensure that the $p_i$ are proper probabilities (probabilities are always non-negative and sum to one).
We always assume that the values $a_i$ are mutually exclusive. That is, we assume that $X$ can take only one of the values in $\mathcal{A}_X$ during a specific experiment.
Instead of $P(X = a_i)$, the shorthand $P(a_i)$ will often be used.
Note the distinction between a random variable $X$, written in upper case, and its values or realizations $a_i$ or $x$, written in lower case.
If the sum in (1.1) is only over a subset $T$ of $\mathcal{A}_X$, i.e. $T \subseteq \mathcal{A}_X$, then P(T) = P(X \in T) = \sum_{a\in T}P(a) \tag{1.2} This is the first form of the sum rule.
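As a concrete illustration, here is a minimal Python sketch of the normalization condition (1.1) and the sum rule (1.2). The distribution (a fair six-sided die) and the subset $T$ are hypothetical choices made for this example, not taken from the text.

```python
# Hypothetical fair six-sided die: A_X = {1, ..., 6}, each p_i = 1/6.
P = {a: 1 / 6 for a in range(1, 7)}

# Condition (1.1): the probabilities sum to 1.
assert abs(sum(P.values()) - 1.0) < 1e-12

# Sum rule (1.2): P(X in T) for the subset T = {2, 4, 6} (even outcomes).
T = {2, 4, 6}
P_T = sum(P[a] for a in T)
print(P_T)  # 0.5 (up to floating-point rounding)
```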
Sum Rule - Example
If one throws two six-sided, balanced dice, what is the probability that at least one 3 will show?
Noting that there are $6 \times 6 = 36$ possible combinations, with each combination being equally likely, one has to add the probabilities of all the combinations that contain a 3. Since there are 11 such combinations, each with a probability of $1/36$, the total probability is $11/36$.
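This answer can also be checked by brute-force enumeration. The sketch below (illustrative only) lists all 36 ordered pairs and counts those that contain a 3.

```python
# Enumerate all 36 equally likely ordered pairs of two dice and count
# the pairs that contain at least one 3.
from fractions import Fraction

outcomes = [(d1, d2) for d1 in range(1, 7) for d2 in range(1, 7)]
hits = [o for o in outcomes if 3 in o]
print(len(hits), Fraction(len(hits), len(outcomes)))  # 11 11/36
```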
A joint ensemble $XY$ is an ensemble in which each outcome is an ordered pair $(x, y)$, with $x \in \mathcal{A}_X$ and $y \in \mathcal{A}_Y$. We call $P(x, y)$ the joint probability of $x$ and $y$. More intuitively, $P(x, y)$ is the probability of observing the values $x$ and $y$ simultaneously. For $P(X, Y)$ to be a proper probability, it has to be normalized so that \sum_{x \in \mathcal{A}_X}\sum_{y \in \mathcal{A}_Y} P(x, y) = 1 \tag{1.3} where $P(x, y)$ is the same as $P(X = x, Y = y)$.
Now, let us consider a new function obtained from the joint distribution, P(X) = \sum_{y \in \mathcal{A}_Y} P(X, y) \tag{1.4} This is the marginal probability of $X$: it is obtained by summing the joint probability over all values of $Y$.
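To make the marginalization in (1.4) concrete, the sketch below builds a small joint table and sums over $y$. The joint probabilities in `P_xy` are hypothetical numbers, chosen only so that they satisfy the normalization (1.3).

```python
# Hypothetical joint distribution P(x, y) over A_X = {'a', 'b'}, A_Y = {0, 1}.
P_xy = {
    ('a', 0): 0.10, ('a', 1): 0.30,
    ('b', 0): 0.25, ('b', 1): 0.35,
}

# Normalization (1.3): the joint probabilities sum to 1.
assert abs(sum(P_xy.values()) - 1.0) < 1e-12

# Marginal (1.4): P(x) = sum over y of P(x, y).
A_X = {x for (x, _) in P_xy}
A_Y = {y for (_, y) in P_xy}
P_x = {x: sum(P_xy[(x, y)] for y in A_Y) for x in A_X}
print(P_x)  # {'a': 0.4, 'b': 0.6} (key order may vary)
```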
Key concepts include:
- Sum Rule: The probability of a subset $T$ of $\mathcal{A}_X$ is given by summing the probabilities of its elements.
- Joint Probability: The probability of two discrete variables $X$ and $Y$ occurring together is written $P(X, Y)$, which sums to 1 over all pairs of values.
- Marginal Probability: Derived from the joint probability by summing over one of the variables.
- Conditional Probability: The probability of $X$ given $Y$, expressed as $P(X \mid Y) = P(X, Y)/P(Y)$, if $P(Y) > 0$.
- Product Rule: The joint probability can be rewritten as $P(X, Y) = P(X \mid Y)\,P(Y)$, leading to Bayes' theorem (see the sketch after this list).
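The sketch below ties conditional probability, the product rule, and Bayes' theorem together numerically. It reuses the hypothetical joint table from the previous sketch; none of the numbers come from the text.

```python
# Hypothetical joint table, as before.
P_xy = {
    ('a', 0): 0.10, ('a', 1): 0.30,
    ('b', 0): 0.25, ('b', 1): 0.35,
}
A_X = {x for (x, _) in P_xy}
A_Y = {y for (_, y) in P_xy}

# Marginals of X and Y.
P_x = {x: sum(P_xy[(x, y)] for y in A_Y) for x in A_X}
P_y = {y: sum(P_xy[(x, y)] for x in A_X) for y in A_Y}

# Conditional probability P(x | y) = P(x, y) / P(y), defined when P(y) > 0.
P_x_given_y = {(x, y): P_xy[(x, y)] / P_y[y] for (x, y) in P_xy if P_y[y] > 0}

# Product rule: P(x, y) = P(x | y) P(y), for every pair.
for (x, y), p in P_xy.items():
    assert abs(p - P_x_given_y[(x, y)] * P_y[y]) < 1e-12

# Bayes' theorem: P(y | x) = P(x | y) P(y) / P(x).
x, y = 'a', 1
print(P_x_given_y[(x, y)] * P_y[y] / P_x[x])  # ~0.75 (= 0.30 / 0.40)
```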
Statistical and Conditional Independence:
- Statistical Independence: $P(X, Y) = P(X)\,P(Y)$, meaning knowledge of one variable does not affect the probability of the other (checked numerically in the sketch below).
- Conditional Independence: If $P(X \mid Y, Z) = P(X \mid Z)$, knowing $Z$ removes any additional information $Y$ might provide about $X$.
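For the statistical-independence check mentioned above, a minimal sketch is to compare the joint table with the product of its marginals for every pair. The helper `is_independent` and both joint tables below are hypothetical examples.

```python
def is_independent(P_xy, tol=1e-12):
    # True if P(x, y) = P(x) P(y) (within tol) for every pair in the table.
    A_X = {x for (x, _) in P_xy}
    A_Y = {y for (_, y) in P_xy}
    P_x = {x: sum(P_xy[(x, y)] for y in A_Y) for x in A_X}
    P_y = {y: sum(P_xy[(x, y)] for x in A_X) for y in A_Y}
    return all(abs(P_xy[(x, y)] - P_x[x] * P_y[y]) <= tol
               for x in A_X for y in A_Y)

# Independent: the joint factorizes into its marginals.
print(is_independent({('a', 0): 0.2, ('a', 1): 0.2,
                      ('b', 0): 0.3, ('b', 1): 0.3}))   # True

# Dependent: knowing Y changes the distribution of X.
print(is_independent({('a', 0): 0.10, ('a', 1): 0.30,
                      ('b', 0): 0.25, ('b', 1): 0.35}))  # False
```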
Examples:
- Dice Roll: The probability of rolling at least one '3' with two dice is computed using the sum rule.
- Genome Example: Your genome depends on your grandparents' genomes, but given your parents' genomes, your genome is conditionally independent of your grandparents'.
- Signature Verification: A system determines the authenticity of signatures based on probability models, demonstrating how probability helps in decision-making despite occasional errors.
The core idea is that joint probability is fundamental, and all other probability concepts (marginals, conditionals, and independence) derive from it.