Ch1 - Basic Probability

Outcomes:

  1. Introduction
  2. Discrete Probability
    1. Statistical independence
    2. Conditional independence
    3. Example: Signature verification
  3. Probability Densities
  4. Expectations and Covariances

Discrete Probabilities

In order to describe a probability distribution we need to specify a triple $(X, \mathcal{A}_X, \mathcal{P}_X)$, where $X$ is the discrete random variable. The set of possible values, outcomes or realizations of $X$ is denoted by $\mathcal{A}_X = \{a_1, a_2, \ldots, a_I\}$, and the relative frequencies with which we observe the different outcomes of $X$ are given by $\mathcal{P}_X = \{p_1, p_2, \ldots, p_I\}$. Here $p_i$ is the frequency or probability with which the outcome $a_i$ is observed, and we write $P(X = a_i) = p_i$. This says that the probability that $X$ takes on the value $a_i$ is equal to $p_i$. Assuming that $X$ takes on one and only one of the values in $\mathcal{A}_X$, it is necessary that

$$\sum_{a_i \in \mathcal{A}_X} P(X = a_i) = 1. \tag{1.1}$$

The two conditions, $p_i \geq 0$ and $\sum_i p_i = 1$, ensure that the $p_i$ are proper probabilities (probabilities are never negative and sum to 1).
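As a minimal sketch (plain Python, with outcome names and probability values invented purely for illustration), a discrete distribution can be stored as a mapping from outcomes to probabilities and checked against the two conditions above:

```python
# Hypothetical distribution P_X over outcomes A_X = {"a1", "a2", "a3"}
# (names and numbers are illustrative only).
P_X = {"a1": 0.2, "a2": 0.5, "a3": 0.3}

# Condition 1: every probability is non-negative.
assert all(p >= 0 for p in P_X.values())

# Condition 2: the probabilities sum to 1 (eq. 1.1).
assert abs(sum(P_X.values()) - 1.0) < 1e-9
```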

We always assume that the values $a_i$ are mutually exclusive. That is, we assume that $X$ can take only one of the values in $\mathcal{A}_X$ during a specific experiment.

Instead of $P(X = a_i)$, the shorthand $P(a_i)$ will often be used.

Note the distinction between a random variable $X$, written in upper case, and its values or realizations, $x$ or $a_i$, written in lower case.

If the sum in (1.1) is only over a subset $T$ of $\mathcal{A}_X$, i.e. $T \subseteq \mathcal{A}_X$, then

$$P(T) = P(X \in T) = \sum_{a \in T} P(a). \tag{1.2}$$

This is the first form of the sum rule.

Sum rule - Example

If one throws two six-sided, balanced dice, what is the probability that at least one 3 will show?

Noting that there are $6 \times 6 = 36$ possible combinations, with each combination being equally likely, one has to add the probabilities of all the combinations that contain a 3. Since there are 11 such combinations, each with a probability of $1/36$, the total probability is $11/36$.
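The same answer can be obtained by brute-force enumeration of the 36 equally likely outcomes. This small sketch (plain Python, no external libraries) applies the sum rule by counting the combinations in the subset $T$ of outcomes containing at least one 3:

```python
from itertools import product

# All 36 equally likely ordered outcomes of two six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

# Subset T: combinations containing at least one 3.
T = [(d1, d2) for d1, d2 in outcomes if d1 == 3 or d2 == 3]

# Sum rule (eq. 1.2): P(T) is the sum of the probabilities of its
# elements, each of which is 1/36 here.
p_T = len(T) / len(outcomes)
print(len(T), p_T)  # 11  0.3055...  i.e. 11/36
```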

A joint ensemble $XY$ is an ensemble in which each outcome is an ordered pair $(x, y)$, with $x \in \mathcal{A}_X$ and $y \in \mathcal{A}_Y$. We call $P(x, y)$ the joint probability of $x$ and $y$. More intuitively, $P(x, y)$ is the probability of observing the values $x$ and $y$ simultaneously. For $P(X, Y)$ to be a proper probability, it has to be normalized so that

$$\sum_{x \in \mathcal{A}_X} \sum_{y \in \mathcal{A}_Y} P(x, y) = 1, \tag{1.3}$$

where $P(x, y)$ is the same as $P(X = x, Y = y)$.

Now, let us consider a new function obtained from the joint distribution,

$$P(X) = \sum_{y \in \mathcal{A}_Y} P(X, y). \tag{1.4}$$

This is the marginal distribution of $X$, obtained by summing the joint distribution over all values of $Y$.
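As a sketch (using an arbitrary 2-by-3 joint table whose entries are illustrative only), the joint distribution can be stored as a matrix that sums to 1, and the marginal of $X$ follows by summing over the values of $Y$ as in (1.4):

```python
import numpy as np

# Hypothetical joint distribution P(X, Y): rows index A_X, columns index A_Y.
# The numbers are made up; the only requirement is that they sum to 1 (eq. 1.3).
P_XY = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.05, 0.30]])
assert abs(P_XY.sum() - 1.0) < 1e-9

# Marginal of X (eq. 1.4): sum the joint distribution over all values of Y.
P_X = P_XY.sum(axis=1)
print(P_X)  # [0.4 0.6]
```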

Key concepts include:

  1. Sum Rule: The probability of a subset $T$ of $\mathcal{A}_X$ is given by summing the probabilities of its elements.

  2. Joint Probability: The probability of two discrete variables $X$ and $Y$ occurring together is represented as $P(X, Y)$, which sums to 1 over all pairs of values.

  3. Marginal Probability: Derived from the joint probability by summing over one variable.

  4. Conditional Probability: Defines the probability of $X$ given $Y$, expressed as $P(X \mid Y) = P(X, Y) / P(Y)$, provided $P(Y) > 0$.

  5. Product Rule: The joint probability can be rewritten as $P(X, Y) = P(X \mid Y)\,P(Y)$, leading to Bayes' Theorem (see the sketch after this list).
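Continuing with the same illustrative joint table used above (the numbers remain arbitrary), the conditional probability, the product rule, and Bayes' theorem can be checked numerically:

```python
import numpy as np

# Same illustrative joint distribution as before.
P_XY = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.05, 0.30]])
P_X = P_XY.sum(axis=1)   # marginal over Y
P_Y = P_XY.sum(axis=0)   # marginal over X

# Conditional probability P(X | Y): divide each column of the joint by P(Y).
P_X_given_Y = P_XY / P_Y
assert np.allclose(P_X_given_Y.sum(axis=0), 1.0)  # each column is a distribution

# Product rule: P(X, Y) = P(X | Y) P(Y).
assert np.allclose(P_X_given_Y * P_Y, P_XY)

# Bayes' theorem: P(Y | X) = P(X | Y) P(Y) / P(X).
P_Y_given_X = (P_X_given_Y * P_Y) / P_X[:, None]
assert np.allclose(P_Y_given_X.sum(axis=1), 1.0)  # each row is a distribution
```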

Statistical and Conditional Independence:

Statistical Independence: $P(X, Y) = P(X)\,P(Y)$, meaning knowledge of one variable does not affect the probability of the other.

Conditional Independence: If $P(X \mid Y, Z) = P(X \mid Z)$, then knowing $Z$ removes any additional information $Y$ might provide about $X$.
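As a minimal sketch, statistical independence can be tested by comparing the joint table with the outer product of its marginals; the table below is deliberately constructed to be independent (the marginal values are illustrative):

```python
import numpy as np

# A joint distribution built as an outer product, so X and Y are
# statistically independent by construction.
p_x = np.array([0.4, 0.6])
p_y = np.array([0.2, 0.3, 0.5])
P_XY = np.outer(p_x, p_y)

# Independence check: P(X, Y) == P(X) P(Y) entry by entry.
assert np.allclose(P_XY, np.outer(P_XY.sum(axis=1), P_XY.sum(axis=0)))
```

Conditional independence can be checked analogously, but with one joint table $P(X, Y \mid Z = z)$ per value of $z$, verifying $P(X \mid Y, Z = z) = P(X \mid Z = z)$ for each table.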

Examples:

  1. Dice Roll: The probability of rolling at least one ‘3’ with two dice is computed using the sum rule.

  2. Genome Example: Your genome depends on your grandparents’ genomes, but given your parents’ genomes, your genome is conditionally independent of your grandparents’.

  3. Signature Verification: A system determines the authenticity of signatures based on probability models, demonstrating how probability helps in decision-making despite occasional errors.

The core idea is that joint probability is fundamental, and all other probability concepts (marginals, conditionals, and independence) derive from it.