Ch1 - Basic Probability

Outcomes:

  1. Introduction
  2. Discrete Probability
    1. Statistical independence
    2. Conditional independence
    3. Example: Signature verification
  3. Probability Densities
  4. Expectations and Covariances

Discrete Probabilities

In order to describe a probability distribution we need to specify a triple $(X, \mathcal{A}_X, \mathcal{P}_X)$, where $X$ is the discrete random variable. The set of possible values, outcomes or realizations of $X$ is denoted by $\mathcal{A}_X = \{a_1, a_2, \ldots, a_I\}$, and the relative frequencies with which we observe the different outcomes of $X$ are given by $\mathcal{P}_X = \{p_1, p_2, \ldots, p_I\}$. Here $p_i$ is the frequency or probability with which the outcome $a_i$ is observed, and we write $P(X = a_i) = p_i$. This says that the probability that $X$ takes on the value $a_i$ is equal to $p_i$. Assuming that $X$ takes on one and only one of the values in $\mathcal{A}_X$, it is necessary that

$$\sum_{a_i \in \mathcal{A}_X} P(X = a_i) = 1. \tag{1.1}$$

The two conditions, $p_i \geq 0$ and $\sum_i p_i = 1$, ensure that the $p_i$ are proper probabilities (probabilities are never negative and sum to 1).
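As a minimal sketch (plain Python, with outcome names and probability values invented purely for illustration), a discrete distribution can be stored as a mapping from outcomes to probabilities and checked against the two conditions above:

```python
# Hypothetical distribution P_X over outcomes A_X = {"a1", "a2", "a3"}
# (names and numbers are illustrative only).
P_X = {"a1": 0.2, "a2": 0.5, "a3": 0.3}

# Condition 1: every probability is non-negative.
assert all(p >= 0 for p in P_X.values())

# Condition 2: the probabilities sum to 1 (eq. 1.1).
assert abs(sum(P_X.values()) - 1.0) < 1e-9
```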

We always assume that the values $a_i$ are mutually exclusive. That is, we assume that $X$ can take only one of the values in $\mathcal{A}_X$ during a specific experiment.

Instead of $P(X = a_i)$, the shorthand $P(a_i)$ will often be used.

Note the distinction between a random variable $X$, written in upper case, and its values or realizations, $x$ or $a_i$, written in lower case.

If the sum in (1.1) is only over a subset $T$ of $\mathcal{A}_X$, i.e. $T \subseteq \mathcal{A}_X$, then

$$P(T) = P(X \in T) = \sum_{a \in T} P(a). \tag{1.2}$$

This is the first form of the sum rule.

Sum rule - Example

If one throws two six-sided, balanced dice, what is the probability that at least one 3 will show?

Noting that there are $6 \times 6 = 36$ possible combinations, with each combination being equally likely, one has to add the probabilities of all the combinations that contain a 3. Since there are 11 such combinations, each with a probability of $1/36$, the total probability is $11/36$.
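The same answer can be obtained by brute-force enumeration of the 36 equally likely outcomes. This small sketch (plain Python, no external libraries) applies the sum rule by counting the combinations in the subset $T$ of outcomes containing at least one 3:

```python
from itertools import product

# All 36 equally likely ordered outcomes of two six-sided dice.
outcomes = list(product(range(1, 7), repeat=2))

# Subset T: combinations containing at least one 3.
T = [(d1, d2) for d1, d2 in outcomes if d1 == 3 or d2 == 3]

# Sum rule (eq. 1.2): P(T) is the sum of the probabilities of its
# elements, each of which is 1/36 here.
p_T = len(T) / len(outcomes)
print(len(T), p_T)  # 11  0.3055...  i.e. 11/36
```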

A joint ensemble $XY$ is an ensemble in which each outcome is an ordered pair $(x, y)$, with $x \in \mathcal{A}_X$ and $y \in \mathcal{A}_Y$. We call $P(x, y)$ the joint probability of $x$ and $y$. More intuitively, $P(x, y)$ is the probability of observing the values $x$ and $y$ simultaneously. For $P(X, Y)$ to be a proper probability, it has to be normalized so that

$$\sum_{x \in \mathcal{A}_X} \sum_{y \in \mathcal{A}_Y} P(x, y) = 1, \tag{1.3}$$

where $P(x, y)$ is the same as $P(X = x, Y = y)$.

Now, let us consider a new function obtained from the joint distribution,

$$P(X) = \sum_{y \in \mathcal{A}_Y} P(X, y). \tag{1.4}$$

This is the marginal distribution of $X$, obtained by summing the joint distribution over all values of $Y$.
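As a sketch (using an arbitrary 2-by-3 joint table whose entries are illustrative only), the joint distribution can be stored as a matrix that sums to 1, and the marginal of $X$ follows by summing over the values of $Y$ as in (1.4):

```python
import numpy as np

# Hypothetical joint distribution P(X, Y): rows index A_X, columns index A_Y.
# The numbers are made up; the only requirement is that they sum to 1 (eq. 1.3).
P_XY = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.05, 0.30]])
assert abs(P_XY.sum() - 1.0) < 1e-9

# Marginal of X (eq. 1.4): sum the joint distribution over all values of Y.
P_X = P_XY.sum(axis=1)
print(P_X)  # [0.4 0.6]
```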

Key concepts include:

  1. Sum Rule: The probability of a subset $T$ of $\mathcal{A}_X$ is given by summing the probabilities of its elements.

  2. Joint Probability: The probability of two discrete variables $X$ and $Y$ occurring together is represented as $P(X, Y)$, which sums to 1 over all pairs of values.

  3. Marginal Probability: Derived from the joint probability by summing over one variable.

  4. Conditional Probability: Defines the probability of $X$ given $Y$, expressed as $P(X \mid Y) = P(X, Y) / P(Y)$, provided $P(Y) > 0$.

  5. Product Rule: The joint probability can be rewritten as $P(X, Y) = P(X \mid Y)\,P(Y)$, leading to Bayes' Theorem (see the sketch after this list).
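Continuing with the same illustrative joint table used above (the numbers remain arbitrary), the conditional probability, the product rule, and Bayes' theorem can be checked numerically:

```python
import numpy as np

# Same illustrative joint distribution as before.
P_XY = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.05, 0.30]])
P_X = P_XY.sum(axis=1)   # marginal over Y
P_Y = P_XY.sum(axis=0)   # marginal over X

# Conditional probability P(X | Y): divide each column of the joint by P(Y).
P_X_given_Y = P_XY / P_Y
assert np.allclose(P_X_given_Y.sum(axis=0), 1.0)  # each column is a distribution

# Product rule: P(X, Y) = P(X | Y) P(Y).
assert np.allclose(P_X_given_Y * P_Y, P_XY)

# Bayes' theorem: P(Y | X) = P(X | Y) P(Y) / P(X).
P_Y_given_X = (P_X_given_Y * P_Y) / P_X[:, None]
assert np.allclose(P_Y_given_X.sum(axis=1), 1.0)  # each row is a distribution
```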

Statistical and Conditional Independence:

Statistical Independence: $P(X, Y) = P(X)\,P(Y)$, meaning knowledge of one variable does not affect the probability of the other.

Conditional Independence: If $P(X \mid Y, Z) = P(X \mid Z)$, then knowing $Z$ removes any additional information $Y$ might provide about $X$.
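As a minimal sketch, statistical independence can be tested by comparing the joint table with the outer product of its marginals; the table below is deliberately constructed to be independent (the marginal values are illustrative):

```python
import numpy as np

# A joint distribution built as an outer product, so X and Y are
# statistically independent by construction.
p_x = np.array([0.4, 0.6])
p_y = np.array([0.2, 0.3, 0.5])
P_XY = np.outer(p_x, p_y)

# Independence check: P(X, Y) == P(X) P(Y) entry by entry.
assert np.allclose(P_XY, np.outer(P_XY.sum(axis=1), P_XY.sum(axis=0)))
```

Conditional independence can be checked analogously, but with one joint table $P(X, Y \mid Z = z)$ per value of $z$, verifying $P(X \mid Y, Z = z) = P(X \mid Z = z)$ for each table.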

Examples:

  1. Dice Roll: The probability of rolling at least one ‘3’ with two dice is computed using the sum rule.

  2. Genome Example: Your genome depends on your grandparents’ genomes, but given your parents’ genomes, your genome is conditionally independent of your grandparents’.

  3. Signature Verification: A system determines the authenticity of signatures based on probability models, demonstrating how probability helps in decision-making despite occasional errors.

The core idea is that joint probability is fundamental, and all other probability concepts (marginals, conditionals, and independence) derive from it.