Prerequisites
Probability Basics
Probability is like quantum states: just as a particle can be in different states with different probabilities, we can have a random variable $X$ that takes one of two values, $0$ or $1$, each with some probability.
This is called a binary random variable (like spin up/down).
A concrete example: Let $X$ be such a binary random variable and let $p$ denote the probability of getting $1$. So the probability of $0$ is $1 - p$. If $X$ follows this distribution, we write $X \sim \mathrm{Bern}(p)$, meaning $P(X = 1) = p$ and $P(X = 0) = 1 - p$.

This can be written as

$$P(X = x) = p^x (1 - p)^{1 - x}, \quad x \in \{0, 1\}.$$
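As a quick illustration, here is a minimal Python sketch of a Bernoulli random variable; the parameter value p = 0.3 and the function names are illustrative choices, not fixed by anything above.

```python
import random

p = 0.3  # illustrative choice of P(X = 1)

def bernoulli_pmf(x, p):
    """p.m.f. of a Bern(p) variable: p^x * (1 - p)^(1 - x) for x in {0, 1}."""
    assert x in (0, 1)
    return p ** x * (1 - p) ** (1 - x)

def sample_bernoulli(p):
    """Draw one sample: return 1 with probability p, otherwise 0."""
    return 1 if random.random() < p else 0

print(bernoulli_pmf(1, p))  # 0.3
print(bernoulli_pmf(0, p))  # 0.7
print(sample_bernoulli(p))  # 0 or 1
```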
Expected value
Just like expectation values in QM, we can calculate the average/expected value:

$$E[X] = \sum_{x} x \, P(X = x).$$

For example, with $X \sim \mathrm{Bern}(p)$,

$$E[X] = 0 \cdot (1 - p) + 1 \cdot p = p,$$

which means that, on average, we expect $X$ to equal $1$ a fraction $p$ of the time.
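A short Python sketch of this calculation, assuming the Bernoulli p.m.f. from above with an illustrative p = 0.3:

```python
# p.m.f. stored as a dict mapping each value x to P(X = x) (assumed example).
pmf = {0: 0.7, 1: 0.3}

# E[X] = sum over x of x * P(X = x)
expected_value = sum(x * prob for x, prob in pmf.items())
print(expected_value)  # 0.3, i.e. E[X] = p for a Bernoulli(p)
```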
Independent events
When we have multiple random variables that don't affect each other, we call them independent; if they also all follow the same distribution, they are independent and identically distributed (i.i.d.). When there is a statement like "Let $X_1, X_2, \dots, X_n$ be i.i.d. random variables", it means that the $X_i$ are mutually independent and each one has the same distribution.
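A minimal sketch of drawing i.i.d. samples in Python; the Bernoulli distribution, the parameter p = 0.3, and the sample size are illustrative assumptions:

```python
import random

def sample_bernoulli(p):
    # Each call uses a fresh random number, so draws do not affect each other.
    return 1 if random.random() < p else 0

n, p = 10, 0.3
# Every draw comes from the same Bern(p) distribution (identically distributed)
# and is generated independently of the others.
samples = [sample_bernoulli(p) for _ in range(n)]
print(samples)           # e.g. [0, 1, 0, 0, 1, ...]
print(sum(samples) / n)  # empirical mean, close to p for large n
```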
Probability mass function (p.m.f.)
In probability theory, the probability mass function (p.m.f.) formally describes the probability distribution of a discrete random variable. For a random variable $X$ taking values in a set $\mathcal{X}$, the p.m.f. is

$$p_X(x) = P(X = x), \quad x \in \mathcal{X},$$

where $\mathcal{X}$ is the set of all possible values for the random variable, $X$ is the random variable itself, and $x$ is a specific outcome or value from the set $\mathcal{X}$.

For example, if $X \sim \mathrm{Bern}(p)$, then $p_X(1) = p$ and $p_X(0) = 1 - p$.
The p.m.f. provides a complete description of the probabilities associated with the possible values of the random variable.
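One way to make this concrete is to represent a p.m.f. as a lookup from outcome to probability. The fair six-sided die below is an illustrative assumption, not something from the text above:

```python
# p.m.f. of a fair die: each outcome x in {1, ..., 6} has probability 1/6.
die_pmf = {x: 1 / 6 for x in range(1, 7)}

print(die_pmf[3])             # P(X = 3) = 1/6
print(sum(die_pmf.values()))  # ~1.0 (up to floating-point rounding)
```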
Probability laws
There are some fundamental probability laws we need to be familiar with:
- Axiom of Nonnegativity: $p_X(x) \geq 0$ for all $x$ in $\mathcal{X}$.
- Axiom of Unitarity: $\sum_{x \in \mathcal{X}} p_X(x) = 1$, summing over all possible values of $X$.
These ensure that the p.m.f. defines a valid probability distribution.
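A small sketch of checking these two axioms for a candidate p.m.f. stored as a dictionary; the helper name and the tolerance are my own illustrative choices:

```python
def is_valid_pmf(pmf, tol=1e-9):
    """Check nonnegativity (p(x) >= 0 for all x) and unitarity (sum to 1)."""
    nonnegative = all(prob >= 0 for prob in pmf.values())
    sums_to_one = abs(sum(pmf.values()) - 1.0) < tol
    return nonnegative and sums_to_one

print(is_valid_pmf({0: 0.7, 1: 0.3}))   # True
print(is_valid_pmf({0: 0.5, 1: 0.6}))   # False: probabilities sum to 1.1
print(is_valid_pmf({0: -0.1, 1: 1.1}))  # False: negative probability
```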
Independence
When we have multiple random variables that don't affect each other, we call them independent. For independent variables $X$ and $Y$,

$$P(X = x, Y = y) = P(X = x) \, P(Y = y) \quad \text{for all } x, y.$$
This factorisation property is key for working with joint distributions.
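A short sketch of the factorisation property: the joint p.m.f. of two independent variables is the product of their marginals. The two Bernoulli marginals below are illustrative assumptions:

```python
pmf_x = {0: 0.7, 1: 0.3}  # assumed marginal of X
pmf_y = {0: 0.4, 1: 0.6}  # assumed marginal of Y

# For independent X and Y: P(X = x, Y = y) = P(X = x) * P(Y = y).
joint = {(x, y): pmf_x[x] * pmf_y[y] for x in pmf_x for y in pmf_y}

print(joint[(1, 1)])        # 0.3 * 0.6 = 0.18
print(sum(joint.values()))  # ~1.0: the joint is still a valid p.m.f.
```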
Union bound
The union bound, also known as Boole's inequality, states that for any finite or countable set of events $A_1, A_2, \dots$,

$$P\!\left(\bigcup_i A_i\right) \leq \sum_i P(A_i).$$
In simpler terms, the probability of at least one event occurring is less than or equal to the sum of the probabilities of each event occurring individually. It's useful in information theory for error analysis, bounding failure probabilities in communication systems, analysing the probability of decoding errors and deriving achievability bounds. This bound is especially valuable because it's simple to apply, often gives useful results even though it's not always tight, and is valid regardless of whether the events are independent.
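A quick simulation sketch comparing the probability of a union of events with the union bound; the three overlapping events, defined on a single uniform draw, are illustrative assumptions chosen so the bound holds but is not tight:

```python
import random

def at_least_one_event():
    u = random.random()   # uniform on [0, 1)
    a1 = u < 0.2          # P(A1) = 0.2
    a2 = u < 0.3          # P(A2) = 0.3 (overlaps A1)
    a3 = 0.25 < u < 0.45  # P(A3) = 0.2 (overlaps A2)
    return a1 or a2 or a3

n = 100_000
p_union = sum(at_least_one_event() for _ in range(n)) / n
print(p_union)          # roughly 0.45, the true P(A1 ∪ A2 ∪ A3)
print(0.2 + 0.3 + 0.2)  # union bound: 0.7, a valid but loose upper bound
```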
Markov's inequality
Markov's inequality states that for a non-negative random variable $X$ and any $a > 0$,

$$P(X \geq a) \leq \frac{E[X]}{a},$$

where $E[X]$ is the expected value of $X$.

This inequality is particularly useful because it provides an upper bound on the probability that a random variable takes values far from its mean. It also works for any non-negative random variable, regardless of its distribution, and it only requires knowledge of the expected value. For example, if we have a non-negative random variable with mean 10, the probability of it being at least 100 is at most $10/100 = 0.1$.
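A small empirical sketch of Markov's inequality; the exponential distribution with mean 10 is an illustrative assumption chosen to match the mean-10 example above:

```python
import random

n, mean, a = 100_000, 10.0, 100.0
# Non-negative samples with E[X] = 10 (exponential distribution, rate 1/mean).
samples = [random.expovariate(1 / mean) for _ in range(n)]

p_tail = sum(x >= a for x in samples) / n
print(p_tail)    # roughly exp(-10) ≈ 0.000045 for this distribution
print(mean / a)  # Markov bound E[X]/a = 0.1, valid but much looser
```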
In information theory, Markov's inequality is often used to analyse convergence properties of random processes, prove concentration inequalities, derive bounds on error probabilities in coding theorems and study the behaviour of information measures.