Information Theory

WARNING

Most of these notes are from Claude. Review them carefully after you're done.

NOTATION

All logarithms ($\log$) and exponentials ($\exp$) will be in base $2$ (unless explicitly specified otherwise), since a "natural" unit of information is the bit.
Sets will be denoted with calligraphic letters, e.g. $\mathcal{X}$.
Vectors will be denoted with underlines, e.g. $\underline{x}$.
Random variables will be denoted by capital letters, e.g. $X$. So, combined with the previous convention, random vectors will be denoted by underlined capital letters, e.g. $\underline{X}$.
Notation such as $p_{X}(\cdot)$ references the probability mass function (p.m.f.) of random variable $X$. So $p_{X}(i)$ equals the probability that the random variable $X$ takes the value $i$.
For any vector $\underline{v} \in \mathbb{R}^{n}$, the $\ell_{1}$ norm is $\lVert \underline{v} \rVert_{1} = \sum_{i = 1}^{n} |v_{i}|$.
We denote the length of a vector $\underline{v}$ as $\mathrm{len}(\underline{v})$.
$\triangleq$ is for definitions. So $a \triangleq b$ means $a$ is defined to be $b$.
$\doteq$ denotes approximate equality. So $a \doteq b$ is the same as $a \approx b$.
When given a vector, say $(X_{1}, \ldots, X_{n})$, if we want to reference only a (consecutive) subset of its components, say $X_{i}, X_{i+1}, \ldots, X_{j-1}, X_{j}$, we denote this as $X_{i}^{j}$. By convention, $X^{n}$ should be taken as meaning $X_{1}^{n}$.
"iff" means "if and only if".

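As a rough illustration of the conventions above, the following Python sketch mirrors a few of them in code. The function names (`log`, `exp`, `l1_norm`, `subvector`) and the dictionary `p_X` are hypothetical helpers chosen for this example, not part of the notes; the one point worth noticing is that $X_{i}^{j}$ is 1-indexed and inclusive, whereas Python slices are 0-indexed and half-open.

```python
import math

# Assumption: these helper names are illustrative only.

def log(x: float) -> float:
    """Base-2 logarithm, matching the notes' default for log."""
    return math.log2(x)

def exp(x: float) -> float:
    """Base-2 exponential, so that exp(log(x)) == x under the notes' convention."""
    return 2.0 ** x

def l1_norm(v) -> float:
    """l1 norm: the sum of absolute values of the components of v."""
    return sum(abs(v_i) for v_i in v)

def subvector(x, i: int, j: int):
    """Return X_i^j = (X_i, ..., X_j), using 1-indexed, inclusive bounds."""
    return x[i - 1 : j]

# A p.m.f. p_X represented as a mapping from values to probabilities
# (hypothetical example distribution).
p_X = {0: 0.5, 1: 0.25, 2: 0.25}

x = [10, 20, 30, 40, 50]
print(log(8))               # 3.0 (bits)
print(l1_norm([1, -2, 3]))  # 6
print(subvector(x, 2, 4))   # [20, 30, 40]
print(subvector(x, 1, len(x)) == x)  # X^n means X_1^n, the whole vector
```

Here `p_X[i]` plays the role of $p_{X}(i)$, and `len(x)` corresponds to $\mathrm{len}(\underline{x})$.
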
Table of Contents:

Information Theory