Week 4
Communication channels
We can think of this like sending a signal through a noisy channel (similar to how we might think of radio waves or quantum states being affected by noise). The key components here are:
- A sender (Alice)
- A receiver (Bob)
- A noisy channel between them
The problem here is that Alice wants to send a message to Bob through this noisy channel. So how can we send that message, or any information, when we know the channel will corrupt some of it?
Discrete Memoryless Channel (DMC)
A DMC is a probabilistic transformation machine, similar to how we might think about quantum measurements, except simpler because it's classical.
It is discrete because the input and output alphabets are finite sets. The memoryless part comes from the idea that each input symbol is transformed independently of any past inputs and outputs. It's like running multiple experiments where each one is independent of all the others. Therefore:
$$P_{Y^n \mid X^n}(y^n \mid x^n) = \prod_{i=1}^{n} P_{Y \mid X}(y_i \mid x_i),$$
which means that all of the constituent probabilities can simply be multiplied together to get the total probability of the outcome. We call $P_{Y \mid X}$ the channel law.
We can break this down into parts:
- We have an input $X$ that goes into the system. This is like preparing a quantum state.
- $P_{Y \mid X}$ is the channel law, like a measurement operator.
- $Y$ is the output we get out of the system, like the measurement results.
In the real world, this can be like throwing tennis balls through a tree. Some of the tennis balls will get through the tree unchanged, others will deviate slightly from their initial path, and a few will get completely blocked. However, each throw is completely independent of previous throws (memoryless, it doesn't "remember" any previous attempts), and there are only certain possible outcomes (discrete, not continuous).
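To make the memoryless property concrete, here is a minimal simulation sketch in Python (the channel matrix `P`, the alphabets, and the name `simulate_dmc` are illustrative choices of mine, not from the notes): each symbol is passed through the channel independently, and the probability of the whole output sequence is just the product of the per-symbol probabilities.

```python
import numpy as np

# Hypothetical channel law P[x][y] = P(Y = y | X = x) over small finite alphabets.
# Rows must sum to 1; the specific numbers are just an example.
P = np.array([
    [0.9, 0.1],   # if X = 0: Y = 0 w.p. 0.9, Y = 1 w.p. 0.1
    [0.2, 0.8],   # if X = 1: Y = 0 w.p. 0.2, Y = 1 w.p. 0.8
])

rng = np.random.default_rng(0)

def simulate_dmc(x, P, rng):
    """Pass the input sequence x through the DMC one symbol at a time.

    Because the channel is memoryless, each output symbol is drawn using
    only the current input symbol, independently of everything else."""
    return np.array([rng.choice(P.shape[1], p=P[xi]) for xi in x])

x = np.array([0, 1, 1, 0, 1])
y = simulate_dmc(x, P, rng)
print("input :", x)
print("output:", y)

# The probability of this particular output sequence is just the product of
# the per-symbol probabilities, P(y^n | x^n) = prod_i P(y_i | x_i).
print("P(y^n | x^n) =", np.prod(P[x, y]))
```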
Channel coding model
This is like creating an error correction system for measurements, but for classical information.
Let's say Alice has a message $m$ drawn from the set $\{1, 2, \dots, 2^{nR}\}$, where $R$ is the rate of the code.
Alice wants to reliably communicate this message to Bob. To do this, Alice and Bob design an encoder-decoder pair that satisfies some properties. The encoder $f$ maps the message to a length-$n$ sequence of channel input symbols (the codeword).
Remember that there are $2^{nR}$ possible messages, so $\log_2 2^{nR}$ is just $nR$, the length of the original message in bits.
Formally, this can be written as:
$$f : \{1, 2, \dots, 2^{nR}\} \to \mathcal{X}^n$$
or:
$$f : \{0, 1\}^{nR} \to \mathcal{X}^n$$
The decoder $g$ maps the received sequence back to a message: $g : \mathcal{Y}^n \to \{1, 2, \dots, 2^{nR}\}$.
The code comprises the encoder-decoder pair $(f, g)$.
The average probability of error of a code $(f, g)$ is
$$P_e(f, g) = \frac{1}{2^{nR}} \sum_{m=1}^{2^{nR}} \; \sum_{x^n \in \mathcal{X}^n} \; \sum_{y^n \in \mathcal{Y}^n} \mathbb{1}\{f(m) = x^n\} \, P_{Y^n \mid X^n}(y^n \mid x^n) \, \mathbb{1}\{g(y^n) \neq m\}.$$
Let's break this equation down with a simple analogy. Imagine throwing 100 tennis balls (message bits) through a foggy field (noisy channel). We want to know the probability that at least one ball goes missing or gets misidentified.
The first sum, $\frac{1}{2^{nR}} \sum_{m}$, averages over every message Alice could have sent.
This is like considering all the possible initial configurations.
Next, we have the sum over input sequences $x^n$; the indicator $\mathbb{1}\{f(m) = x^n\}$ picks out the one codeword that the encoder actually produces for message $m$.
Finally, we have the third sum, over output sequences $y^n$: each possible received sequence is weighted by the channel probability $P_{Y^n \mid X^n}(y^n \mid x^n)$, and the indicator $\mathbb{1}\{g(y^n) \neq m\}$ counts only the sequences that the decoder maps to the wrong message.
For example, let's say we're sending a 2-bit message such as $m = 01$. The error probability then adds up, over every sequence Bob could receive, the probability that the channel turns the codeword $f(01)$ into that sequence and the decoder outputs anything other than $01$.
When designing this code, Alice and Bob want to choose an encoder-decoder pair $(f, g)$ that makes the average probability of error as small as possible while keeping the rate $R$ as high as possible.
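As a sanity check on this definition, here is a small brute-force sketch (the toy channel, the repetition code, and names like `average_error_probability` are my own illustrative choices): it enumerates every message and every possible received sequence, and adds up the channel probabilities of the sequences that decode incorrectly.

```python
import itertools
import numpy as np

# Toy DMC: binary input and output, P[x][y] = P(Y = y | X = x). Example numbers only.
P = np.array([[0.9, 0.1],
              [0.1, 0.9]])

n = 3                      # block length
messages = [0, 1]          # two messages, so the rate is R = log2(2)/n = 1/3

def encode(m):
    """Hypothetical repetition code: message m becomes the codeword (m, m, ..., m)."""
    return (m,) * n

def decode(y):
    """Majority-vote decoder."""
    return int(sum(y) > n / 2)

def average_error_probability(P, messages, encode, decode, n):
    """Average P_e: average over all messages, then sum the channel probability
    of every output sequence that the decoder maps to the wrong message."""
    total = 0.0
    for m in messages:
        x = encode(m)
        for y in itertools.product(range(P.shape[1]), repeat=n):
            p_y = np.prod([P[xi, yi] for xi, yi in zip(x, y)])   # memoryless channel
            if decode(y) != m:
                total += p_y
    return total / len(messages)

print("P_e =", average_error_probability(P, messages, encode, decode, n))
# For this repetition code: P_e = 3 * 0.1**2 * 0.9 + 0.1**3 = 0.028.
```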
Binary Erasure Channel (BEC)
This is a Discrete Memoryless Channel (DMC) with:
- input alphabet $\mathcal{X} = \{0, 1\}$,
- output alphabet $\mathcal{Y} = \{0, 1, ?\}$,
- channel law $P_{Y \mid X}(x \mid x) = 1 - p$ and $P_{Y \mid X}(? \mid x) = p$ for $x \in \{0, 1\}$.
Here, $p$ is the erasure probability and $?$ denotes an erasure.
If we imagine sending bits (0s and 1s), sometimes a bit gets lost (erased) with probability $p$, and otherwise it arrives unchanged with probability $1 - p$. A bit is never flipped: Bob either sees the correct bit or sees that it was erased.
For any rate $R < 1 - p$, reliable communication over the BEC is possible, using a random linear code (a small code sketch follows this list):
- First, a random $n \times nR$ generator matrix $G$ is chosen, where each element is randomly assigned to 0 or 1 (like flipping a coin). Formally: a random matrix $G$ over $\mathbb{F}_2$ is chosen with each entry chosen i.i.d. $\mathrm{Bernoulli}(1/2)$.
- Then, the generator matrix $G$ is used to encode the message $m \in \{0, 1\}^{nR}$ we are trying to send, generating the codeword $x^n$. This can be written as $x^n = G m$. Alice then sends this codeword over the noisy channel to Bob.
- Bob receives $y^n$, which is just $x^n$ with some bits erased ($y_i = ?$). Bob's decoder can then use the non-erased bits to solve linear equations over $\mathbb{F}_2$ to try and retrieve $m$. If there's a unique solution, then that's the message, but if there are multiple possible solutions, then the decoder declares an error.
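Here is a small end-to-end sketch of this random linear coding scheme (the block length, rate, erasure probability, and the helper name `gf2_solve_unique` are my own illustrative choices): encode with a random $\mathbb{F}_2$ generator matrix, erase bits at random, then solve the remaining linear equations and check whether the solution is unique.

```python
import numpy as np

rng = np.random.default_rng(1)

n, k, p = 24, 12, 0.3          # block length, message bits (k = nR), erasure probability

def gf2_solve_unique(A, b):
    """Solve A m = b over GF(2). Return m if the solution is unique, else None."""
    A = A.copy() % 2
    b = b.copy() % 2
    rows, cols = A.shape
    pivot_cols = []
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if A[i, c]), None)
        if pivot is None:
            continue                      # no pivot in this column
        A[[r, pivot]], b[[r, pivot]] = A[[pivot, r]], b[[pivot, r]]
        for i in range(rows):
            if i != r and A[i, c]:
                A[i] ^= A[r]              # eliminate column c from the other rows
                b[i] ^= b[r]
        pivot_cols.append(c)
        r += 1
    if len(pivot_cols) < cols:            # rank < k: more than one solution exists
        return None
    return b[:cols].copy()

# Random generator matrix and encoding x = G m over GF(2).
G = rng.integers(0, 2, size=(n, k))
m = rng.integers(0, 2, size=k)
x = (G @ m) % 2

# Binary erasure channel: each bit is erased independently with probability p.
kept = rng.random(n) >= p

# Decoder: keep only the rows of G and the bits of x that were not erased,
# then try to solve the remaining linear system for the message.
m_hat = gf2_solve_unique(G[kept], x[kept])
print("decoded correctly:", m_hat is not None and np.array_equal(m_hat, m))
```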
There are two types of errors that can happen:
- Too many erasures: If the channel erases more than $n(p + \delta)$ bits (for some small $\delta > 0$), then the decoder will declare an error. By the Chernoff bound, the probability for this to happen is exponentially small for large $n$. This is like how in statistical mechanics, large systems are very unlikely to deviate too far from the average state.
- Ambiguous solution: This happens when the non-erased bits don't give a unique answer. The probability of this happening is also exponentially small because the random matrix $G$ is most likely "well-behaved". If we take the number of bits that got erased to be $e$, then we can create a new matrix $\tilde{G}$ that is just $G$ but with the rows corresponding to the erased bits removed, so an $(n - e) \times nR$ matrix. We then want to find the rank of $\tilde{G}$, and if it's equal to $nR$ then there is a unique solution. Assuming that there are fewer than $n(p + \delta)$ erasures, the probability that $\tilde{G}$ has a rank less than $nR$ is exponentially small, so the probability of an ambiguous solution is also exponentially small.
At the same time, any code with rate $R > 1 - p$ cannot achieve a vanishing probability of error; this is the converse.
NOTE
I don't know what the inequalities are to prove this, they're somewhere in the (handwritten?) notes, so you'd have to check that for yourself.
Rank
The rank is the number of "truly independent directions" or "degrees of freedom" a matrix has. We can think of vectors in 3D space. If we have only one vector, then that's a rank of 1 (1 direction). If we have 2 non-parallel vectors, that's now 2 directions. 3 non-parallel vectors represent 3 directions, and so on. However, if we have 3 non-parallel vectors, but one is a combination of two others (think $v_3 = v_1 + v_2$), then there are really only 2 independent directions, so the rank is 2.
Let's say we have a matrix:
$$\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$$
This has a rank of 2, since there are 2 independent directions. This is like the basis vectors $e_1 = (1, 0)$ and $e_2 = (0, 1)$.
Another matrix:
$$\begin{pmatrix} 1 & 2 \\ 2 & 4 \end{pmatrix}$$
has a rank of 1, since the second row is parallel to the first row, just twice the length.
Finally:
$$\begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix}$$
has a rank of 0, since there are no non-zero elements. It's like having no vectors.
Basically, we want to think about how many rows are independent and therefore how many independent equations we have, or how many columns are independent and therefore how many independent variables we can solve for.
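A quick way to check these rank examples numerically (using `numpy.linalg.matrix_rank`, which computes the rank over the reals):

```python
import numpy as np

# Rank counts the number of linearly independent rows (or columns).
print(np.linalg.matrix_rank(np.array([[1, 0], [0, 1]])))  # 2: two independent directions
print(np.linalg.matrix_rank(np.array([[1, 2], [2, 4]])))  # 1: second row = 2 * first row
print(np.linalg.matrix_rank(np.zeros((2, 2))))            # 0: no non-zero entries
```

Note that `matrix_rank` works over the reals; for the BEC decoder above, the rank has to be computed over $\mathbb{F}_2$ instead, as in the Gaussian-elimination sketch.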
Binary Symmetric Channel (BSC)
This is a Discrete Memoryless Channel (DMC) with:
- input alphabet $\mathcal{X} = \{0, 1\}$,
- output alphabet $\mathcal{Y} = \{0, 1\}$,
- channel law $P_{Y \mid X}(y \mid x) = 1 - p$ if $y = x$, and $p$ if $y \neq x$.
This is like a noisy measurement where bits can flip. Each bit has a probability $p$ of being flipped and a probability $1 - p$ of arriving unchanged. For any rate $R < 1 - h(p)$, where $h$ is the binary entropy function, reliable communication is possible, using the following random coding scheme (a simulation sketch follows the list):
- First, we create a random codebook $\mathcal{C}$ of size $2^{nR} \times n$, where each entry is a 1 or 0 with equal probability. Formally: a random codebook $\mathcal{C}$ over $\{0, 1\}$ is chosen with each entry chosen i.i.d. $\mathrm{Bernoulli}(1/2)$.
- Then, Alice's message $m$ is encoded by selecting the $m$-th row of the codebook, giving the codeword $x^n = \mathcal{C}(m)$.
- Bob then receives $y^n$, which is just $x^n$ with some bits flipped. To decode this, we use a radius $r$, which is a function of $n$ (roughly $n(p + \delta)$), and look at all the sequences within Hamming distance $r$ of $y^n$, which creates a "Hamming ball" $B(y^n, r)$. If this Hamming ball contains exactly one codeword that's also in $\mathcal{C}$, then the decoder returns the message corresponding to that codeword. However, if there are no codewords, or multiple, then the decoder returns an error.
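A minimal simulation sketch of this scheme (the block length, rate, crossover probability, and the radius choice $r = n(p + \delta)$ are my own illustrative choices). Instead of enumerating the Hamming ball explicitly, it checks which codewords lie within Hamming distance $r$ of the received word, which is equivalent:

```python
import numpy as np

rng = np.random.default_rng(2)

n, R, p, delta = 60, 0.2, 0.05, 0.03     # block length, rate, crossover prob., slack
M = 2 ** int(n * R)                      # number of messages
r = int(n * (p + delta))                 # decoding radius of the Hamming ball

# Random codebook: row m is the codeword for message m, entries i.i.d. Bernoulli(1/2).
codebook = rng.integers(0, 2, size=(M, n))

def bsc(x, p, rng):
    """Binary symmetric channel: flip each bit independently with probability p."""
    flips = rng.random(len(x)) < p
    return x ^ flips

def decode(y, codebook, r):
    """Return the message whose codeword lies within Hamming distance r of y,
    if exactly one such codeword exists; otherwise declare an error (None)."""
    dists = np.sum(codebook != y, axis=1)
    candidates = np.flatnonzero(dists <= r)
    return int(candidates[0]) if len(candidates) == 1 else None

# One transmission: pick a random message, send its codeword, decode.
m = int(rng.integers(M))
y = bsc(codebook[m], p, rng)
m_hat = decode(y, codebook, r)
print("sent", m, "decoded", m_hat, "->", "ok" if m_hat == m else "error")
```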
There are two types of errors that can happen:
- Too many bit-flips: If the channel flips more than $n(p + \delta)$ bits, then Bob's decoder will return an error, since the transmitted codeword will not be in Bob's Hamming ball. By the Chernoff bound, the probability of this happening decreases exponentially with $n$.
- "Spoofing" codeword: There is an exponentially small chance that another valid codeword $\tilde{x}^n$ appears close to the received signal, but doesn't decode into the correct message.
At the same time, any code with rate $R > 1 - h(p)$ cannot achieve a vanishing probability of error; this is the converse.
NOTE
I don't know what the inequalities are to prove this, they're somewhere in the (handwritten?) notes, so you'd have to check that for yourself.
Channel capacity of general DMCs
Given channel law $P_{Y \mid X}$, the channel capacity is defined as
$$C = \max_{P_X} I(X; Y).$$
It is the maximum rate of reliable communication. It's like finding the "speed limit" of information transfer. For the BEC, the channel capacity is $1 - p$.
We also have the optimal input distribution, $P_X^* = \arg\max_{P_X} I(X; Y)$.
It is the input distribution that achieves the channel capacity. It's like finding the optimal settings for an experiment. For the BSC, it turns out that a uniform distribution is optimal, while for other channels this could vary.
NOTE
I was curious why the uniform distribution was optimal if it maximised entropy, which would in turn minimise the channel capacity, so I asked Claude about it. Here is its response, rewritten by me:
The channel capacity depends on the mutual information $I(X; Y) = H(Y) - H(Y \mid X)$. For the BSC, the noise term $H(Y \mid X) = h(p)$ is fixed by the channel and does not depend on the input distribution, so maximising $I(X; Y)$ comes down to maximising the output entropy $H(Y)$. A uniform input produces a uniform output, which has the maximum possible entropy of 1 bit.
Therefore, by maximising the output entropy with a uniform input distribution, we also maximise the mutual information, which gives the channel capacity $C = 1 - h(p)$.
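A quick numerical check of this argument (the crossover probability and the grid search over input distributions are my own illustrative choices): it computes $I(X; Y)$ for every input bias $P(X = 1)$ and confirms that the maximum sits at the uniform distribution with value $1 - h(p)$.

```python
import numpy as np

def binary_entropy(q):
    q = np.clip(q, 1e-12, 1 - 1e-12)
    return -q * np.log2(q) - (1 - q) * np.log2(1 - q)

p = 0.1                                    # BSC crossover probability (example value)
alphas = np.linspace(0.001, 0.999, 999)    # candidate values of P(X = 1)

# For a BSC: P(Y = 1) = alpha * (1 - p) + (1 - alpha) * p, and H(Y | X) = h(p),
# so I(X; Y) = H(Y) - h(p).
p_y1 = alphas * (1 - p) + (1 - alphas) * p
mutual_info = binary_entropy(p_y1) - binary_entropy(p)

best = np.argmax(mutual_info)
print("best P(X=1) ~", round(alphas[best], 3))        # ~0.5, the uniform input
print("max I(X;Y)  ~", round(mutual_info[best], 4))   # ~1 - h(p)
print("1 - h(p)    =", round(1 - binary_entropy(p), 4))
```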
General DMC
This is the general case that all DMCs, like BEC and BSC, follow.
Any rate $R < C$ is achievable, using the following scheme (a code sketch follows the list):
- First, we create a random $2^{nR} \times n$ codebook $\mathcal{C}$ over the alphabet $\mathcal{X}$, where each row has type $P_X^*$, the optimal input distribution.
- Then, we generate the codeword $x^n = \mathcal{C}(m)$ by selecting the $m$-th row of the codebook $\mathcal{C}$. This is similar to what we did in Binary Symmetric Channel (BSC). Alice then sends this codeword over the noisy communication channel.
- Bob receives it as $y^n$. We then look for a codeword $\tilde{x}^n$ in the codebook whose joint type with $y^n$ is close to what we expect from the channel, i.e. $\hat{P}_{\tilde{x}^n, y^n} \approx P_X^* \cdot P_{Y \mid X}$ to within a tolerance $\delta_n$, where $\delta_n$ is a function of $n$. If there is a unique such codeword, then Bob's decoder outputs the message corresponding to that codeword. If no codeword is found, or multiple are found, then the decoder outputs an error.
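Here is a rough simulation sketch of joint-type decoding (the channel, the input distribution, the tolerance $\delta$, and all names are my own illustrative choices; the codebook rows are drawn i.i.d. from $P_X^*$ rather than being exactly constant-composition, which is a simplification):

```python
import numpy as np

rng = np.random.default_rng(3)

# Example DMC: P[x, y] = P(Y = y | X = x), with an assumed optimal input distribution.
P = np.array([[0.8, 0.2],
              [0.1, 0.9]])
p_x_star = np.array([0.5, 0.5])

n, M, delta = 1000, 16, 0.05      # block length, codebook size, tolerance (example values)

# Random codebook: rows drawn i.i.d. from the input distribution p_x_star.
codebook = rng.choice(len(p_x_star), size=(M, n), p=p_x_star)

def send(x):
    """Pass a codeword through the memoryless channel, symbol by symbol."""
    return np.array([rng.choice(P.shape[1], p=P[xi]) for xi in x])

def joint_type(x, y, nx, ny):
    """Empirical joint distribution of the pairs (x_i, y_i)."""
    t = np.zeros((nx, ny))
    for xi, yi in zip(x, y):
        t[xi, yi] += 1
    return t / len(x)

# The joint distribution we expect between the sent codeword and the output.
expected = p_x_star[:, None] * P

def decode(y):
    """Return the unique message whose codeword has a joint type with y close
    to the expected one; declare an error (None) if there are zero or several."""
    matches = [m for m in range(M)
               if np.abs(joint_type(codebook[m], y, *P.shape) - expected).max() <= delta]
    return matches[0] if len(matches) == 1 else None

m = rng.integers(M)
y = send(codebook[m])
print("sent", m, "decoded", decode(y))
```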
There are two types of errors that can happen:
- Atypical noise: If the truly transmitted codeword $x^n$ and the received signal $y^n$ have a joint type that does not satisfy the inequality, then the decoder will output an error. This basically means that what was received was too different from what we expected, and so we can't decode it. The probability of this happening decreases exponentially with increasing $n$.
- "Spoofing" codeword: The received signal $y^n$ seems to match up with a different codeword than what was sent. It looks like something else could have been sent, which would decode into the wrong message. The probability of this happening is exponentially small.
At the same time, any code with rate $R > C$ cannot achieve a vanishing probability of error, which is why the channel capacity really is the "speed limit" of reliable communication.