Week 5
Concatenated Codes
Imagine we're trying to send a message across a noisy channel (like a radio signal through static). We've learned 2 types of codes:
- Random codes: These are theoretically excellent, but computationally impractical.
- Reed-Solomon codes: These are practical but only work well with large alphabets.
George David Forney Jr had a brilliant idea to combine these two approaches, like making a sandwich with 2 types of bread. Here's how it works:
- The first layer (outer code) takes our message and encodes it using Reed-Solomon codes. This is like wrapping the message in a protective layer.
- The second layer (inner code) takes the Reed-Solomon encoded message and breaks it into smaller chunks. Each chunk then gets encoded again using smaller random codes. This is like adding another protective layer, but in smaller, more manageable pieces.
The beauty of this approach is that the combination gives us good error correction (from Reed-Solomon), near-capacity performance (from random codes) and reasonable computational complexity (since we're using small random codes).
We can think of concatenated codes like a two-layer security system for our message. Let's say we're sending a package across the country. We want 2 levels of protection:
- Level 1 (outer code): We first put the items in secure boxes with special locks.
- Level 2 (inner code): Then we put those boxes in shipping containers with additional security.
Let's see how this works in practice. Say we have a 1000-bit message that we want to send.
- Step 1 (outer code – Reed-Solomon): Our 1000-bit message gets encoded using Reed-Solomon. This might turn it into, say, 1200 bits. These 1200 bits are then organised into blocks (like putting items into separate secure boxes).
- Step 2 (inner code – small random codes): Each block from step 1 gets encoded again, but instead of one huge random code (which would be impractical), we use many small random codes. Each small block may be, say, 100 bits long, and each block gets its own random encoding.
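A quick check with those illustrative numbers: the outer code's rate and the number of inner blocks follow directly,

$$R_{\text{out}} = \frac{1000}{1200} \approx 0.83, \qquad \text{number of inner blocks} = \frac{1200}{100} = 12.$$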
The beauty of this system is how it handles errors.
- Inner code level: Each small block might have some errors, and some blocks might be decoded incorrectly, but because the blocks are small, decoding is computationally feasible.
- Outer code level: The Reed-Solomon code can correct errors in entire blocks, so even if some of the inner blocks are decoded incorrectly, the outer code can fix it. It's like having insurance for your insurance.
A real world example would be something like sending a text message through a noisy channel. Let's say our original message is "HELLO_WORLD".
- Outer encoding (Reed-Solomon): This adds some protection; let's say it turns "HELLO_WORLD" into "HELLO_WORLD_PROTECT".
- Inner encoding (small random codes): This breaks it into chunks ("HEL", "LO_", "WOR", "LD_", "PRO", "TEC", "T"). Each chunk then gets its own random encoding.
If noise corrupts some parts, say the "WOR" chunk, the inner code will try to decode each chunk, and even if "WOR" is decoded incorrectly, the Reed-Solomon outer code will most likely fix it.
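To make the two layers concrete, here is a minimal Python sketch of the pipeline (my own, not from the notes). Both layers are plain 3x repetition codes with majority-vote decoding, standing in for the real Reed-Solomon outer code and random inner codes, so the point is only the encode, chunk, encode-again structure and its reverse:

```python
# Toy sketch of the concatenated-code STRUCTURE only: both layers are simple
# repetition codes with majority-vote decoding, NOT the Reed-Solomon / random
# codes discussed above.

def repeat_encode(bits, r=3):
    """Repeat every bit r times."""
    return [b for b in bits for _ in range(r)]

def repeat_decode(bits, r=3):
    """Majority-vote each group of r repeated bits."""
    return [int(sum(bits[i:i + r]) > r // 2) for i in range(0, len(bits), r)]

def chunks(bits, size):
    """Split a bit list into consecutive chunks of the given size."""
    return [bits[i:i + size] for i in range(0, len(bits), size)]

def concat_encode(message, chunk_size=4):
    outer = repeat_encode(message)                                # outer layer over the whole message
    return [repeat_encode(c) for c in chunks(outer, chunk_size)]  # inner layer per chunk

def concat_decode(blocks):
    inner = [bit for block in blocks for bit in repeat_decode(block)]  # inner decode per chunk
    return repeat_decode(inner)                                        # outer decode

msg = [1, 0, 1, 1, 0, 1, 0, 0]
blocks = concat_encode(msg)
blocks[2][0] ^= 1
blocks[2][1] ^= 1                      # two flips in one triple: the inner decoder gets that bit wrong...
assert concat_decode(blocks) == msg    # ...but the outer layer still recovers the message
print("recovered:", concat_decode(blocks))
```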
The advantages of this approach are that, by breaking the message into small chunks, we keep the computation manageable (no supercomputer needed to encode or decode), we get performance close to the theoretical limit, and we retain strong error-correction capability.
However, the trade-off we make by using this is that, to get really close to capacity, we need longer block lengths, more computational power and more sophisticated decoding algorithms.
Going into some of the maths, we start with some fundamental parameters:
- Rate: $R = C - \epsilon$, where $\epsilon > 0$ is small. The small $\epsilon$ represents how far we are from perfect efficiency (the channel capacity $C$).
- Message length: $K$ bits (so $K \approx R \cdot Nn$, the rate times the total number of transmitted bits).
- Inner code block length: $n$ bits.
- Number of inner code blocks: $N$ (one inner block per symbol of the outer code).

The outer Reed-Solomon code operates over the field $GF(2^n)$, so each $n$-bit inner block is treated as one Reed-Solomon symbol.

For each inner code, the error probability satisfies $p \leq 2^{-n E_r(R_{\text{in}})}$, where $R_{\text{in}}$ is the inner code's rate and $E_r$ is the random-coding error exponent.

For the concatenated code to fail, one of two things must happen:

- Too many inner code errors: individual inner blocks are decoded incorrectly, each with probability at most $p$, independently across blocks.
- Reed-Solomon decoder fails: this occurs if more than $t = (N - K_{\text{out}})/2$ inner codes fail (where $K_{\text{out}}$ is the number of outer information symbols), and the probability of this happening is at most $\binom{N}{t+1} p^{\,t+1}$, which decreases exponentially in $N$.

For inner codes, the error probability decreases exponentially with the block length $n$. The overall rate is $R = R_{\text{out}} \cdot R_{\text{in}}$, the product of the two layers' rates, and the parameters are chosen so that $R = C - \epsilon$. As $n$ (and with it $N$) grows, the failure probability goes to zero while the rate stays within $\epsilon$ of capacity.

The total complexity is:

- Inner codes: $N$ blocks, each decoded by exhaustive search over roughly $2^{n R_{\text{in}}}$ codewords; with $n = O(\log N)$ this is polynomial in $N$, roughly $O(N^2)$ in total.
- Outer code: straightforward Reed-Solomon decoding takes on the order of $O(N^2)$ to $O(N^3)$ operations.
- Overall: roughly $O(N^3)$, up to logarithmic factors.
This means that if we double the message length, the computation time increases by roughly a factor of 8, plus some logarithmic factors.
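As a purely hypothetical numeric check of the failure analysis above, here is a short Python snippet that evaluates the probability that more than $t$ of the $N$ inner blocks fail, which is exactly the event the Reed-Solomon layer cannot repair (the parameter values are made up for illustration):

```python
from math import comb

# Hypothetical numbers, just to make the two-level failure analysis concrete:
N = 256    # number of inner blocks (= outer Reed-Solomon code length)
t = 32     # block errors the outer code can correct, i.e. (N - K_out) / 2
p = 0.02   # upper bound on the per-block error probability of the inner decoder

# The concatenated code fails only if MORE than t inner blocks are decoded
# incorrectly; with independent blocks this is a binomial tail.
p_fail = sum(comb(N, k) * p**k * (1 - p)**(N - k) for k in range(t + 1, N + 1))
print(f"P(outer decoder fails) <= {p_fail:.2e}")
```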
Rate Distortion Theory
This is about a fundamental question, "How much can we compress data if we're willing to accept some errors?" (Lossy compression).
Let's take a real-world example: We can think about compressing a photo to post online. Perfect reproduction would require too much data, so we accept some loss of quality to make the file size smaller. But how much compression is possible for a given quality level?
The Rate Distortion Theorem tells us that there's a fundamental limit to how much we can compress something while maintaining a certain quality level. This limit is called the rate-distortion function, $R(D)$.
Taking the example from before, let's say we have a 12 MP photo that we're trying to post. In perfect quality we might need, say, 36 MB (zero distortion). But if we reduce the quality to where it's still good, we may be able to get that down to 1 MB (small distortion). If we keep pushing our compression to 100 KB, our image quality will be poor (large distortion).
Looking at this mathematically, we start with a source $X$ with distribution $p(x)$ and a reconstruction $\hat{X}$, where:
- $d(x, \hat{x})$ is the distortion measure between the original and its reconstruction.
- $D$ is the maximum allowed average distortion.
- $R(D)$ is the minimum bits per symbol needed for distortion $D$.
Given a source distribution $p(x)$ and a distortion measure $d(x, \hat{x})$, the rate-distortion function is

$$R(D) = \min_{p(\hat{x} \mid x):\, \mathbb{E}[d(X, \hat{X})] \leq D} I(X; \hat{X}),$$

where the minimum is taken over all conditional distributions $p(\hat{x} \mid x)$ that meet the average-distortion constraint.
We can use a binary source with Hamming distortion as an example:
- $X \sim \text{Bernoulli}(p)$ with $p \leq 1/2$.
- $d(x, \hat{x}) = 1$ if $x \neq \hat{x}$, 0 if $x = \hat{x}$.
- $R(D) = H(p) - H(D)$ for $0 \leq D \leq p$, and $R(D) = 0$ for $D > p$.
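As a quick sanity check on this formula, here is a small Python helper (my own, not from the notes) that evaluates $R(D)$ for a binary source:

```python
from math import log2

def H(p):
    """Binary entropy in bits."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def rate_distortion_binary(p, D):
    """R(D) for a Bernoulli(p) source (p <= 1/2) with Hamming distortion."""
    if D >= p:
        return 0.0
    return H(p) - H(D)

# A fair-coin source: lossless needs 1 bit/symbol,
# but allowing 10% bit errors needs only ~0.53 bits/symbol.
print(rate_distortion_binary(0.5, 0.0))   # -> 1.0
print(rate_distortion_binary(0.5, 0.1))   # -> ~0.531
print(rate_distortion_binary(0.5, 0.5))   # -> 0.0
```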
Achievability proof (sketch):
- Codebook generation: Generate $2^{nR}$ sequences independently according to $p(\hat{x})$, where each sequence has length $n$.
- Encoding: For input sequence $x^n$, find a codeword $\hat{x}^n$ in the codebook with $d(x^n, \hat{x}^n) \leq D + \epsilon$, for some small $\epsilon > 0$.
- Probability of success: For $R > R(D)$, the probability of finding such a good codeword tends to 1 as $n \to \infty$, with failure probability decaying like $2^{-n\delta}$ for some $\delta > 0$.
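The achievability argument can even be simulated directly for tiny block lengths. The sketch below (with parameters I picked purely for illustration) builds a random codebook of $2^{nR}$ sequences and checks how often a random source block finds a codeword within distortion $D$; since $R = 0.5$ is well above $R(0.25) \approx 0.19$ for a fair-coin source, almost every block should be covered:

```python
import random

# Monte Carlo illustration of the achievability argument for a uniform binary
# source with Hamming distortion (parameters chosen for illustration only).
n, D, R = 16, 0.25, 0.5          # block length, target distortion, rate
# Note: R = 0.5 comfortably exceeds R(D) = 1 - H(0.25) ~ 0.19 for this source.

random.seed(0)
codebook = [[random.randint(0, 1) for _ in range(n)]
            for _ in range(2 ** int(n * R))]          # 2^{nR} random codewords

def distortion(x, xhat):
    """Average Hamming distortion between two equal-length bit lists."""
    return sum(a != b for a, b in zip(x, xhat)) / len(x)

trials, covered = 2000, 0
for _ in range(trials):
    x = [random.randint(0, 1) for _ in range(n)]      # a fresh source block
    if any(distortion(x, c) <= D for c in codebook):  # found a close-enough codeword?
        covered += 1

print(f"fraction of source blocks encodable within D={D}: {covered / trials:.3f}")
```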
The basic trade-off here is that a lower rate $R$ forces us to accept a higher distortion $D$, and a lower distortion demands a higher rate.
For example, let's say we have a string of bits like 1101001 coming from a fair-coin source. If we insist on $D = 0$ we need the full $R = 1$ bit per symbol, but if we accept $D = 0.1$ (roughly one bit in ten reconstructed incorrectly), the formula above gives $R = 1 - H(0.1) \approx 0.53$ bits per symbol.
The practical implications of this show up in real compression systems. In lossy image formats like JPEG, the trade-off is between file size and image quality, while in lossy audio formats like MP3, the trade-off is between file size and audio quality. Video codecs have the same trade-off, between file size and video quality.
Source-Channel Separation Theorem
This is a fundamental result about whether we should compress first, then add error protection (separate), or do both operations together (joint).
We can think of it as packing and shipping a collection of books. There are two possible approaches here:
- Separate: First compress the books (source coding), then pack them securely for shipping (channel coding).
- Joint: Do both operations together in some clever way.
The surprising result is that doing it separately is just as good as any joint scheme.
Mathematically, for a source $V$ with entropy rate $H(V)$ and a channel with capacity $C$: if $H(V) < C$, the source can be transmitted over the channel with arbitrarily small error probability.
Proof:
- Source coding: Compress the source to $H(V) + \epsilon$ bits per symbol. The error probability for this is $\epsilon_1$, which goes to 0 as the block length grows.
- Channel coding: Use a capacity-achieving code at a rate just below $C$. The error probability for this is $\epsilon_2 \to 0$.
- Overall system: The total error is at most $\epsilon_1 + \epsilon_2$, and the rate required is $H(V) + \epsilon < C$.
Conversely, if $H(V) > C$, no scheme (separate or joint) can achieve reliable transmission.
Proof: Using Fano's inequality,
$$H(V^n \mid \hat{V}^n) \leq 1 + n P_e \log |\mathcal{V}|,$$
combined with the data-processing inequality,
$$I(V^n; \hat{V}^n) \leq I(X^n; Y^n) \leq nC,$$
leads to
$$H(V) \leq C + \tfrac{1}{n} + P_e \log |\mathcal{V}|,$$
so if $H(V) > C$ the error probability $P_e$ cannot be driven to zero.
Let's try to explain this with a concrete example. We can imagine sending photos through a noisy channel. First, we compress the photo (like JPEG), then add error protection (like Reed-Solomon codes). This works because source coding gets rid of redundancy, while channel coding adds back controlled redundancy, and so the theorem says that this is optimal.
The practical implications of this are that we can design source coders (like JPEG) and channel coders (like WiFi codes) independently, and we don't need to worry about their interactions.
Key points:
- If our channel can handle more bits than our source needs ($H(V) < C$), we can communicate reliably.
- If our channel can't handle the required bits ($H(V) > C$), no scheme will work reliably.
- The separation approach achieves the best possible performance.
Important note: this theorem only works for point-to-point communication. In networks (like broadcasting to multiple receivers), joint source-channel coding can be better.
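For the point-to-point case, here is a tiny Python sketch (with made-up example numbers) that checks the separation condition by comparing the source entropy against the channel capacity, assuming one channel use per source symbol:

```python
from math import log2

def binary_entropy(p):
    """Binary entropy H(p) in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * log2(p) - (1 - p) * log2(1 - p)

# Made-up example: a Bernoulli(0.1) source sent over a binary symmetric
# channel with crossover probability 0.05, one channel use per source symbol.
source_entropy   = binary_entropy(0.1)       # H(V) ~ 0.47 bits per source symbol
channel_capacity = 1 - binary_entropy(0.05)  # C    ~ 0.71 bits per channel use

print(f"H(V) = {source_entropy:.3f} bits, C = {channel_capacity:.3f} bits")
if source_entropy < channel_capacity:
    print("H(V) < C: compress-then-protect can be made reliable")
else:
    print("H(V) > C: no scheme (joint or separate) can be reliable")
```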
Kraft Inequality
The Kraft Inequality is about prefix codes – these are codes where no codeword is a prefix of another codeword. This property is super important because it lets us decode messages instantly, without waiting for more symbols.
Examples of prefix code:
A -> 0
B -> 10
C -> 110
D -> 111

Not a prefix code:
A -> 0
B -> 00
C -> 001 (because '0' is a prefix of '00' and '001')

The Kraft Inequality states that for any binary prefix code with codeword lengths $l_1, l_2, \ldots, l_m$:

$$\sum_{i=1}^{m} 2^{-l_i} \leq 1.$$

For our example above, with codeword lengths 1, 2, 3, 3:

$$2^{-1} + 2^{-2} + 2^{-3} + 2^{-3} = 0.5 + 0.25 + 0.125 + 0.125 = 1 \leq 1,$$

so the inequality holds (with equality).
The amazing thing is that this inequality gives us three things: a necessary condition for a prefix code to exist, a sufficient condition (if the lengths satisfy the inequality, we can always construct a prefix code with those lengths), and a way to think about optimal code lengths.
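Here is a small Python sketch (helper names are my own) that checks the Kraft inequality for a set of lengths and, when it holds, builds a matching prefix code using the standard canonical construction:

```python
def kraft_sum(lengths, D=2):
    """Left-hand side of the Kraft inequality for the given codeword lengths."""
    return sum(D ** -l for l in lengths)

def prefix_code_from_lengths(lengths):
    """Build a binary prefix code from lengths satisfying Kraft (canonical construction)."""
    assert kraft_sum(lengths) <= 1, "lengths violate the Kraft inequality"
    codewords, code, prev_len = [], 0, 0
    for l in sorted(lengths):          # shortest lengths first
        code <<= (l - prev_len)        # extend the running codeword to the new length
        codewords.append(format(code, f"0{l}b"))
        code += 1                      # move to the next codeword at this length
        prev_len = l
    return codewords                   # returned in order of increasing length

lengths = [1, 2, 3, 3]                    # the A/B/C/D example above
print(kraft_sum(lengths))                 # 0.5 + 0.25 + 0.125 + 0.125 = 1.0
print(prefix_code_from_lengths(lengths))  # ['0', '10', '110', '111']
```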
Huffman codes
The basic idea of Huffman coding is to use shorter codewords for frequent symbols, and longer codewords for rarer symbols.
We can see how this works with an example. Let's say we have text with these frequencies:
A = 0.4 (most common)
B = 0.3
C = 0.2
D = 0.1 (least common)

How Huffman coding works:
- Start with all the symbols and their probabilities.
- Find the two least probable symbols.
- Combine them into a new node with the sum of their probabilities.
- Repeat until we have one node.
Visually, the merging looks like this (combining the two least probable nodes at each step):
D (0.1) + C (0.2) -> 0.3
0.3 + B (0.3) -> 0.6
0.6 + A (0.4) -> 1.0 (root)
This gives us the codewords:
A: 0 (most frequent – shortest code)
B: 10
C: 110
D: 111 (least frequent – longest code)

The beauty of Huffman codes is that they are prefix codes (satisfy the Kraft Inequality), optimal (no other prefix code can do better), and easy to construct.
For example, if we encode "ABACABAD":
- Without Huffman: $8 \times 2 = 16$ bits (using fixed 2 bits per symbol).
- With Huffman: $0\ 10\ 0\ 110\ 0\ 10\ 0\ 111$, which is $1+2+1+3+1+2+1+3 = 14$ bits.
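For completeness, here is a compact Python sketch of the Huffman construction described above (the exact 0/1 labels can come out swapped, but the codeword lengths, and therefore the compressed size, are the same):

```python
import heapq

def huffman_code(probs):
    """Build a Huffman code from a {symbol: probability} dict."""
    # Heap entries: (probability, tie-breaker, {symbol: codeword-so-far}).
    heap = [(p, i, {sym: ""}) for i, (sym, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p1, _, left = heapq.heappop(heap)                  # the two least probable nodes...
        p2, _, right = heapq.heappop(heap)
        merged = {s: "0" + w for s, w in left.items()}     # ...get prefixed with 0 / 1
        merged.update({s: "1" + w for s, w in right.items()})
        heapq.heappush(heap, (p1 + p2, counter, merged))   # and merged into one node
        counter += 1
    return heap[0][2]

code = huffman_code({"A": 0.4, "B": 0.3, "C": 0.2, "D": 0.1})
print(code)                                   # codeword lengths 1, 2, 3, 3
encoded = "".join(code[s] for s in "ABACABAD")
print(len(encoded), "bits")                   # 14 bits, vs 16 with fixed 2-bit symbols
```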