
Week 3

Fields

A field F is a set, like the number systems we're used to (such as the set of real numbers R or the set of complex numbers C), where we can:

  • Add or multiply numbers.
  • Find opposites, like -5 is the opposite of 5.
  • Find reciprocals, like 1/5 is the reciprocal of 5.
  • And these operations behave in "nice" ways, like we're used to.

These fields should follow some properties:

  • Closure: For any a and b in the field F, the result of adding or multiplying these elements should also be in F. For instance, multiplying 3.2×4.5 (both in R) gives 14.4 (also in R). Similarly, adding (4+3i)+(5+8i) (both in C) gives (9+11i) (also in C).
  • Associativity: We should be able to group these operations in any order. So for any a, b and c in field F, (a+b)+c = a+(b+c) (the order in which we group additions doesn't matter). Similarly, (a×b)×c = a×(b×c) (the order in which we group multiplications doesn't matter).
  • Commutativity: We should be able to apply these operations to any two elements in either order. If we have elements a and b in field F, then a×b = b×a and a+b = b+a.
  • Existence of identities: This basically means that there should be the equivalent of 0 and 1 values in the field. Let's say we assign the equivalent of 0 to b in this field F, and we take any other value a in F, then a+b=a, and this value should be denoted as 0 (additive identity). Similarly, if we assign the equivalent of 1 to c in F, then a×c=a, and this value should be denoted as 1 (multiplicative identity).
  • Existence of inverses: This basically means that we should have a way to invert the values, whether by finding the opposite or the reciprocal. Let's say we have a value a in the field F. There should exist a value b in the same field such that a+b=0, and this value is denoted -a (additive inverse). Similarly, there should exist a value c in F such that a×c=1, and this value is denoted a^(-1) (multiplicative inverse).
  • Distributivity of × over +: This basically means that for any a, b and c in F, multiplication should distribute itself over an addition operation. So if we have a×(b+c), then a× should distribute over b and c to give (a×b)+(a×c).

Some examples of fields are R (real numbers), C (complex numbers), and Q (rational numbers). However, Z (integers) is not a field because there's generally no way to get a multiplicative inverse (n^(-1) = 1/n doesn't exist in Z for any n other than 1 and -1).

Euclid's algorithm

Euclid's algorithm is used to find the greatest common divisor (g.c.d.) of any two numbers. We can think of this like finding the largest common unit of measurement between two lengths. If we have a 15-meter rope and a 9-meter rope, what's the largest measuring stick that can measure both lengths perfectly?

Here's how Euclid's algorithm works:

  1. We start with two numbers, a and b. Let's take them to be from the example above, a=15 and b=9.
  2. We divide a by b and get the remainder. a/b=15/9=1 remainder 6. This basically means that 15=1×9+6.
  3. If the remainder is 0, then b is the greatest common divisor. If not, then we replace a with b, and b with the remainder. So in this case, now a=9 and b=6.
  4. We repeat 2 and 3 until we get to a remainder of 0. So 9=1×6+3, and now a=6 and b=3. Finally, 6=2×3+0.
  5. Therefore, 3 is the greatest common divisor.
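The steps above translate almost directly into code. A quick Python sketch (the function name is my own):

```python
def euclid_gcd(a, b):
    """Greatest common divisor via Euclid's algorithm."""
    while b != 0:
        # Replace a with b, and b with the remainder of a divided by b.
        a, b = b, a % b
    return a

print(euclid_gcd(15, 9))  # 3, as in the rope example
```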

Let's try another example, we can try to find the g.c.d. of 48 and 18:

a = 48, b = 18:   48 = 18×2 + 12
a1 = 18, b1 = 12: 18 = 12×1 + 6
a2 = 12, b2 = 6:  12 = 6×2 + 0

g.c.d. = 6

But that's not all, Euclid's algorithm can also find m and n such that ma+nb=g where g is the g.c.d. of a and b. Using the same 48 and 18 example from above, we can work backwards:

6 = 18 - 12×1
  = 18 - (48 - 18×2)
  = 18 - 48 + (18×2)
  = 18×3 - 48×1

So, m = -1 and n = 3 (note the minus sign: -1×48 + 3×18 = 6).
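Working backwards like this is known as the extended Euclidean algorithm. A recursive Python sketch (the names are my own):

```python
def extended_gcd(a, b):
    """Return (g, m, n) such that m*a + n*b == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    # g = m1*b + n1*(a % b), and a % b == a - (a // b)*b, so substitute back.
    g, m1, n1 = extended_gcd(b, a % b)
    return g, n1, m1 - (a // b) * n1

g, m, n = extended_gcd(48, 18)
print(g, m, n)  # 6 -1 3, i.e. -1*48 + 3*18 = 6
```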

This is very useful in number theory and cryptography, but also has other uses. For example, it's handy in physics when you're dealing with resonance and need to find out when two periodic motions will sync up, or in computer graphics when you need to find the largest size of tile that will fit evenly in both width and height.

Finite fields

These are like normal fields, except that they have a finite number of elements. A good example of how this would work is clock arithmetic. When we add 5 hours to 22:00, we get 03:00. It "loops around" back to being within the field, so the field is limited to the range 00:00:00 to 23:59:59.

Let's set aside the fact that clocks aren't really a field (they don't have multiplicative inverses, for instance, depending on how you look at it); it's just an example to illustrate the point.

Prime fields

This is a field with a prime number of elements. Let's take a prime number p. There then exists a field Fp = {0, 1, ..., p - 1} that follows the following addition and multiplication rules:

  • Addition: For any a and b in Fp, addition is defined as (a + b) mod p. This is usually denoted simply as (a + b), since addition being mod p is implied by the finite-field context; when written explicitly, it's (a +_p b).
  • Multiplication: For any a and b in Fp, multiplication is defined as (a × b) mod p. This is usually denoted simply as (a × b), since multiplication being mod p is implied from context; when written explicitly, it's (a ×_p b).

An example of this can be a prime field with 5 elements: F5 = {0, 1, 2, 3, 4}. Here, whenever we add or multiply, we take the remainder after dividing by 5 (like looping around with the clock example above). So if we take a = b = 4, then (4 +_5 4) = 3 because 8 mod 5 = 3. Similarly, (4 ×_5 4) = 1 because 16 mod 5 = 1. In the clock example from earlier, we were effectively working mod 24.
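A tiny Python sketch of F5 arithmetic (the helper names are my own):

```python
p = 5  # field size (must be prime)

def add_p(a, b):
    """Addition in F_p: ordinary addition, then take the remainder mod p."""
    return (a + b) % p

def mul_p(a, b):
    """Multiplication in F_p: ordinary multiplication, then mod p."""
    return (a * b) % p

print(add_p(4, 4))  # 3, since 8 mod 5 = 3
print(mul_p(4, 4))  # 1, since 16 mod 5 = 1
```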

Irreducible polynomials

Just like prime numbers can't be factored into smaller integers, irreducible polynomials can't be factored into smaller polynomials over a given field. First, let's look at a simple example over the real numbers R:

  • x^2 - 1 can be reduced to (x + 1) and (x - 1), since (x + 1)(x - 1) = x^2 - 1. Therefore it is not irreducible in the field of real numbers.
  • x^2 + 1 cannot be reduced any further in R; we need complex numbers to factorise it, which aren't available in the field of only real numbers. Therefore, it is irreducible.

Now let's look at an example in a finite field, F2 for instance, where the only allowed coefficients are 0 or 1:

  • Let's start with a polynomial, x^2 + x + 1. Because all of the coefficients are 0 or 1, this polynomial exists in F2.
  • The only possible factors of this polynomial are first-degree polynomials: (ax + b).
  • We can try all of the combinations of a = 0, 1 and b = 0, 1.
  • We will find that none of them multiply together to give us x^2 + x + 1.
  • Therefore, x^2 + x + 1 is irreducible in F2.

We can prove this mathematically:

a = 0, 1
b = 0, 1
First-degree polynomials: x, x + 1, 1

(x)(x) = x^2
(x)(x + 1) = x^2 + x
(x)(1) = x
(x + 1)(x + 1) = x^2 + 1
(x + 1)(1) = x + 1
(1)(1) = 1

None of these give us x^2 + x + 1, so it is irreducible in F2.
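We can also check this by brute force. A Python sketch, encoding each F2 polynomial as a bitmask (bit i = coefficient of x^i); the function name is my own:

```python
def gf2_poly_mul(a, b):
    """Multiply two F_2 polynomials encoded as bitmasks (XOR replaces addition)."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # add this shifted copy of a, coefficients mod 2
        a <<= 1
        b >>= 1
    return result

target = 0b111  # x^2 + x + 1
candidates = [0b10, 0b11]  # the non-constant first-degree polynomials: x, x + 1
reducible = any(gf2_poly_mul(f, g) == target for f in candidates for g in candidates)
print(reducible)  # False: x^2 + x + 1 is irreducible over F_2
```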

These are important because they are like building blocks. In physics, we build complex wavefunctions out of simpler basis functions (think |0⟩ and |1⟩). Similarly, we use irreducible polynomials to build larger fields.

The HAROLD fact states that for any prime p and any positive integer m there exists at least one irreducible polynomial of degree m over Fp. Basically, no matter what size field we need, we can always find an irreducible polynomial to construct it.

Extension fields

For example, if we want to construct F16 (which is basically just F_(2^4)), we can start with F2 (which is just 0 and 1), find an irreducible polynomial of degree 4 (like x^4 + x + 1), and use this to construct a field with 16 elements. The elements of F16 will be all polynomials of degree < 4 with coefficients in F2. This means that every element will look like ax^3 + bx^2 + cx + d, where a, b, c and d are each either 0 or 1.

0 = 0000 in binary
1 = 0001 in binary
x = 0010 in binary
x + 1 = 0011 in binary
x^2 = 0100 in binary
x^2 + 1 = 0101 in binary
x^2 + x = 0110 in binary
x^2 + x + 1 = 0111 in binary
x^3 = 1000 in binary
x^3 + 1 = 1001 in binary
x^3 + x = 1010 in binary
x^3 + x + 1 = 1011 in binary
x^3 + x^2 = 1100 in binary
x^3 + x^2 + 1 = 1101 in binary
x^3 + x^2 + x = 1110 in binary
x^3 + x^2 + x + 1 = 1111 in binary

Addition

Addition works by adding coefficients modulo 2:

(x^3 + x + 1) + (x^2 + x + 1) = x^3 + x^2

Multiplication

Multiplication is trickier: after multiplying normally, we have to reduce by the irreducible polynomial from earlier, x^4 + x + 1. Let's start by multiplying x^3 and x^2, for example:

x^3 × x^2 = x^5

But x^5 doesn't exist in F16, so we need to reduce it by x^4 + x + 1. Since x^4 + x + 1 = 0, we have x^4 = x + 1 (remember that -1 = 1 in F2):

x^5 = x × x^4 = x × (x + 1) = x^2 + x

Therefore, in F16, x^3 × x^2 = x^2 + x.
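A Python sketch of this multiplication, with elements encoded as 4-bit masks as in the table above (bit i = coefficient of x^i); the function name is my own:

```python
def gf16_mul(a, b):
    """Multiply two F_16 elements (bitmask polynomials), reducing by x^4 + x + 1."""
    result = 0
    for _ in range(4):
        if b & 1:
            result ^= a          # add the current multiple of a (XOR = mod-2 addition)
        b >>= 1
        a <<= 1
        if a & 0b10000:          # degree hit 4: replace x^4 with x + 1
            a ^= 0b10011         # XOR with x^4 + x + 1
    return result

print(bin(gf16_mul(0b1000, 0b0100)))  # 0b110, i.e. x^3 * x^2 = x^2 + x
```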

Primitive element

We can also find a primitive element. Let's try x as a candidate. We'll compute its powers:

x^1 = x
x^2 = x^2
x^3 = x^3
x^4 = x + 1 (because x^4 + x + 1 = 0)
x^5 = x^2 + x
x^6 = x^3 + x^2
x^7 = x^3 + x + 1
...

If we continue this, we will get all the non-zero elements before returning to 1, making it a primitive element.

Multiplicative inverse

There are two main ways to find the multiplicative inverses:

  1. Extended Euclidean Algorithm Method: For an element a(x), find polynomials s(x) and t(x) such that s(x)a(x) + t(x)f(x) = 1, where f(x) is our original irreducible polynomial (x^4 + x + 1 for F16). For example, if we want to find the multiplicative inverse of x^3 + x + 1:

    1. Divide x^4 + x + 1 by x^3 + x + 1:

      x^4 + x + 1 = x(x^3 + x + 1) + (x^2 + 1)
    2. Divide x^3 + x + 1 by x^2 + 1:

      x^3 + x + 1 = x(x^2 + 1) + 1
    3. Since the remainder is now 1, we're done. Working backwards:

      1 = (x^3 + x + 1) - x(x^2 + 1)
        = (x^3 + x + 1) - x((x^4 + x + 1) - x(x^3 + x + 1))
        = (x^3 + x + 1) - x(x^4 + x + 1) + x^2(x^3 + x + 1)
        = (x^2 + 1)(x^3 + x + 1) - x(x^4 + x + 1)

    So the multiplicative inverse is x^2 + 1.

  2. Exponential trick (usually faster): In F16, any non-zero element a satisfies a^15 = 1, and so a^14 = a^(15-1) = a^15 × a^(-1) = a^(-1). For example, taking a = x^3 + x + 1:

    (x^3 + x + 1)^14 = (x^3 + x + 1)^8 × (x^3 + x + 1)^4 × (x^3 + x + 1)^2

    (x^3 + x + 1)^2 = (x^3 + x + 1)(x^3 + x + 1)
                    = x^6 + x^4 + x^3 + x^4 + x^2 + x + x^3 + x + 1
                    = x^6 + 2x^4 + 2x^3 + x^2 + 2x + 1
                    = x^6 + x^2 + 1
    x^6 = x^2(x^4 + x + 1) + x^3 + x^2 = x^3 + x^2
    (x^3 + x + 1)^2 = (x^3 + x^2) + x^2 + 1 = x^3 + 1

    (x^3 + x + 1)^4 = (x^3 + 1)^2 = (x^3 + 1)(x^3 + 1) = x^6 + 2x^3 + 1 = (x^3 + x^2) + 1 = x^3 + x^2 + 1

    (x^3 + x + 1)^8 = (x^3 + x^2 + 1)^2
                    = x^6 + x^5 + x^3 + x^5 + x^4 + x^2 + x^3 + x^2 + 1
                    = x^6 + 2x^5 + x^4 + 2x^3 + 2x^2 + 1
                    = x^6 + x^4 + 1
                    = (x^3 + x^2) + (x + 1) + 1
                    = x^3 + x^2 + x

    (x^3 + x + 1)^14 = (x^3 + x^2 + x)(x^3 + x^2 + 1)(x^3 + 1)
                     = (x^6 + x^5 + x^3 + x^5 + x^4 + x^2 + x^4 + x^3 + x)(x^3 + 1)
                     = (x^6 + 2x^5 + 2x^4 + 2x^3 + x^2 + x)(x^3 + 1)
                     = (x^6 + x^2 + x)(x^3 + 1)
                     = ((x^3 + x^2) + x^2 + x)(x^3 + 1)
                     = (x^3 + 2x^2 + x)(x^3 + 1)
                     = (x^3 + x)(x^3 + 1)
                     = x^6 + x^4 + x^3 + x
                     = (x^3 + x^2) + (x + 1) + x^3 + x
                     = 2x^3 + x^2 + 2x + 1
                     = x^2 + 1

    Therefore, the multiplicative inverse is x^2 + 1.
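The exponential trick is easy to sketch in Python, using the same bitmask encoding as before (bit i = coefficient of x^i) and a multiplication helper that reduces by x^4 + x + 1; the names are my own:

```python
def gf16_mul(a, b):
    """Multiply two F_16 elements (bitmask polynomials), reducing by x^4 + x + 1."""
    result = 0
    for _ in range(4):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & 0b10000:
            a ^= 0b10011  # x^4 + x + 1
    return result

def gf16_inv(a):
    """a^(-1) = a^14 in F_16, computed by repeated squaring: a^14 = a^8 * a^4 * a^2."""
    a2 = gf16_mul(a, a)
    a4 = gf16_mul(a2, a2)
    a8 = gf16_mul(a4, a4)
    return gf16_mul(gf16_mul(a8, a4), a2)

print(bin(gf16_inv(0b1011)))  # 0b101: the inverse of x^3 + x + 1 is x^2 + 1
```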

Linear algebraic notions that are similar/different for finite fields

Similar

The ideas of:

  • Gaussian elimination
  • Matrix addition/multiplication/inverses
  • Determinants
  • Vector spaces
  • Row/columns/null-spaces of matrices/bases/rank

are all shared between infinite fields like R, C and finite fields Fp. The only difference is that all operations in the finite field are mod p.

Gaussian elimination

This is a way to solve a system of linear equations by converting them to an easier format and "eliminating". Say we have two equations, 2x+y=5 and x+3y=4, let's call them equations 1 and 2, respectively.

First, we multiply one of the equations such that one of its coefficients is the same as the other. For example, we can multiply equation 2 by 2, so that x becomes 2x, matching the 2x in equation 1. This gives us 2x+y=5 and 2x+6y=8, the latter of which we can call equation 2a.

Now, we "subtract" equation 1 from 2a, giving us (2x - 2x) + (6y - y) = (8 - 5), i.e. 5y = 3. We've eliminated the x term. Therefore, we know that y = 3/5. Now, we can solve any one of our linear equations to get the value of x. Solving equation 1, 2x + 3/5 = 5, gives us a value of x = 11/5.

This process is the same in finite fields, except all operations are mod p for a field of size p. For instance, solving the same problem in F5:

2x + y = 0 (since 5 = 0 mod 5)
x + 3y = 4

2x + y = 0
2x + y = 3 (multiplying the second equation by 2: 6y = y and 8 = 3 mod 5)

NOTE

I discovered this on my own while trying to solve the problem Claude was giving me. Turns out you can't solve some problems in finite fields that work in R or C.

We can see that this system has no solution: the two equations are contradictory. However, the idea still stands; we can use Gaussian elimination to solve linear equations like these even in finite fields.
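A quick Python sketch of the elimination above, working on augmented rows [x-coefficient, y-coefficient, constant] mod 5:

```python
p = 5

# 2x + y = 0 and x + 3y = 4 over F_5, as augmented rows [x-coeff, y-coeff, constant]
row1 = [2, 1, 0]
row2 = [1, 3, 4]

# Scale row2 by 2 so the x-coefficients match: [2, 6, 8] -> [2, 1, 3] mod 5
row2 = [(2 * c) % p for c in row2]
print(row2)  # [2, 1, 3]

# Subtract row1 from row2 to eliminate x
eliminated = [(a - b) % p for a, b in zip(row2, row1)]
print(eliminated)  # [0, 0, 3] -> the contradiction 0 = 3, so no solution
```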

Differences

The main differences are that:

  • Instead of Euclidean distance, we use Hamming distance.
  • Instead of Euclidean norm, we use Hamming weight.
  • "Angles" between vectors don't make sense. There is no natural notion of "size" or "direction" in finite field vectors, so we can't really have angles between them.

Hamming distance

In infinite fields, we use Euclidean distance, sqrt((x1 - y1)^2 + (x2 - y2)^2). However, in finite fields, we use Hamming distance.

Basically, Hamming distance is the number of locations in which two vectors differ. So if we have a vector (1,0,1) and another vector (1,1,0), the Hamming distance between them is 2. This is because 2 positions – positions 2 and 3 – differ between the vectors.
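In Python this is nearly a one-liner (the function name is my own):

```python
def hamming_distance(u, v):
    """Count the positions in which two equal-length vectors differ."""
    assert len(u) == len(v), "vectors must have the same length"
    return sum(a != b for a, b in zip(u, v))

print(hamming_distance((1, 0, 1), (1, 1, 0)))  # 2
```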

Hamming weight

In infinite fields, we use the Euclidean norm, sqrt(x1^2 + x2^2). However, in finite fields, we use Hamming weight.

Basically, Hamming weight is the number of locations a vector is non-zero. So the Hamming weight of a vector (1,0,1,0) is 2, since there are 2 non-zero positions – the first and the third.
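Similarly in Python (and note that the Hamming weight of a vector is just its Hamming distance from the zero vector); the function name is my own:

```python
def hamming_weight(v):
    """Count the non-zero positions of a vector."""
    return sum(c != 0 for c in v)

print(hamming_weight((1, 0, 1, 0)))  # 2
```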

Error-correcting code

We start with some basic parameters:

  • q: the size of the alphabet we will be using. This is usually a prime p or a prime power p^m.
  • k: The number of message symbols. This is the length of the original message.
  • n: The blocklength. This is the length of the message after encoding, it is always larger than k.
  • R: The rate, or efficiency of communication, equal to k/n. The smaller R is, the more redundancy there is, hence our communication method will hopefully be more tolerant of errors, at the cost of being slower.

Then, we add some key components:

  • m: The message. This is made up of k symbols uniformly distributed in Fq, formally written as m ∈ (Fq)^k.
  • x: The codeword. This is the encoded message, in n symbols all in Fq, formally written as x ∈ (Fq)^n.
  • y: The received, possibly corrupted, word. This is also of length n, with symbols in Fq. It corresponds to the noisy observation of x (which basically means after transmission and the data losses that happened), and may have up to pn erasures (missing data) or errors (incorrect symbols), where p is the fraction of corrupted positions.

We have some additional parts here as well:

  • Enc: the encoder, which maps m to x. We can write this as x=Enc(m). Hence, the codeword x is also sometimes denoted as x(m).
  • Dec: the decoder, which maps y to m̂, the decoded message. We can write this as m̂ = Dec(y).
  • C: the code. This is the collection of all possible codewords x that could be generated by the encoder.

Finally, we have the distance:

  • d: minimum distance. This is the smallest Hamming distance between any two distinct codewords in C, i.e. min over x ≠ x′ in C of d_H(x, x′).
  • Any code with minimum distance d can tolerate up to d - 1 erasures, or fewer than d/2 errors.

Let's say we're working in F2 (binary):

  1. We start with a message m=[1,0,1]. The length of this message k=3.
  2. We encode this message using our encoder to get x=Enc(m)=[1,0,1,1,0,0,1]. The length of this encoded message n=7.
  3. Therefore, the rate R = k/n = 3/7.
  4. Let's say after transmission, we receive y=[1,0,1,1,1,0,1]. The Hamming distance between x and y is 1, and we can see that the element in the 5th position has changed, so one error has occurred.

In practice, this workflow looks something like:

Encoder: m → x = Enc(m)
Channel: might corrupt x to y
Decoder: y → m̂ = Dec(y)   (m̂ is an estimate of m)

The goal with this is to add enough redundancy to correct errors, but not so much that transmitting the message becomes too slow. So we need to find the sweet spot between reliability d and efficiency R.

Distance

The basic idea of distance in error correction is that, if two codewords are a distance d apart, then it takes d changes to turn one into the other. For example, if we have codeword 1: (1, 0, 0, 1) and codeword 2: (0, 1, 1, 1), then their distance is 3, so it takes 3 changes to go from one to the other.

Let's say the minimum distance d=5 between any two codewords. If we now get a corrupted code word with 2 errors, we know that it can only be distance 2 from the original codeword, but must be at least distance 3 from any other codeword. Therefore, the closest codeword must be the original message.

Let's say n=5. Not every 5 character string will be a valid codeword, because the encoder uses a specific algorithm to generate these codewords. We will see this in a bit with the (7, 4, 3) Hamming code.

A visual example:

Original:    1 1 1 0 0
Received:    1 1 0 1 0
                 ^ ^   2 differences

Other valid: 0 0 0 1 1
             ^ ^     ^ 3 differences

Therefore, the received word is closer to the original message than any other codeword.

This follows the rule of fewer than d/2 errors. For instance, if we got 3 or more errors in the example above, we wouldn't be able to tell which valid codeword was the closest, and therefore correct, one.

Original:    1 1 1 0 0
Received:    1 0 0 1 0
               ^ ^ ^   3 differences
Other valid: 0 0 0 1 1
             ^       ^ 2 differences

Here, we're finding the incorrect valid codeword as the closest one, because of too many errors.

Erasures, on the other hand, are easier to correct, because we know which positions are corrupted, and so we can correct up to d - 1 erasures. For example:

(? means erasure)

Original:    1 1 1 0 0
Received:    1 ? 1 ? 0

Other valid: 0 0 0 1 1

Only the original message matches, and so we can easily correct this. Whereas with errors, we don't always know which positions are incorrect, so we can only correct fewer than d/2 errors.

Linear codes

A linear code multiplies our message m by an n×k generator matrix G to generate the codeword x = Gm. For example, in F2, we can have a generator matrix:

G = [1 0]
    [0 1]
    [1 1]
    [1 0]

and a message m = [1, 0]. This gives us a codeword:

x = Gm = [(1×1)+(0×0), (0×1)+(1×0), (1×1)+(1×0), (1×1)+(0×0)] = [1, 0, 1, 1]

So a 2-bit message becomes a 4-bit codeword.
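A Python sketch of this encoding (x = Gm over F2); the function name is my own:

```python
G = [[1, 0],
     [0, 1],
     [1, 1],
     [1, 0]]

def encode(G, m, p=2):
    """Matrix-vector product over F_p: one codeword symbol per row of G."""
    return [sum(g * s for g, s in zip(row, m)) % p for row in G]

print(encode(G, [1, 0]))  # [1, 0, 1, 1]
```

As a sanity check on linearity, encoding the sum of two messages gives the (mod-2) sum of their codewords.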

Linear codes have some key properties:

  • The sum of two codewords is also a codeword.
  • The field multiple of a codeword is also a codeword.
  • The zero vector is always a codeword.
  • The minimum distance d is the minimum weight of a non-zero codeword.

Repetition code

A simple way to encode messages is to just repeat them. Let's say we have m = [1]. Using the generator matrix G = [1 1 1]^T (three rows, one column), we get x = Gm = [1, 1, 1].

(7,4,3) Hamming code

This takes a 4-bit message, m = [m1, m2, m3, m4], and converts it into a 7-bit codeword. It does so by adding 3 parity bits p1, p2, p3 to get a codeword x = [p1, p2, m1, p3, m2, m3, m4]. The parity bits are chosen so that:

p1 = m1 + m2 + m4
p2 = m1 + m3 + m4
p3 = m2 + m3 + m4

(with all additions mod 2)

This ensures minimum distance d=3. This is achieved by making the parity checks "overlap" – each message bit affects multiple parity bits, making it impossible to change just one or two positions and get another valid codeword. The generator matrix for this looks like this:

G = [1 1 0 1]
    [1 0 1 1]
    [1 0 0 0]
    [0 1 1 1]
    [0 1 0 0]
    [0 0 1 0]
    [0 0 0 1]

(each row of G produces one codeword position, in the order [p1, p2, m1, p3, m2, m3, m4], so that x = Gm)
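A Python sketch of the (7, 4, 3) encoder, following the parity equations above (the function name is my own):

```python
def hamming_encode(m):
    """(7,4,3) Hamming code: [m1, m2, m3, m4] -> [p1, p2, m1, p3, m2, m3, m4] over F_2."""
    m1, m2, m3, m4 = m
    p1 = (m1 + m2 + m4) % 2
    p2 = (m1 + m3 + m4) % 2
    p3 = (m2 + m3 + m4) % 2
    return [p1, p2, m1, p3, m2, m3, m4]

print(hamming_encode([1, 0, 0, 0]))  # [1, 1, 1, 0, 0, 0, 0]
```

Checking every non-zero message confirms the minimum codeword weight (and hence the minimum distance of this linear code) is 3.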

Reed-Solomon codes

The generator matrix is a Vandermonde matrix, with the (i, j)-th element equal to (a_i)^(j-1) for distinct a_i's. The field size q has to be at least n, and the code can tolerate up to n - k erasures.

The Vandermonde matrix looks like this:

[1 a1 a1^2 ...]
[1 a2 a2^2 ...]
[1 a3 a3^2 ...]

where a1, a2, a3, etc., are all distinct field elements.
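A Python sketch of Reed-Solomon encoding over a small prime field (F5 here, with k = 2 and n = 4; the names and numbers are my own illustration):

```python
p = 5                  # field size q (must be at least n)
points = [0, 1, 2, 3]  # n distinct field elements a_1, ..., a_n

def vandermonde(points, k):
    """n x k Vandermonde matrix: entry (i, j) is points[i] ** j for j = 0..k-1."""
    return [[a ** j for j in range(k)] for a in points]

def rs_encode(m, points, p):
    """Encode by evaluating the message polynomial m(x) at every point, mod p."""
    return [sum(c * a ** j for j, c in enumerate(m)) % p for a in points]

# The message [1, 2] is the polynomial 1 + 2x; its evaluations form the codeword.
print(rs_encode([1, 2], points, p))  # [1, 3, 0, 2]
```

Any k surviving symbols pin down the degree-< k message polynomial uniquely, which is why up to n - k erasures can be corrected.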