Cryptography Dispatches: Re-Deriving the edwards25519 Decoding Formulas

Welcome back to Cryptography Dispatches, my lightly edited newsletter on cryptography engineering. Today, math. I have no analytics, so I won't know how hard this flops, but at least it was very rewarding to write. Back to our "regularly scheduled" programming in the next issue.

— Filippo

A lot of my job is implementing specifications, and sometimes in a crypto spec you'll encounter something like this

         (p+3)/8      3        (p-5)/8
x = (u/v)        = u v  (u v^7)         (mod p)

and what you do is nod, copy it into a comment, break it down into a sequence of operations, and check that the result matches a test case.¹

However, the other day I was having a bit of an identity crisis because I could not remember basic algebra, so I went and re-derived the edwards25519 point decoding formulas as a sort of homework. It turned out to be pretty useful for understanding pieces of the implementation I had been just treating as black boxes.

I'm going to try to take you along for the ride, to show that there is no dark magic involved, and that we can all get to the same result as the specification with step-by-step high-school algebra.

Warning: math. (You've got this!)

My whiteboard where I worked out the formulas in this issue

Your mission, should you choose to accept it...

What we are trying to do is derive the concrete formulas to decode a point on the edwards25519 elliptic curve, the one used for the Ed25519 signature algorithm.

Points are described by two integer coordinates, x and y, that satisfy the curve equation below, our starting point. Points are encoded by serializing the y coordinate and a single bit to distinguish between the two possible x values that can correspond to it.

The curve equation is

-x² + y² = 1 + dx²y²   mod p

where d is -121665/121666 and p is 2²⁵⁵-19.

This is also called point decompression and is not only more efficient on the wire but also safer than serializing both coordinates, as the latter encourages skipping point validation, which can lead to a fun array of vulnerabilities.

The algorithm is described by Section 5.1.3 of RFC 8032.²

2.  To recover the x-coordinate, the curve equation implies
    x^2 = (y^2 - 1) / (d y^2 + 1) (mod p).  The denominator is always
    non-zero mod p.  Let u = y^2 - 1 and v = d y^2 + 1.  To compute
    the square root of (u/v), the first step is to compute the
    candidate root x = (u/v)^((p+3)/8).  This can be done with the
    following trick, using a single modular powering for both the
    inversion of v and the square root:

                         (p+3)/8      3        (p-5)/8
                x = (u/v)        = u v  (u v^7)         (mod p)

3.  Again, there are three cases:

    1.  If v x^2 = u (mod p), x is a square root.

    2.  If v x^2 = -u (mod p), set x <-- x * 2^((p-1)/4), which is a
        square root.

    3.  Otherwise, no square root exists for modulo p, and decoding
        fails.

4.  Finally, use the x_0 bit to select the right square root. [...]

We'll break down each of those steps.

Stand back, I'm going to try algebra

First, we can do regular algebra to isolate the x on one side of the equation. Remember that we have everything in the equation except the x, and keep in mind that every formula is over the base field: that is, values are integers modulo p.

-x² + y² = 1 + dx²y²
x² + dx²y² = y² - 1
x²(dy² + 1) = y² - 1
x² = (y² - 1)/(dy² + 1)

This matches the spec's x^2 = (y^2 - 1) / (d y^2 + 1), yay!

Division and inverses

In a finite field (the integers modulo a prime), division by n is performed by finding the inverse n⁻¹ and multiplying by that. The inverse is an integer, and behaves like 1/n: n * n⁻¹ = 1.

We can use Euler's theorem to compute the inverse, the same theorem that makes RSA work. ϕ(p) is p - 1 for all primes.

a ^ ϕ(p) = a ^ (p - 1) = 1 = a⁰
a ^ (p - 2) = a⁻¹

Full size exponentiation, even by a pretty number like 2²⁵⁵ - 21, is a pretty expensive operation (255 squarings and 11 multiplications) but it can be done in constant time, so that's nice.

The denominator is always non-zero

Just like you can't divide by zero, there is no modular inverse for zero, so that formula only works if dy² + 1 is not zero. The spec confidently proclaims: "The denominator is always non-zero mod p". Let's check that.

dy² + 1 ≠ 0
dy² ≠ -1
y² ≠ -1/d = 121666/121665

One way to check that inequality, is to check that 121666/121665 is not square in the field. We can use Euler's criterion and some Python for that. If the Euler's criterion formula works out to -1, 121666/121665 is not square and can't be equal to y², so we are good.

(121666/121665) ^ ((p - 1) / 2) = -1
(121666 * 121665⁻¹) ^ ((p - 1) / 2) = -1
(121666 * 121665 ^ (p - 2)) ^ ((p - 1) / 2) = p - 1

>>> p = 2**255 - 19
>>> a = 121666 * pow(121665, p - 2, p)
>>> pow(a, (p-1)//2, p) == p - 1
True

Yay.

Candidate for square root office

Back to our formula. The spec tells us to call numerator u and denominator v for convenience, and then to look for the square root of that ratio.

x² = (y² - 1)/(dy² + 1)
x² = u/v = uv⁻¹
x = √(uv⁻¹)

How do you find a square root in a field? Since p = 5 mod 8 (because number theory is magic and formulas work or break based on the shape the numbers have) we can use Legendre's solution.

It starts by trying what the spec calls the candidate root x = (u/v)^((p+3)/8) and we'll call x₁.

The spec says it uses "a trick" to combine the inversion and the square root. Why? Remember that exponentiation by a large number takes a lot of operations, and both the inversion and the square root require an exponentiation. The trick coalesces the two exponents into a single value. Let's break it down.

Recall that per Euler's theorem, a ^ (p - 1) = a⁰ = 1...

x_1 = (u/v)^((p+3)/8) = u^((p+3)/8) v ^ (-(p+3)/8) = u^((p+3)/8) v^(p-1) v^(-(p+3)/8) = u^(1+(p-5)/8) v^(p-1-(1+(p-5)/8)) = u^(1+(p-5)/8) v^(3+p-5-(p-5)/8) = u^(1+(p-5)/8) v^(3+7(p-5)/8) = u v^3 (u v^7 )^((p-5)/8)

Indeed, we get the same result as the spec says, and we now understand the formula at the top of this issue.

          (p+3)/8      3        (p-5)/8
x₁ = (u/v)        = u v  (u v^7)

This is a rare "good clever": the numbers are the same as when we started, but arranged like this the result is much faster to compute for two reasons.

First, uv⁷ is easy to compute after computing uv³. How we do this kind of math on computers is through a sequence of additions, multiplications, and squarings, and we get to hold on to any intermediate values that we might use later for free. uv³(uv⁷) is just five multiplications and a squaring how I ended up doing it.

uv³ = u * v² * v
uv⁷ = [uv³] * [v²] * [v²]
uv³(uv⁷) = [uv³] * [uv⁷]

Second, like we hardcoded an exponentiation by p - 2 = 2²⁵⁵ - 21 to implement inversion, here we can hardcode a similar exponentiation chain for (p - 5) / 8 = 2²⁵² - 3.³ This explains the function called fePow22523 in the library I was looking at!

If not this one, the other one

Ok, but x = x₁ is only one of the two possible Legendre's solutions. It can also be x = x₁ * 2^(p-1)/4, based on some obscure property of x² that we don't want to figure out directly.

It's easy to check if x = x₁ is the right one.

x₁ = x = √(uv⁻¹)
x₁² = uv⁻¹
vx₁² = u

What if it's x = x₁ * 2^(p-1)/4 though? How would we know?

Well, there's something special about 2^(p-1)/4: per Euler's criterion (the same one we used to check if the denominator was square earlier), that value is √-1, the square root of minus one.⁴ (In complex numbers we would call that i, the imaginary constant, but on a finite field there is acutally such an integer.) This also explains why that constant is called SQRT_M1 in the code I was looking at!

2^(p-1)/4 = √-1
(2^(p-1)/4)^2 = 2^(p-1)/2 = -1

>>> SQRT_M1 = pow(2, (p-1)/4, p)
>>> pow(SQRT_M1, 2, p) == p - 1
True

So if x = x₁ is not it, we need to check if x = x₁ * √-1 is the correct solution.

x₁ * √-1 = x = √(uv⁻¹)
(x₁ * √-1)² = uv⁻¹
-x₁² = uv⁻¹
vx₁² = -u

Note how similar to the x = x₁ check that is. This means we can compute vx₁², and pick x = x₁ if it works out to u, or x = x₁ * 2^(p-1)/4 if it works out to -u, which is a very cheap comparison. If it's neither, there is no square root and we exit.⁵

Indeed, this matches the algorithm of the spec!

Wrapping up

Cool, we have x and everything matches the spec, but most importantly we understand how we got there. The final step is choosing between x and -x, since just like with regular algebra, they are both square roots of x². They both correspond to valid points, one the negative of the other. We use the x_0 bit (aka the sign bit) that was stored in the unused top bit of y to choose between them.

All this is implemented in filippo.io/edwards25519 my new Go library based on work by Adam Langley, George Tankersley, and Henry de Valence. This library is faster, safer, and easier to use than crypto/ed25519/internal/edwards25519, which a lot of projects forked over the years. It also supports multi-scalar multiplication for batch signature verification. Try it out!

By the way, high quality specs should have intermediate test cases, not just test cases for the whole algorithm. There are two reasons: first, in cryptography when you get something wrong it can be extremely hard to narrow down where the mistake is, all you know is that you wrote 300 lines and the result is wrong, great; second, you get to more easily write test cases for edge cases that are hard to hit from the high-level API. ↩
Sort of. Almost no one actually implements Section 5.1.3, because it retconned a few strictness checks that were not in the original Ed25519. After reading this issue you'll be in a better position to understand that mess as explained by Henry de Valence but you don't need to understand that first to read on. ↩
If you squint here, you can see why that formula we used requires p = 5 mod 8: it means (p+3)/8 and (p-5)/8 are integers. ↩
A bit of cheating here: I did not recognize that value as √-1. Instead, I looked at the x = x₁ * 2^(p-1)/4 condition, and tried to work my way back to how that was possible, which required √-1 to exist. ↩
If we want to do this without leaking information about the point through a side-channel, we'll have to compute both and do a select in constant time, but again this is cheap. The ristretto255 spec has this whole "constant time square root of a ratio" operation defined, including what it returns when the ratio is not a square. ↩

Dec. 18, 2020, 1:15 p.m.