Welcome back to Cryptography Dispatches, my lightly edited newsletter on cryptography engineering. You know the drill: no tracking, so thank you for replying and saying hi! There are now a couple interesting links at the bottom, by the way.
X25519 is a simple Elliptic Curve Diffie-Hellman (ECDH) function: it takes a scalar (a fancy name for an integer) and an elliptic curve point, and it multiplies the elliptic curve point by the scalar.
Point additions and multiplications work modulo the order of the point, just like hours on a watch work modulo 12. In cryptography we try to work with points whose order is a large prime number. This is the order of the scalar field, and is commonly referred to as q.
The age design includes a double invocation of X25519: once with a domain separation tweak, once with the actual secret. While implementing this I wondered: can we multiply those two scalars first, instead of doing two point multiplications in a row? It would be much faster!
That is, given two scalars s1 and s2, can we find a third scalar s3 such that the following is true?
X25519(s3, P) == X25519(s2, X25519(s1, P))
If we are working on the pure curve, the answer is simple: yes, just work out the multiplication modulo the order of the scalar field.
s1 * (s2 * P) = (s1 * s2) * P mod q
s3 = s1 * s2 mod q
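The same identity can be checked in a toy group. The sketch below is only an analogy: it swaps the curve for the multiplicative group modulo a small prime, where exponentiation plays the role of scalar multiplication. The values p = 23, q = 11 and g = 2 are illustrative assumptions, not anything from Curve25519.

```python
# Toy stand-in for the curve: g generates a subgroup of prime order q modulo p.
# pow(g, s, p) plays the role of the scalar multiplication s * P.
p, q, g = 23, 11, 2
assert pow(g, q, p) == 1  # g really has order q


def two_steps(s1, s2):
    """Analogue of s2 * (s1 * P): two operations in a row."""
    return pow(pow(g, s1, p), s2, p)


def one_step(s1, s2):
    """Analogue of (s1 * s2 mod q) * P: multiply the scalars first."""
    return pow(g, (s1 * s2) % q, p)


mismatches = [
    (s1, s2)
    for s1 in range(1, 50)
    for s2 in range(1, 50)
    if two_steps(s1, s2) != one_step(s1, s2)
]
print("mismatches:", len(mismatches))
```

Reducing the product modulo q is what keeps the combined scalar small. It is safe here only because the element being multiplied has order exactly q.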
But! X25519 is not pure math. In fact, it takes a 32-byte array as input for the scalar. That array eventually gets interpreted as a large little-endian number, but first it’s clamped.
Clamping is this extremely unmathematical operation:
s[0] &= 0b11111000
s[31] &= 0b01111111
s[31] |= 0b01000000
It comes straight from the original Curve25519 paper, and most of us just treated it with the deferential skepticism reserved for black magic.
The first part, s[0] &= 0b11111000, is clearing the 3 least significant bits (remember, little-endian, so the first byte is the least significant). That makes the scalar always a multiple of 8, which clears the cofactor of the point that’s multiplied by it.
The second part is way more obscure: it’s clearing the 256th bit, and setting the 255th. I (and most others I talked to) always thought it was making sure the number is smaller than q, but no: q has 253 bits, so this is specifically ensuring the scalar overflows the modulus. I eventually found a mailing list post by Bernstein where he explains he’s trying to protect against a specific implementation bug by making the position of the highest bit constant. Until that email in 2014, the only explanation seems to have been a line on these slides.
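A minimal Python sketch of clamping, assuming the scalar is a 32-byte little-endian array whose first byte is the least significant. The helper names clamp_bytes and clamp_int are made up for this illustration. The integer version uses the same formula as the Sage code in the appendix.

```python
# Order of the Curve25519 base point.
Q = 2**252 + 27742317777372353535851937790883648493


def clamp_bytes(s: bytes) -> bytes:
    """Clamp a 32-byte little-endian scalar (hypothetical helper)."""
    if len(s) != 32:
        raise ValueError("scalar must be exactly 32 bytes")
    b = bytearray(s)
    b[0] &= 0b11111000   # clear the 3 least significant bits
    b[31] &= 0b01111111  # clear the 256th bit
    b[31] |= 0b01000000  # set the 255th bit
    return bytes(b)


def clamp_int(n: int) -> int:
    """The same operation expressed on integers."""
    high = 1 << 254
    return ((n % high) + high) // 8 * 8


samples = [bytes(32), bytes([0xFF] * 32), bytes(range(32))]
for s in samples:
    via_bytes = int.from_bytes(clamp_bytes(s), "little")
    via_int = clamp_int(int.from_bytes(s, "little"))
    # Both views agree, and the top set bit is always in the same position.
    print(via_bytes == via_int, via_bytes.bit_length(), via_bytes > Q)

print("bits in q:", Q.bit_length())
```

Every clamped scalar has exactly 255 bits while q has 253, which is the overflow described above.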
Ok, so the question is: is there a way to represent our product of two scalars in such a way that clamping will not change it, so that we can use a stock X25519 implementation?
Or, can we find a value s3 such that the following is true?
clamp(s3) == clamp(s1) * clamp(s2) mod q
To survive clamping without modification, our value will have to be a multiple of 8, have the 256th bit unset, and the 255th bit set. That is, it needs to be a multiple of 8 between 2^254 and 2^255-1.
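As a quick sanity check of that characterization, the sketch below compares "clamping leaves the value unchanged" against "multiple of 8 in the range" on a few boundary values. The function names are hypothetical, and clamping is written on integers rather than on byte arrays.

```python
LOW = 1 << 254   # smallest value with the 255th bit set
HIGH = 1 << 255  # first value that needs the 256th bit


def clamp(n: int) -> int:
    """Clamping on integers: clear bits 254 and up, set bit 254, clear the low 3 bits."""
    return ((n % LOW) + LOW) // 8 * 8


def survives_clamping(n: int) -> bool:
    return clamp(n) == n


def in_safe_set(n: int) -> bool:
    return LOW <= n < HIGH and n % 8 == 0


boundary_values = [LOW - 8, LOW, LOW + 1, LOW + 8, HIGH - 8, HIGH]
for n in boundary_values:
    print(n.bit_length(), survives_clamping(n), in_safe_set(n))
```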
We start by reducing clamp(s1) * clamp(s2) by q (at which point it might not be a multiple of 8 anymore, because q and 8 are co-prime, because q is prime). Then we add q to our reduced product until we reach 2^254 (remember, modulo q it’s like adding zero), and then keep adding q until we pass 2^255-1. If any of the numbers we hit in the range is a multiple of 8, that number can be passed to X25519 and it will survive clamping.
x = clamp(s1) * clamp(s2) mod q
s3 = x + k * q
Since q only fits 3 times in the range of clamping-safe values, we get at most 4 tries to hit a multiple of 8. That’s not enough to be certain we will, and depending on the starting point we could get lucky or not.
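To get a feel for the odds, here is a rough simulation. It makes one loud assumption: the reduced product x is modeled as a uniformly random value below q, which is only an approximation of clamp(s1) * clamp(s2) mod q. The helper names are made up for this sketch.

```python
import random

Q = 2**252 + 27742317777372353535851937790883648493
LOW, HIGH = 1 << 254, 1 << 255


def candidates(x: int) -> list:
    """All values x + k*Q that land in [2^254, 2^255), for 0 <= x < Q."""
    return [x + k * Q for k in range(9) if LOW <= x + k * Q < HIGH]


def clamp_safe(x: int) -> list:
    """The candidates that are also multiples of 8."""
    return [s for s in candidates(x) if s % 8 == 0]


rng = random.Random(2020)  # fixed seed so the run is repeatable
trials = 10_000
hits = 0
tries_seen = set()
for _ in range(trials):
    x = rng.randrange(Q)  # assumption: stands in for the reduced product
    tries_seen.add(len(candidates(x)))
    if clamp_safe(x):
        hits += 1

print("q mod 8:", Q % 8)
print("number of tries observed:", sorted(tries_seen))
print("fraction with a clamping-safe value:", hits / trials)
```

Because q is odd, consecutive multiples of q step through distinct residues modulo 8. Four tries therefore cover four of the eight residues, so under the uniformity assumption the search should succeed about half the time.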
For example, if we pick these two scalars
s1 = 0x84d5e9b76850c18052ca3b0a9e86ab80c61896ae004689f80c0e08db24dbce16
s2 = 0xf73ad45dd13145011f96d979ea56c0916e2725bbaf1126ea9c68767cdf5f0e38
then there is a clamping-safe product
s3 = (clamp(s1) * clamp(s2)) % q + 5 * q = 0x541805c3f9e04d7a08e548467006698c5f2efebc0963e83c982b31e7a96a2680
but for these two other scalars
s1 = 0x515387f40ace84e4fafe73d86297c42bc1ea3e0cef0b3236a7db082f5e35fb28
s2 = 0x8202824542a0d07193281252debce886faa94811f38763cbd350ca85f60786ce
none of the values equivalent to the product with the right high bits are multiples of 8, so there is no combined X25519 output for them.
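The two pairs above can be checked without Sage. This is a sketch under two assumptions: the hex values are read as plain big-endian integers, as written, and the name combine is a hypothetical helper rather than part of any library.

```python
Q = 2**252 + 27742317777372353535851937790883648493
HIGH_BIT = 1 << 254


def clamp(n: int) -> int:
    """Clamping on integers: clear bits 254 and up, set bit 254, clear the low 3 bits."""
    return ((n % HIGH_BIT) + HIGH_BIT) // 8 * 8


def combine(s1: int, s2: int):
    """Return (k, s3) with s3 = x + k*Q clamping-safe, or None if no such k exists."""
    x = clamp(s1) * clamp(s2) % Q
    for k in range(3, 8):
        s3 = x + k * Q
        if HIGH_BIT <= s3 < 2 * HIGH_BIT and s3 % 8 == 0:
            return k, s3
    return None


pairs = [
    (0x84d5e9b76850c18052ca3b0a9e86ab80c61896ae004689f80c0e08db24dbce16,
     0xf73ad45dd13145011f96d979ea56c0916e2725bbaf1126ea9c68767cdf5f0e38),
    (0x515387f40ace84e4fafe73d86297c42bc1ea3e0cef0b3236a7db082f5e35fb28,
     0x8202824542a0d07193281252debce886faa94811f38763cbd350ca85f60786ce),
]
for s1, s2 in pairs:
    result = combine(s1, s2)
    if result is None:
        print("no clamping-safe s3")
    else:
        k, s3 = result
        print(f"k = {k}, s3 = {s3:#066x}")
```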
The answer to “can you combine two X25519 operations by multiplying the scalars first” is, quite unexpectedly, “sometimes, it depends randomly on the two scalars”. Cryptography is weird.
(If we really wanted to do the performance trick, we could still take a modified implementation of X25519 that doesn’t clamp, and pass clamp(s1) * clamp(s2) mod q into that, but this wasn’t worth forking the standard library for age.)
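For illustration, here is roughly what that would look like. This is a sketch, not the age or Go standard library code: a variable-time Montgomery ladder on Python integers following the structure of RFC 7748, with a hypothetical clamped flag to turn clamping off. It only checks the base point u = 9, which has order q. Inputs with a small-order component are outside what this sketch verifies.

```python
P = 2**255 - 19
Q = 2**252 + 27742317777372353535851937790883648493
A24 = 121665


def clamp(n: int) -> int:
    high = 1 << 254
    return ((n % high) + high) // 8 * 8


def x25519(k: int, u: int, clamped: bool = True) -> int:
    """u-coordinate of k * point(u). Not constant time, for experiments only."""
    if clamped:
        k = clamp(k)
    x1 = u % P
    x2, z2, x3, z3 = 1, 0, x1, 1
    swap = 0
    for t in reversed(range(255)):
        bit = (k >> t) & 1
        if swap ^ bit:  # conditional swap, done with a branch for brevity
            x2, x3, z2, z3 = x3, x2, z3, z2
        swap = bit
        a, b = (x2 + z2) % P, (x2 - z2) % P
        aa, bb = a * a % P, b * b % P
        e = (aa - bb) % P
        da = (x3 - z3) * a % P
        cb = (x3 + z3) * b % P
        x3 = (da + cb) ** 2 % P
        z3 = x1 * (da - cb) ** 2 % P
        x2 = aa * bb % P
        z2 = e * (aa + A24 * e) % P
    if swap:
        x2, x3, z2, z3 = x3, x2, z3, z2
    return x2 * pow(z2, P - 2, P) % P


s1 = 0x515387f40ace84e4fafe73d86297c42bc1ea3e0cef0b3236a7db082f5e35fb28
s2 = 0x8202824542a0d07193281252debce886faa94811f38763cbd350ca85f60786ce

two_calls = x25519(s2, x25519(s1, 9))
one_call = x25519(clamp(s1) * clamp(s2) % Q, 9, clamped=False)
print(two_calls == one_call)
```

The second pair of scalars from the example is used on purpose: with clamping switched off, the reduced product can be used directly whether or not a clamping-safe equivalent exists.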
A few interesting unrelated links
- Let’s Encrypt published a nice description of ASN.1, the cursed encoding format of all things legacy crypto, which does a good job of communicating that no, this stuff is not easy, but no, this stuff is not black magic.
- A new paper dropped on the ePrint about ECDSA. Soatok has a good commentary. Don’t panic and carry on avoiding ECDSA.
- I’m proposing a Go API for age, my file encryption format/library/tool, and would love feedback.
- yubikey-agent is a seamless, indestructible, one-click SSH agent for YubiKeys.
- draft-irtf-cfrg-ristretto255-00 is available in beautifully typeset HTML. (Not kidding, I was ready to hate the new RFC presentation format and instead I love it. It has paragraph anchors!)
- Cthulhu awoke 🦑
Appendix: the Sage code
As an appendix, here’s the Sage code I used to work this out. You can run it multiple times and observe that sometimes it will find a value s3 that matches both conditions, and sometimes it won’t.
# Curve25519 in Montgomery form over GF(2^255 - 19)
ec = EllipticCurve(GF(2**255-19), [0,486662,0,1,0])
base_point = ec.lift_x(9)
# Helpers to convert between little-endian hex strings and integers
reverse_endianness = lambda h: "".join(h[i-2:i] for i in range(len(h), 0, -2))
hex_to_int = lambda h: Integer(reverse_endianness(h), 16)
int_to_hex = lambda s: reverse_endianness(s.hex().zfill(32 * 2))
x_coord = lambda p: int_to_hex(Integer(p.xy()[0]))
high_mask = 1 << 254
# Clamping on integers: clear bits 254 and up, set bit 254, clear the low 3 bits
clamp = lambda s: ((s % high_mask) + high_mask) // 8 * 8
in_range = lambda x: high_mask <= x and x < 2 * high_mask
s1 = ZZ.random_element(1 << 256)
s2 = ZZ.random_element(1 << 256)
x = (clamp(s1) * clamp(s2)) % base_point.order()
# Try every multiple of the order that could put x in the clamping-safe range
for i in range(3, 8):
    s3 = x + i * base_point.order()
    print(i, in_range(s3), s3 % 8 == 0)
A pretty picture of the Catskills
I drove 500 miles upstate over the weekend and it blew away all my expectations.