Cryptography Dispatches: The Linux CSPRNG Is Now Good!

Welcome back to Cryptography Dispatches, my lightly edited newsletter on cryptography engineering. This newsletter doesn't track you, so I'll only know there's someone reading this if you hit Reply. Please do!

— Filippo

Oceans of ink and hours on stage have been spent to convince the world that the best random number generator is /dev/urandom, the kernel one. And it is, and it's always been.

However, an uncomfortable truth was that the Linux CSPRNG really could have been better than it was. Userspace CSPRNGs couldn't be better than the kernel one, so our advice was still valid, but that space for improvement always frustrated me.

Good news everyone!

In recent years, the Linux CSPRNG got a number of great incremental improvements, and I can now say in good conscience that it's not only the best, it's also good.

`getrandom(2)`

The main misfeature of the /dev/[u]random interface is that /dev/random blocks with no cryptographic justification, while /dev/urandom doesn't block at the only time it matters: boot time, before enough entropy has been collected. In 2014, Linux 3.17 introduced the getrandom(2) system call, copying its design from the BSDs: it always blocks at boot until the pool has collected enough entropy to become unpredictable, and then never again¹. That's all you should use now.
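As a minimal sketch of what "just use getrandom(2)" looks like from a high-level language, here is a small Python wrapper. `os.getrandom` is Python's binding for the system call (Linux only, Python 3.6+); the helper name `random_bytes` and the fallback to `os.urandom` on platforms without the call are assumptions made for illustration.

```python
import os


def random_bytes(n: int) -> bytes:
    """Return n bytes from the kernel CSPRNG.

    Hypothetical helper: prefers getrandom(2), which blocks only until the
    pool is initialized at boot, and falls back to os.urandom() where the
    system call is not exposed (for example on non-Linux platforms).
    """
    getrandom = getattr(os, "getrandom", None)
    if getrandom is None:
        return os.urandom(n)
    buf = b""
    # A single call may return fewer bytes than requested for large reads,
    # so keep asking until the buffer is full.
    while len(buf) < n:
        buf += getrandom(n - len(buf))
    return buf


key = random_bytes(32)  # e.g. a fresh 256-bit key
print(key.hex())
```

No flags are passed on purpose: the default behavior (block at boot, never afterwards) is the one described above.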

The upcoming Linux 5.6 incredibly goes even further, finally acknowledging that entropy does not run out² and removing the whole /dev/random blocking pool. /dev/random will then behave like getrandom(2) and only block at boot. You still shouldn't use it, because on older kernels it will block unnecessarily, and on newer kernels getrandom(2) is available. Anyway, I did not think I would see the day /dev/random actually worked like it's supposed to. Sometimes good things happen, eventually!

They are also adding a GRND_INSECURE flag for getrandom(2) that provides best-effort non-cryptographic randomness at early boot, which I appreciate as an example of loud security APIs.
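To show how loud the opt-in is, here is a rough Python sketch of asking for best-effort bytes. The numeric flag value is an assumption copied from the kernel's `<linux/random.h>` header, since Python's `os` module may not export the flag by name; the helper name `best_effort_bytes` and the fallback path are also made up for this example.

```python
import os

# Assumption: value of GRND_INSECURE in the kernel's <linux/random.h>.
GRND_INSECURE = 0x0004


def best_effort_bytes(n: int) -> bytes:
    """Hypothetical helper for small requests that must never block.

    The result is NOT guaranteed to be cryptographically secure at early
    boot. Do not use it for keys.
    """
    getrandom = getattr(os, "getrandom", None)
    if getrandom is not None:
        try:
            return getrandom(n, GRND_INSECURE)
        except OSError:
            # For example EINVAL on kernels that do not know the flag.
            pass
    # Fallback: read the device directly, which also never blocks.
    with open("/dev/urandom", "rb") as f:
        return f.read(n)


print(best_effort_bytes(8).hex())
```

Nobody can write `GRND_INSECURE` by accident, which is the point of a loud API.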

By the way, we got getrandom(2) thanks to the OpenBSD folks and their LibreSSL work, in case you needed some organization to donate money to.

Performance and ChaCha20

Some people would say they needed a userspace CSPRNG for PERFORMANCE. I never really believed most of them, but to be fair Linux was using a kinda slow SHA-1 extractor back then. However, since Linux 4.8 (2016) the default getrandom(2) source is a fast ChaCha20-based CSPRNG, with separate pools per NUMA node to avoid contention. (Just ignore the rude comments in the code about applications not running their own CSPRNG, this is still Linux after all.)

There's even a neat trick XOR'ing some of the CSPRNG output back into the ChaCha20 state to prevent an attacker from recovering any past output from before the time of compromise.
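The idea can be sketched with a toy generator. This is not the kernel's code: SHA-256 stands in for the ChaCha20 block function to keep the example short, and the class name `ToyGenerator` and its state layout are assumptions made for illustration only.

```python
import hashlib


class ToyGenerator:
    """Toy model of backtracking resistance, NOT the kernel's algorithm.

    Assumption: SHA-256 replaces the ChaCha20 block function so the sketch
    stays small. Do not use this for anything real.
    """

    def __init__(self, seed: bytes):
        self.key = hashlib.sha256(b"seed" + seed).digest()
        self.counter = 0

    def _block(self) -> bytes:
        # One 32-byte block derived from the current key and counter.
        data = self.key + self.counter.to_bytes(8, "little")
        self.counter += 1
        return hashlib.sha256(data).digest()

    def read(self, n: int) -> bytes:
        out = b""
        while len(out) < n:
            out += self._block()
        # Generate one extra block that is never handed to the caller and
        # XOR it into the key. Whoever steals the state after this point
        # holds the new key; getting back to the old one would require
        # inverting the block function.
        extra = self._block()
        self.key = bytes(k ^ e for k, e in zip(self.key, extra))
        return out[:n]


gen = ToyGenerator(b"example seed")
first = gen.read(32)
stolen_key = gen.key  # state as seen by an attacker who shows up now
second = gen.read(32)
print(first != second, stolen_key != gen.key)
```

An attacker holding `stolen_key` can predict `second`, but has no direct way to recompute `first`, because the key that produced it was overwritten.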

Some of these improvements came along thanks to the WireGuard work by Jason A. Donenfeld, if you still have money to donate.

CPU RNGs

Most modern CPUs have instructions to get random numbers from physical effects in the chip, like RDRAND on x86-64, and Linux has always mixed those into the CSPRNG state and output, which is great. The ChaCha20 extractor XORs part of the cipher state with a value from the CPU RNG at every block.

However, Linux does not consider random numbers from the CPU as part of its entropy accounting because of some puzzling mistrust in the CPU it runs on³, so it could still block at boot even if the CPU exposes an RNG.

Since Linux 4.19 (2018), you can set a kernel config option or a simple boot parameter to make the pretty nonsensical statement "I trust the CPU that is evaluating this statement", and never worry about blocking for entropy again.
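For reference, the names in the kernel documentation should be `CONFIG_RANDOM_TRUST_CPU` for the config option and `random.trust_cpu=on` for the boot parameter; neither is spelled out above, so check them against your kernel's docs. A rough sketch for checking whether the parameter was passed at boot, where the helper name `trust_cpu_from_cmdline`, the accepted boolean spellings, and the "last occurrence wins" rule are all assumptions:

```python
from typing import Optional


def trust_cpu_from_cmdline(cmdline: str) -> Optional[bool]:
    """Return True/False if random.trust_cpu is set in cmdline, else None.

    None means the parameter is absent, so the default chosen at kernel
    build time applies. Hypothetical helper, not a kernel interface.
    """
    result = None
    for arg in cmdline.split():
        name, sep, value = arg.partition("=")
        if name != "random.trust_cpu" or not sep:
            continue
        value = value.lower()
        # Assumption: common boolean spellings for kernel parameters.
        if value in ("on", "1", "y"):
            result = True
        elif value in ("off", "0", "n"):
            result = False
    return result


# On a real system the string would come from /proc/cmdline.
print(trust_cpu_from_cmdline("quiet random.trust_cpu=on ro"))
```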

I find it pretty funny that hardware RNGs that sit outside of the CPU are instead always trusted to initialize the CSPRNG, by code that runs inside the CPU, which however doesn't always trust itself for the same purpose.

Blocking is unacceptable

Just when things were settling down, we had a close call: an unrelated filesystem improvement caused the CSPRNG behind getrandom(2) to get initialized a little later, and of course systemd choked on it. Linus found that unacceptable and had to be talked down from breaking getrandom(2) by making it not block at boot (which somehow doesn't count as breaking userspace, I don't even know). It was ugly, with Linus arguing explicitly against secure-by-default APIs.

Anyway, reason prevailed, and since Linux 5.4 (2019) the kernel will just generate some entropy from execution jitter if it's been blocked for more than one second.

"Wait", you'll say, "it was always that easy? Why did we have the blocking wars when Linux could always have just made up some entropy?"

Well, I don't know what to tell you, dear reader, but here's a picture of a sunset in Puerto Rico.

A beautiful sunset to help us forget about entropy.

Bonus: `/dev/random` was actually bad

Not only did /dev/random block for no good reason, it also used to not block when it actually mattered! This was news to me!

Before Linux 5.2 (2019), the blocking pool was ready to dispense N bits of randomness as soon as it had accumulated N bits of entropy. In its information-theoretic design this would make sense, if we could trust the entropy estimates, but nobody does! That means using /dev/random could be strictly less secure than using getrandom(2), because at least the latter always waits for 128 bits of entropy, which should provide some margin for error.

It also means you couldn't wait for /dev/random to become readable to detect the pool being initialized, which I've seen in the wild a couple times. So yeah, use getrandom(2).
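If you really do need to detect whether the pool is initialized, getrandom(2) can answer that directly. A minimal Linux-only sketch, assuming Python 3.6+ where `os.getrandom` and `os.GRND_NONBLOCK` are available; the helper name `pool_is_initialized` is made up for this example.

```python
import os


def pool_is_initialized() -> bool:
    """Hypothetical helper: True once the kernel CSPRNG has been seeded."""
    try:
        # Ask for a single byte without blocking.
        os.getrandom(1, os.GRND_NONBLOCK)
    except BlockingIOError:
        # EAGAIN: the pool is not initialized yet and we asked not to block.
        return False
    return True


if hasattr(os, "getrandom"):
    print(pool_is_initialized())
```

Most programs don't need this check at all: a plain blocking call already waits for exactly this condition.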

¹ There is a flag to make it be silly like /dev/random, GRND_RANDOM. It's being removed in Linux 5.6 and you should just ignore it.
² Entropy does not run out. Entropy accounting that goes to zero and blocks /dev/random was a vestige of an information-theoretic argument that only made sense before we trusted cryptography, and even then it's not clear what you're going to do with cryptographically secure random numbers if you don't trust cryptography.
³ I should mention that AMD somehow did manage to fuck up RDRAND, repeatedly, but that should be treated like any other CPU bug, with unfortunate ad-hoc workarounds and microcode updates.

Feb. 10, 2020, 6:38 p.m.