
Better question, why not use whatever hash is fastest on the system in question? We're talking about a RNG here -- it's not like we need to make sure that two different systems produce the same random numbers!


My answer to that is "maintainability". The more flexibility and moving parts we add to the RNG, the more things that can break. The delta from "one hash function" to "two hash functions" would involve not just adding a second hash function, but also a bunch of configuration code, target-specific logic, fallback handling, etc. There are plenty of places in the kernel that need to care about this kind of logic, but I don't believe the RNG is one of them, and I don't particularly want to require the RNG maintainers to spend their time caring about it.

Additionally, it's not just the hash function that would matter for speed here, but also the expansion. Linux RNG uses ChaCha20 for that, so if you were going all-in on target-specific speed, you'd need additional logic for swapping that out for a hardware-accelerated cipher (probably AES, which would introduce even more considerations given that it has a 16-byte block size, vs ChaCha20's 64-byte blocks).
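That extract-then-expand split can be sketched in a few lines of Python. This is illustrative only: the kernel's expand step is ChaCha20, which `hashlib` doesn't provide, so BLAKE2s in a simple counter mode stands in for a fast stream construction here.

```python
import hashlib

def extract(pool_inputs):
    # Condense variable-length entropy inputs into a fixed 32-byte key.
    # This is the step the kernel now does with BLAKE2s (formerly SHA-1).
    h = hashlib.blake2s()
    for chunk in pool_inputs:
        h.update(chunk)
    return h.digest()  # 32 bytes

def expand(key, nbytes):
    # Stretch the 32-byte key into an arbitrarily long output stream.
    # The kernel uses ChaCha20 for this; BLAKE2s-in-counter-mode is just
    # a stand-in to show the shape of the construction.
    out = b""
    counter = 0
    while len(out) < nbytes:
        out += hashlib.blake2s(key + counter.to_bytes(8, "little")).digest()
        counter += 1
    return out[:nbytes]

seed = extract([b"interrupt timings", b"disk seek jitter"])
stream = expand(seed, 100)
```

Swapping the expand step for hardware AES would change the chunking (16-byte blocks instead of 64-byte ChaCha20 blocks), which is exactly the extra logic being objected to above.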


The title mentions performance, but it is not the primary motivation AFAICT. It is only mentioned to say “it is not slower”.

The main concern was security, so it makes sense to use BLAKE2, which benefits from existing cryptanalysis of the ChaCha20 permutation, which is already used in the RNG for number generation.

(And it makes sense to use BLAKE2s in particular, to support non-64-bit systems without penalty.)

Using a single hash (instead of picking one at runtime) simplifies the attack surface IMO.


The argument here is that many (most?) Linux systems have access to hardware SHA2, which is equally secure (in this setting) but faster. Attacks on SHA2 or BLAKE2 (or really, on SHA1, for that matter) aren't how viable real-world attacks on the LKRNG are going to happen. It's good to see SHA1 getting swept away for other reasons, though!


Do most Intel (and by extension, Linux) machines have SHA2, though? I think it’s a pretty recent extension and at least initially, they were only shipping it in their low-end embedded models.


Most Intel processors don't have the extension, it is only Goldmont (low power) and 10th gen and up.

All AMD Zen processors have the extension.


See, what do I know. Definitely not "most".


Are you sure? I haven’t been able to buy a CPU that doesn’t have SHA2 acceleration for a number of years now.


> I haven’t been able to buy a CPU that doesn’t have SHA2 acceleration for a number of years now.

This is incorrect. Intel only launched their 11th gen desktop processors March 30, 2021. The 10th gen and earlier desktop processors do not have the SHA instructions. You can still buy a new i9-10900k from Newegg today.

(Note that 10th gen Intel mobile/laptop processors are a different micro-architecture, and do support SHA.)

Edit: Perhaps you're thinking of the AES instructions? They've been around a lot longer.
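Whether a given x86 Linux box has the SHA extensions is easy to check from `/proc/cpuinfo`, where the flag is reported as `sha_ni`. A small sketch (the parsing helper is mine, not anything from the kernel):

```python
def has_sha_extensions(cpuinfo_text):
    # Return True if the x86 SHA extensions appear in a /proc/cpuinfo
    # dump. Linux reports them in the "flags" line as "sha_ni".
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return "sha_ni" in line.split(":", 1)[1].split()
    return False

# Usage on a Linux machine:
# with open("/proc/cpuinfo") as f:
#     print(has_sha_extensions(f.read()))
```

(AES acceleration shows up separately, as the `aes` flag, and has been ubiquitous for much longer.)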


I think I am. Thanks for the correction.


The argument that not all CPUs running Linux have hardware SHA2, and that it therefore can't be assumed, is valid. But concluding that it shouldn't be used at all just because one vendor (even a significant one, though it's arguably not been the majority for a while) lacks it seems shortsighted at best. For decades, minority features have been enabled in the Linux kernel. Since when is the lowest common denominator the only desirable target?


I am not making the argument you're rebutting. I was only responding to this claim:

> The argument here is that many (most?) Linux systems have access to a hardware SHA2


Maybe this is a silly question, but why should RNG even be part of the kernel in the first place? It's convenient having it in a device file, but why couldn't that be provided by some userspace program or daemon?


Two good reasons:

1. The kernel has access to unpredictable events that make good key fodder for the CSPRNG itself, which would be more annoying and less efficient to percolate into userland.

2. The kernel can assure all the processes on the device that they're getting random bits from a known source of unpredictable bits, and refuse to deliver those bits if its state hasn't been sufficiently populated; this has been a recurring source of systems vulnerabilities in programs with userland CSPRNGs.

To that, add the convenience factor of always knowing how to get random bits and not having to set up userland dependencies and ensure they're started in the right order &c.

You should generally use your kernel's CSPRNG in preference to other ones, and break that rule of thumb only if you know exactly what you're doing.
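In Python, for example, following that rule of thumb just means using `secrets` (or `os.urandom`), both of which draw from the kernel CSPRNG being discussed here:

```python
import secrets

# secrets is backed by the OS CSPRNG (on Linux, ultimately getrandom(2)),
# which is why it's the right default for keys, tokens, and nonces.
api_key = secrets.token_bytes(32)     # 32 raw random bytes
csrf_token = secrets.token_hex(16)    # 32 hex characters
url_slug = secrets.token_urlsafe(16)  # URL-safe base64 text
```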


1. The kernel has much more access to sources of indeterminism than a userspace application does. Things like disk seeks, packet jitter, asynchronous interrupts, etc. provide lots of "true" entropy to work with. Userspace programs, on the other hand, have very deterministic execution. In fact, the only way to introduce true indeterminism into a userspace program is to query a kernel-mediated resource (e.g. system call, shared memory mapping, etc.), or to invoke an unprivileged and unpredictable hardware instruction, of which there are very few (e.g. RDRAND on x64, LL/SC on ARM).

2. Userspace programs cannot be made as robust to accidental or malicious failures. Even if you have a userspace RNG daemon that randomly opens files or sockets to extract entropy, what happens if that daemon crashes? Or fails to open a file or socket? Or an exploit gets RCE in the daemon and reads privileged files? By contrast, the kernel is already performing all these operations for userspace processes, so it might as well measure them and stick the results into its own entropy pool to hand out to other processes on request.


The kernel needs randomness too. If it's done there right once, why duplicate the effort in userspace (and watch all the hilarious ways userspace solutions fail)?

getrandom was motivated by the fact that using device files has many failure modes.
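For illustration, Python exposes the syscall directly on Linux 3.17+, and its portable `os.urandom` uses `getrandom(2)` under the hood where available:

```python
import os

# getrandom(2) sidesteps the classic device-file failure modes: it needs
# no file descriptor (so it works under fd exhaustion, or in a chroot or
# early-boot environment with no /dev/urandom node), and it blocks until
# the kernel pool is initialized instead of returning weak early output.
if hasattr(os, "getrandom"):       # Linux 3.17+, Python 3.6+
    buf = os.getrandom(32)
else:
    buf = os.urandom(32)           # portable fallback, also kernel-backed
```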


Oh, nice, a good 3rd security reason! A userland CSPRNG is essentially just an additional single point of failure, since you're going to end up depending on secure randomness in the kernel already.

It's a good question!


Not arguing, but genuinely curious - what do kernels need randomness for?


A lot of security relies on some level of non-determinism. Take TCP initial sequence number generation, where every TCP connection starts from a random sequence number. There have been numerous attacks where the RNG wasn't good enough, so an attacker could predict the TCP sequencing and perform various malicious activities. Additionally, in-kernel VPNs like IPsec and WireGuard need RNGs for their own internal functions. Calling out to userspace for that would be painful and could break in a lot of unexpected ways.
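The ISN case is standardized in RFC 6528: each connection's starting sequence number is a clock tick plus a keyed pseudorandom function of the connection 4-tuple. A rough Python sketch of that shape (the secret and tick rate here are illustrative, not the kernel's actual values):

```python
import hashlib
import hmac
import time

BOOT_SECRET = b"per-boot secret"  # hypothetical; regenerated at each boot

def initial_sequence_number(saddr, sport, daddr, dport):
    # RFC 6528 shape: ISN = M + F(4-tuple, secret), where M is a clock
    # tick and F is a keyed PRF. If F's output were guessable, an
    # off-path attacker could predict sequence numbers and inject or
    # reset segments on other people's connections.
    conn = f"{saddr}:{sport}->{daddr}:{dport}".encode()
    f = int.from_bytes(hmac.new(BOOT_SECRET, conn, hashlib.sha256).digest()[:4], "big")
    m = int(time.monotonic() * 250_000) & 0xFFFFFFFF  # ~4 microsecond tick
    return (m + f) & 0xFFFFFFFF
```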


I don't recall exactly, but I think TCP retransmission and handshake timers add randomness to the backoff delay, to avoid a thundering herd that just repeats itself if all clients retry at the same time.

Assigning free ports to applications that listen on a socket is also randomized. Not sure why; it feels like it could be sequential unless you want to deliberately obscure which ports are in use.
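The jittered-backoff idea is the same one used in userland retry loops. A minimal "full jitter" sketch (names are mine, not from any kernel code):

```python
import random

def retry_delay(attempt, base=0.1, cap=30.0):
    # "Full jitter" backoff: pick uniformly from [0, min(cap, base * 2^n)].
    # The random spread is what keeps every client from retrying at the
    # same instant and re-creating the thundering herd.
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))
```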


Address Space Layout Randomisation (ASLR) for one. ASLR moves around key parts of an executable to make exploitation of certain types of vulnerabilities more difficult or impossible.


The kernel can keep secrets from user space, which is necessary for maintaining a secure RNG state.

The kernel also has the hardware access that is used as entropy sources. If the RNG was in user space the kernel would have to provide some way of securely exporting that entropy to user space. It is simpler and more secure to just export the end result of random numbers through a simple API.

All modern OS have made the same decision of having a kernel-based CSRNG, for the same reasons.


Not a silly question at all. It's because having a reliable and uncompromised source of randomness is essential for cryptographic applications. Having the RNG in user space would make it more vulnerable to attack.


I normally care about reproducible RNG results, and I try to seed every RNG used. There are lots of applications where randomness is used but you still want repeatable behavior. ML experiments are one situation where randomness is common but runs are also intended to be repeatable.

I think the RNGs in the Python standard library and in several of the data science libraries are platform-independent.
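For Python specifically, that's right: `random` is a Mersenne Twister, so the same seed yields the same sequence on any platform. A sketch of the usual seeding pattern:

```python
import random

def experiment(seed):
    # A dedicated Random instance gives repeatable runs without touching
    # the global RNG state that other code may share.
    rng = random.Random(seed)
    return [rng.random() for _ in range(3)]

assert experiment(42) == experiment(42)  # same seed, same "random" data
assert experiment(42) != experiment(43)  # different seed, different data
```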


This is not that kind of RNG; this is the system secure random number generator. Secure random numbers aren't seeded.


I wouldn't say the lack of a seed is the difference between a CSPRNG and a PRNG. That's more the difference between a CSPRNG and a stream cipher, where the "seed" is called a "key" and "IV".

PRNGs (Pseudo Random Number Generators) are predictable. CSPRNGs (Cryptographically Secure Pseudo Random Number Generators) aren't (if they're working). HWRNGs (Hardware Random Number Generators) that aren't debiased via a CSPRNG or similar produce nonuniform output, not suitable for cryptography or most other uses directly. TRNGs (True Random Number Generators) might not exist in this universe (deterministic interpretations of quantum mechanics are consistent with observation), so it's safer to assume they don't and avoid the term entirely.
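The predictability difference shows up directly in Python's stdlib, with `random` (a Mersenne Twister PRNG) next to `secrets` (backed by the kernel CSPRNG):

```python
import random
import secrets

# A PRNG is completely determined by its seed: anyone who learns the seed
# (or, for Mersenne Twister, enough raw output) can predict everything.
a = random.Random(1234)
b = random.Random(1234)
assert [a.getrandbits(32) for _ in range(4)] == [b.getrandbits(32) for _ in range(4)]

# A CSPRNG exposes no caller-visible seed at all; secrets reads the
# kernel CSPRNG, so outputs stay unpredictable to observers.
assert secrets.token_bytes(16) != secrets.token_bytes(16)
```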


I think you are talking about a pseudo-RNG.


Basically every RNG you interact with on general purpose computers is a pseudo random number generator. The kernel RNG being discussed here is a CSPRNG, as was the one it replaced.




