> require all fields to be initialized any time an object is created
I'm not a fan of such a policy. That usually leads to people zero-initializing everything. For this bug, this would have been correct, but sometimes, there is no good "initial" value, and zero is just another random value like all the 2^32-1 others.
Worse, if you zero-initialize everything, valgrind will be unable to find accesses to uninitialized variables, which hides the bug and makes it harder to find. If I have no good initial value for something, I'd rather leave it uninitialized.
> I'm not a fan of such a policy. That usually leads to people zero-initializing everything. For this bug, this would have been correct, but sometimes, there is no good "initial" value, and zero is just another random value like all the 2^32-1 others.
So use a language that has an option type, we've only had them for what, 50 years now.
Mandatory explicit initialization, plus a feature to explicitly mark memory as having an undefined value, is a great way to approach this problem. You get the benefit in the majority of cases where you have a defined value you just forgot to set and the compiler errors until you set it, and for the "I know it's undefined, I don't have a value for it yet" case you have both mandatory explicit programmer acknowledgement and the opportunity for debug code to detect mistaken reads of this uninitialized memory.
But I think it would be troublesome to use such a hypothetical feature in C if it's only available in some compiler-specific dialect(s), because you need to coerce to any type, so it would be hard to hide to hide behind a macro. What should it expand to on compilers without support? It would probably need lots of variants specific to scalar types, pointer types, etc., or lots of #if blocks, which would be unfortunate.
Actually, https://news.ycombinator.com/item?id=30588362 has convinced me this wouldn't necessarily solve the bug in question either, since it's a bug caused by (quite legitimately) re-using an existing value. Though it would be easy to implement a "free" operation by just writing `undefined`, so it would still help quite a bit, and more than suggestions like "just use an Optional/Maybe type".
GCC has recently introduced a mode (-ftrivial-auto-var-init) that will zero initialize all automatic variables by default while still treating them as UB for sanitize/warning purposes.
The issue is with dynamic memory allocation as that would be the responsibility of the allocator (and of course the kernel uses custom allocators).
Interesting compiler feature to work around (unknown) vulnerabilities similar to this one. However in this case, it wouldn't help; the initial allocation is with explicit zero-initialization, but this is a circular buffer, and the problem occurs when slots get reused (which is the basic idea of a circular buffer).
Would this get caught by KMSAN (https://github.com/google/kmsan)? Maybe the circular buffer logic would need to get some calls to `__msan_allocated_memory` and/or `__sanitizer_dtor_callback` added to it? If this could be made to work then it would ensure that this bug stays fixed and doesn't regress.
Yes, but as you said, it works only after adding such annotations to various libraries. A circular buffer is just a special kind of memory allocator, and as such, when it allocates and deallocates memory, it needs to tell the sanitizer about it.
What bothers me about the Linux code base is that there is so much code duplication; the pipe doesn't use a generic circular buffer implementation, but instead rolls its own. If you had the one true implementation, you'd add those annotations there, once, and all users would have it, and would benefit from KMSAN's deep insight.
Every time I hack Linux kernel code, I'm reminded how ugly plain C is, how it forces me to repeat myself (unless you enter macro hell, but Linux is already there). I wish the Linux kernel would agree on a subset of C++, which would allow making it much more robust and simpler.
They recently agreed to allow Rust code in certain tail ends of the code base; that's a good thing, but much more would be gained from allowing that subset of C++ everywhere. (Do both. I'm not arguing against Rust.)
Why can't things like option types be used? That solves the issue as you'd either have `Some<FooType>` or `None`, which could be dealt with separately.
I love Rust, but would it have prevented this problem? IIUC there was no memory corruption at the language level here. This was really just a logic error.
Yes, it would have. Some code creates an instance of some struct, but doesn’t set the flags field to zero. It thus keeps whatever value happened to be in that spot in memory, an essentially random set of bits. Rust would force you to either explicitly name the flags field and give it a value, or use `..Default::default()` to initialize all remaining fields automatically. Anything else would be a compile–time error.
The bug is that they are reusing (or, repurposing) an already-allocated-and-used buffer and forgot to reset flags. This is a logic bug, not a memory safety bug.
In fact, this might be a prime example of "using Rust does not magically eliminate your temporal bugs because sometimes they are not about memory safety but logical". Before that my favorite such bug is a Use-After-Free in Redox OS's filesystem codes.
Pro tip for random HN Rust evangelist: read the fucking code before posting your "sHoUlD HAVe uSED A BeTTER lANGUAGE" shit.
This is only partially fair. In Rust you would probably have assigned a new object into *buf here instead of overwriting the fields manually. It is good practice to do this (if the code is logically an object initialization, it should actually be an object initialization, not a bunch of field assignments), but it's clunky to do so in C because you can't use initializers in assignments.
The point is: You could have done this in Rust, but you wouldn't have been required to do so, so the exact same logic bug could have emerged. Maybe it would be more Rust-like to write the code like that, but it would have also been possible to write the code like that in C - and since we're talking about the kernel here, even if this code was written in Rust a developer might have written it in the more C-like way for performance reasons.
People writing C don't re-use allocated objects because it's clunky but to improve performance. The general purpose allocators are almost always much slower than something where you know the pattern of allocations. I've no idea if Rust has a similar issue. I would think that most kernel code, whether C or Rust, would need to handle "allocation fails" case and not depend on language constructs to do allocations, but that's just a guess.
I'm not saying you shouldn't reuse allocated objects. I'm talking about building a local object (no dynamic allocation) and assigning it to the pointer at once. This has the same runtime behavior (assuming -O1) as assigning the fields one by one.
you can assign a new object into *buf in C just fine, with "*buf = (struct YourType){.prop1 = a, .prop2 = b}"; it even zero-initializes unset fields! So C and Rust give you precisely the same abilities here.
edit: the "struct pipe_buffer" in question[1] has one field that even the updated code doesn't write - "private". Not sure what it's about, but it's there. Not writing unrelated fields like that is probably not much of an issue now, but it certainly can add up on low-power systems. You might also have situations where you'd want to write parts of an object in different parts of the code, for which you'd have to resort to writing fields manually.
Oh I was not aware of this syntax in C, thanks for bringing it up! I still think the pattern is more common and known in Rust but I might be wrong :)
Re: your other points, "reusing a pre-allocated struct from a buffer" is basically object initialization, which is different from other times you want to write fields. In general an object initialization construct should be used in those cases, this whole thread being an argument why. Out-of-band fields such as the "private" field are a pain I agree, but they can be separated from an inner struct (in which case the inner struct is the only field that gets assigned for initialization).
Taking a step back, the true solution is probably to have a proper constructor... And that can be done in any language, so I'll stand corrected.
The question of safety is often much more about coding standards than actual language features. For a kernel, you'd be violating Rust's safety features left right and center (or have a slow kernel that just dies on OOM; choose one) and would have to come up with your own standards for preventing coding errors in those, which is what you already have with C.
To be clear, some base level of trivially safe code is certainly a nice-to-have. I just don't think the amount that helps for a kernel is that much, and the added boilerplate on more unsafe things might even obscure issues.
as far as I understand, the kernel uses C89 + GNU extensions, cause there definitely are usages in the kernel of it. (my searching showed it being only used for defining globals, which is a weird standard, but I don't see anything preventing using it elsewhere if they wanted to)
And there are recent plans to move the kernel to a newer C version anyway.
I agree with your sentiment. Only the most strict pure functional languages will prevent you from reusing objects.
You could argue that some languages distinguish raw memory from actual objects and even when reusing memory you would still go through an initialization phase (for example placement new in C++) that would take care of putting the object into a known default state.
> The bug is that they are reusing (or, repurposing) an already-allocated-and-used buffer and forgot to reset flags. This is a logic bug, not a memory safety bug.
This statement is incorrect. They are using an arena allocator, and there is no way for it to know if it is reusing one of the elements or using that element for the first time. To do this in Rust you would probably be using the MaybeUninit type: https://doc.rust-lang.org/std/mem/union.MaybeUninit.html
However, you are partly correct. In Rust, when using the MaybeUninit type, it is still possible to partially initialize an object and then return it as if it were fully initialized without hitting a compile error. https://doc.rust-lang.org/std/mem/union.MaybeUninit.html#ini...
If you do the whole struct at once, rather than one field at a time, then the compiler still has your back:
Ah thanks for explaining. I misunderstood the root cause and didn't read the patch. Rust definitely would have helped here. Or even just enforcing modern C practices such as overwriting the whole struct so that non-specified values would have been set to zero (although explicit is better than zero).
buf is a pointer to a pipe_buffer array, this whole function wreaks yes, but i don't think this is a simple "initialization would have fixed" bug
buf = &pipe->bufs[i_head & p_mask];
i think it is a problem with merging and needing to reset the flags, but i didn't want to waste too much time really trying to find the root issue and wtf merging is/why do it.
I tend to agree but only partially. I'm doing myself a lot of init functions for plenty of stuff, but I've faced plenty of issues due to the fact that these functions do not always initialize everything (and your patch above does exactly that for a good reason) and that it's much less obvious for those using them. And if they initialize too much, they can equally be a pain to work with. It's not uncommon to require 2 or 3 different init functions depending on what you're doing, but the name should explicitly indicate the promise, and that's where it's difficult.
Then let's have those 2 or 3 documented init functions. That's not perfect, but still much better than spraying different (undocumented) copies of the init code everywhere, that have to be located and adjusted every time somebody refactors something.
> Pro tip for language designers: require all fields to be initialized any time an object is created.
This proposal sounds great until you find out that this is a hard problem to solve reasonably well in the compiler and no matter what you do there will be valid programs that your compiler will reject.
No, I don’t think that this should be considered a valid program. I would never allow any code that creates a partly–initialized struct in any code that I maintain.
I agree that it’s a hard problem, but I don’t think that language designers should use that as an excuse. I think that language designers should really double–down and require that every field be initialized, otherwise people will just forget. The code will be carefully written at first, but then someone new will come along and add a new field, and then the existing code is insufficient. With no errors from the compiler there is no way to ensure that the new programmer updates everything to accommodate the new field.
Rust does pretty well in this regard, but you can still use unsafe blocks to get a partly–initialized struct if you really want one. I like Rust, but I wish that it went all the way and didn’t allow it at all.
Really impressive debugging too.