C needs to die. Pro tip for language designers: require all fields to be initial...

max_k · on March 7, 2022

> require all fields to be initialized any time an object is created

I'm not a fan of such a policy. That usually leads to people zero-initializing everything. For this bug, this would have been correct, but sometimes, there is no good "initial" value, and zero is just another random value like all the 2^32-1 others.

Worse, if you zero-initialize everything, valgrind will be unable to find accesses to uninitialized variables, which hides the bug and makes it harder to find. If I have no good initial value for something, I'd rather leave it uninitialized.

lmm · on March 7, 2022

> I'm not a fan of such a policy. That usually leads to people zero-initializing everything. For this bug, this would have been correct, but sometimes, there is no good "initial" value, and zero is just another random value like all the 2^32-1 others.

So use a language that has an option type, we've only had them for what, 50 years now.

dundarious · on March 7, 2022

I think https://news.ycombinator.com/item?id=30588362 has shown that this wouldn't solve anything for this particular case.

dundarious · on March 7, 2022

Mandatory explicit initialization, plus a feature to explicitly mark memory as having an undefined value, is a great way to approach this problem. You get the benefit in the majority of cases where you have a defined value you just forgot to set and the compiler errors until you set it, and for the "I know it's undefined, I don't have a value for it yet" case you have both mandatory explicit programmer acknowledgement and the opportunity for debug code to detect mistaken reads of this uninitialized memory.

But I think it would be troublesome to use such a hypothetical feature in C if it's only available in some compiler-specific dialect(s), because you need to coerce to any type, so it would be hard to hide to hide behind a macro. What should it expand to on compilers without support? It would probably need lots of variants specific to scalar types, pointer types, etc., or lots of #if blocks, which would be unfortunate.

Zig is a nice language with this feature, and it fits into many of the same use cases as C: https://ziglang.org/documentation/0.9.1/#undefined

dundarious · on March 7, 2022

Actually, https://news.ycombinator.com/item?id=30588362 has convinced me this wouldn't necessarily solve the bug in question either, since it's a bug caused by (quite legitimately) re-using an existing value. Though it would be easy to implement a "free" operation by just writing `undefined`, so it would still help quite a bit, and more than suggestions like "just use an Optional/Maybe type".

gpderetta · on March 7, 2022

GCC has recently introduced a mode (-ftrivial-auto-var-init) that will zero initialize all automatic variables by default while still treating them as UB for sanitize/warning purposes.

The issue is with dynamic memory allocation as that would be the responsibility of the allocator (and of course the kernel uses custom allocators).

max_k · on March 7, 2022

Interesting compiler feature to work around (unknown) vulnerabilities similar to this one. However in this case, it wouldn't help; the initial allocation is with explicit zero-initialization, but this is a circular buffer, and the problem occurs when slots get reused (which is the basic idea of a circular buffer).

abbeyj · on March 7, 2022

Would this get caught by KMSAN (https://github.com/google/kmsan)? Maybe the circular buffer logic would need to get some calls to `__msan_allocated_memory` and/or `__sanitizer_dtor_callback` added to it? If this could be made to work then it would ensure that this bug stays fixed and doesn't regress.

max_k · on March 7, 2022

Yes, but as you said, it works only after adding such annotations to various libraries. A circular buffer is just a special kind of memory allocator, and as such, when it allocates and deallocates memory, it needs to tell the sanitizer about it.

What bothers me about the Linux code base is that there is so much code duplication; the pipe doesn't use a generic circular buffer implementation, but instead rolls its own. If you had the one true implementation, you'd add those annotations there, once, and all users would have it, and would benefit from KMSAN's deep insight.

Every time I hack Linux kernel code, I'm reminded how ugly plain C is, how it forces me to repeat myself (unless you enter macro hell, but Linux is already there). I wish the Linux kernel would agree on a subset of C++, which would allow making it much more robust and simpler.

They recently agreed to allow Rust code in certain tail ends of the code base; that's a good thing, but much more would be gained from allowing that subset of C++ everywhere. (Do both. I'm not arguing against Rust.)

db48x · on March 8, 2022

Some good abstractions would really help with these kinds of allocators.

banana_maker · on March 7, 2022

Why can't things like option types be used? That solves the issue as you'd either have `Some<FooType>` or `None`, which could be dealt with separately.

kevincox · on March 7, 2022

I love Rust, but would it have prevented this problem? IIUC there was no memory corruption at the language level here. This was really just a logic error.

db48x · on March 7, 2022

Yes, it would have. Some code creates an instance of some struct, but doesn’t set the flags field to zero. It thus keeps whatever value happened to be in that spot in memory, an essentially random set of bits. Rust would force you to either explicitly name the flags field and give it a value, or use `..Default::default()` to initialize all remaining fields automatically. Anything else would be a compile–time error.

The fix:

    +++ b/lib/iov_iter.c
    @@ -414,6 +414,7 @@ static size_t copy_page_to_iter_pipe(struct page \*page, size_t offset, size_t by
       return 0;
    
      buf->ops = &page_cache_pipe_buf_ops;
    + buf->flags = 0;
      get_page(page);
      buf->page = page;
      buf->offset = offset;

rfoo · on March 7, 2022

No. Rust can not prevent this bug.

The bug is that they are reusing (or, repurposing) an already-allocated-and-used buffer and forgot to reset flags. This is a logic bug, not a memory safety bug.

In fact, this might be a prime example of "using Rust does not magically eliminate your temporal bugs because sometimes they are not about memory safety but logical". Before that my favorite such bug is a Use-After-Free in Redox OS's filesystem codes.

Pro tip for random HN Rust evangelist: read the fucking code before posting your "sHoUlD HAVe uSED A BeTTER lANGUAGE" shit.

deredede · on March 7, 2022

This is only partially fair. In Rust you would probably have assigned a new object into *buf here instead of overwriting the fields manually. It is good practice to do this (if the code is logically an object initialization, it should actually be an object initialization, not a bunch of field assignments), but it's clunky to do so in C because you can't use initializers in assignments.

TonyTrapp · on March 7, 2022

The point is: You could have done this in Rust, but you wouldn't have been required to do so, so the exact same logic bug could have emerged. Maybe it would be more Rust-like to write the code like that, but it would have also been possible to write the code like that in C - and since we're talking about the kernel here, even if this code was written in Rust a developer might have written it in the more C-like way for performance reasons.

lanstin · on March 7, 2022

People writing C don't re-use allocated objects because it's clunky but to improve performance. The general purpose allocators are almost always much slower than something where you know the pattern of allocations. I've no idea if Rust has a similar issue. I would think that most kernel code, whether C or Rust, would need to handle "allocation fails" case and not depend on language constructs to do allocations, but that's just a guess.

deredede · on March 7, 2022

I'm not saying you shouldn't reuse allocated objects. I'm talking about building a local object (no dynamic allocation) and assigning it to the pointer at once. This has the same runtime behavior (assuming -O1) as assigning the fields one by one.

See https://godbolt.org/z/Wh5KcTaGY for what I'm talking about, the local allocation is easily eliminated by the compiler.

The equivalent in C is to create a temporary local variable with an initializer list then write that variable to the pointer.

dzaima · on March 7, 2022

you can assign a new object into *buf in C just fine, with "*buf = (struct YourType){.prop1 = a, .prop2 = b}"; it even zero-initializes unset fields! So C and Rust give you precisely the same abilities here.

edit: the "struct pipe_buffer" in question[1] has one field that even the updated code doesn't write - "private". Not sure what it's about, but it's there. Not writing unrelated fields like that is probably not much of an issue now, but it certainly can add up on low-power systems. You might also have situations where you'd want to write parts of an object in different parts of the code, for which you'd have to resort to writing fields manually.

[1]: https://github.com/torvalds/linux/blob/719fce7539cd3e186598e...

deredede · on March 7, 2022

Oh I was not aware of this syntax in C, thanks for bringing it up! I still think the pattern is more common and known in Rust but I might be wrong :)

Re: your other points, "reusing a pre-allocated struct from a buffer" is basically object initialization, which is different from other times you want to write fields. In general an object initialization construct should be used in those cases, this whole thread being an argument why. Out-of-band fields such as the "private" field are a pain I agree, but they can be separated from an inner struct (in which case the inner struct is the only field that gets assigned for initialization).

Taking a step back, the true solution is probably to have a proper constructor... And that can be done in any language, so I'll stand corrected.

dzaima · on March 7, 2022

The question of safety is often much more about coding standards than actual language features. For a kernel, you'd be violating Rust's safety features left right and center (or have a slow kernel that just dies on OOM; choose one) and would have to come up with your own standards for preventing coding errors in those, which is what you already have with C.

To be clear, some base level of trivially safe code is certainly a nice-to-have. I just don't think the amount that helps for a kernel is that much, and the added boilerplate on more unsafe things might even obscure issues.

db48x · on March 8, 2022

I think you need to have C99 for that, and the kernel is still using C89. If C needs to die, then C89 needs to die first :)

dzaima · on March 8, 2022

as far as I understand, the kernel uses C89 + GNU extensions, cause there definitely are usages in the kernel of it. (my searching showed it being only used for defining globals, which is a weird standard, but I don't see anything preventing using it elsewhere if they wanted to)

And there are recent plans to move the kernel to a newer C version anyway.

gpderetta · on March 7, 2022

I agree with your sentiment. Only the most strict pure functional languages will prevent you from reusing objects.

You could argue that some languages distinguish raw memory from actual objects and even when reusing memory you would still go through an initialization phase (for example placement new in C++) that would take care of putting the object into a known default state.

db48x · on March 8, 2022

> The bug is that they are reusing (or, repurposing) an already-allocated-and-used buffer and forgot to reset flags. This is a logic bug, not a memory safety bug.

This statement is incorrect. They are using an arena allocator, and there is no way for it to know if it is reusing one of the elements or using that element for the first time. To do this in Rust you would probably be using the MaybeUninit type: https://doc.rust-lang.org/std/mem/union.MaybeUninit.html

However, you are partly correct. In Rust, when using the MaybeUninit type, it is still possible to partially initialize an object and then return it as if it were fully initialized without hitting a compile error. https://doc.rust-lang.org/std/mem/union.MaybeUninit.html#ini...

If you do the whole struct at once, rather than one field at a time, then the compiler still has your back:

    let foo = unsafe {
      let mut uninit: MaybeUninit<Foo> = MaybeUninit::uninit();
      uninit.write(Foo {
        name: "Bob".to_string(),
        list: vec![0, 1, 2]
      })
    }

kevincox · on March 7, 2022

Ah thanks for explaining. I misunderstood the root cause and didn't read the patch. Rust definitely would have helped here. Or even just enforcing modern C practices such as overwriting the whole struct so that non-specified values would have been set to zero (although explicit is better than zero).

amelius · on March 7, 2022

Wouldn't Lint have caught the error too?

modshatereality · on March 7, 2022

buf is a pointer to a pipe_buffer array, this whole function wreaks yes, but i don't think this is a simple "initialization would have fixed" bug

buf = &pipe->bufs[i_head & p_mask];

i think it is a problem with merging and needing to reset the flags, but i didn't want to waste too much time really trying to find the root issue and wtf merging is/why do it.

pdw · on March 7, 2022

You can't accidentally leave a field of a struct uninitialized in Rust or in other sane languages.

max_k · on March 7, 2022

btw. this is how I would make the code more robust: https://lore.kernel.org/lkml/20220225185431.2617232-4-max.ke...

I'm a C++ guy, and the lack of constructors is one of many things that bothers me with C.

wtarreau · on March 9, 2022

I tend to agree but only partially. I'm doing myself a lot of init functions for plenty of stuff, but I've faced plenty of issues due to the fact that these functions do not always initialize everything (and your patch above does exactly that for a good reason) and that it's much less obvious for those using them. And if they initialize too much, they can equally be a pain to work with. It's not uncommon to require 2 or 3 different init functions depending on what you're doing, but the name should explicitly indicate the promise, and that's where it's difficult.

max_k · on March 9, 2022

Then let's have those 2 or 3 documented init functions. That's not perfect, but still much better than spraying different (undocumented) copies of the init code everywhere, that have to be located and adjusted every time somebody refactors something.

wtarreau · on March 9, 2022

In any case, definitely ;-)

turminal · on March 7, 2022

> Pro tip for language designers: require all fields to be initialized any time an object is created.

This proposal sounds great until you find out that this is a hard problem to solve reasonably well in the compiler and no matter what you do there will be valid programs that your compiler will reject.

pjc50 · on March 7, 2022

No? This seems to work perfectly fine for other languages.

> valid programs that your compiler will reject.

Do you mean "valid programs where everything is initialized when an object is created will somehow fail to detect that"?

Arnavion · on March 7, 2022

>Do you mean "valid programs where everything is initialized when an object is created will somehow fail to detect that"?

Valid programs where a field is left uninitialized at creation time, but the programmer makes sure it's initialized before it's used.

db48x · on March 8, 2022

No, I don’t think that this should be considered a valid program. I would never allow any code that creates a partly–initialized struct in any code that I maintain.

I agree that it’s a hard problem, but I don’t think that language designers should use that as an excuse. I think that language designers should really double–down and require that every field be initialized, otherwise people will just forget. The code will be carefully written at first, but then someone new will come along and add a new field, and then the existing code is insufficient. With no errors from the compiler there is no way to ensure that the new programmer updates everything to accommodate the new field.

Rust does pretty well in this regard, but you can still use unsafe blocks to get a partly–initialized struct if you really want one. I like Rust, but I wish that it went all the way and didn’t allow it at all.

pjc50 · on March 7, 2022

Sure, so we can easily change the definition of "valid" so that you have to initialize them at creation time. shrug