The default behavior of cargo is to download stuff from the internet. This may be the least reproducible thing ever.
I'm honestly astonished that programmers of a language that is deemed to be "safe by default" thought this behavior was acceptable in any form, let alone as the default. If downloading things at build time is somehow necessary, it should be an obscure option behind a flag with a scary name, like --extremely-unsafe-i-know-what-i-am-doing, that prompts the user with a small Turing test every time it is run. Cargo is just bonkers; it doesn't matter at all whether it is "convenient" or not. Convenience before basic safety and reproducibility is contrary to the spirit of the language itself.
It's as if bounds checking in the language was deferred to a third party that you need to "trust" in order to believe that you won't have segmentation faults.
It doesn't just download random things. Cargo generates a Cargo.lock file with checksums and will make sure that those checksums match when building later on. It's about as safe as vendoring all dependencies while being far easier to work with (though tools like cargo-vendor do exist, of course).
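To make that concrete, here is roughly what a Cargo.lock entry looks like; the crate name and checksum below are made up for illustration:

```toml
# Hypothetical Cargo.lock entry -- name and checksum invented for illustration
[[package]]
name = "example-crate"
version = "1.2.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9f1a0b..."  # SHA-256 of the published .crate archive (truncated here)
```

If the archive cargo downloads later doesn't hash to the recorded checksum, the build fails rather than silently using different code.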
Edit: for things like the kernel, vendoring dependencies is still probably not a bad idea, of course
What prevents a given URL from disappearing? Does that just break a particular source version of the Linux kernel?
What happens when a given dependency adds new kernel-inappropriate features? Are kernel devs going to act like distro maintainers and decide between forking, maintaining patch sets, etc.?
All crate sources are stored in the crates.io package archive, which never deletes packages.
A dependency veering off in a direction you don't like is one of the risks of using someone else's code instead of writing it yourself. Cargo makes it easy to use forked dependencies, and forking a dependency is almost always less work than if you'd never used it and written the code yourself from the beginning. (And to be clear this is only a problem for future evolution; a crate author cannot remove or modify an already-published version of their crate.)
This is still fairly short-sighted. Websites shut down, and large websites with big storage demands are especially vulnerable to attrition. Who wants to pay the mounting bill for keeping decades of revisions of historical Rust packages online?
I can grab the kernel sources from 1997 and build them today. Will I be able to build Rust code from 2022 in 2047? The 1997 kernel will still build at that date.
"I can grab the kernel sources from 1997 and build them today."
Where would you be grabbing it from? ...From a website? "Websites shut down, large websites with big storage demands are especially vulnerable to attrition. Who wants to pay the mounting bill for keeping decades of revisions of historical Linux kernels online?"
You make a copy, store it on your medium of choice, and put it in a filing cabinet. I gather that certain organizations use magnetic tape backups for especially important data. For some organizations and individuals, kernel source code could be that important.
There is a fairly large difference between archiving your own project's history for as long as you feel like, and archiving the complete history of every significant piece of code ever written in a particular programming language forever.
Who claims that archiving the complete history of every significant piece of code ever written in Rust is necessary? It is easy to archive only the code that your project depends upon. Rust code is no different from C code in this regard.
- crates.io is financed by the Rust Foundation and is at no risk of disappearing; it is a very well-funded effort.
- Using cargo with an alternative repo is not difficult; it requires some one-time configuration.
- Vendoring your dependencies is supported.
- cargo hits the network to look for semver-compatible updated versions of your dependencies at specific moments if you don't have a Cargo.lock file.
- Not updating your dependencies stops you from getting the rug pulled from under you if an unwanted change happens, but it also stops you from getting any desired changes including security vulnerability fixes.
- Even if you vendor all of your dependencies, you still have to audit them the first time and every time you update them. Are you? Most aren't. Code you haven't written yourself can't be assured not to be malicious, and code you've written yourself can still have exploitable mistakes.
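On the second point, the one-time configuration is a source-replacement entry in .cargo/config.toml. A sketch, where the mirror name and URL are hypothetical placeholders:

```toml
# .cargo/config.toml -- sketch; "internal-mirror" and its URL are made up
[source.crates-io]
replace-with = "internal-mirror"

[source.internal-mirror]
registry = "sparse+https://mirror.example.com/index/"
```

With this in place, anything cargo would have fetched from crates.io is fetched from the mirror instead, while Cargo.toml files stay unchanged.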
It's easy enough to keep your own website up as long as you want to, the liability is other projects and services, especially when the scope of those services is "archive everything for everyone forever".
So your argument is you think the people who run the crates site don't want to do a good job but the people running kernel.org do? What info are you basing this random-seeming decision on? Do you have any actual data suggesting that the crates site will just disappear like you say?
I'd like to see that data if so -- I have pretty big doubts that your statement has merit without some sort of evidence.
As I said in a parallel comment, there is a fairly large difference between archiving your own project's history for as long as you feel like, and archiving the complete history of every significant piece of code ever written in a particular programming language forever.
Kernel.org's repository is also of major versions, not every minor release and patch. That really wouldn't do for cargo. If it has ever been released, it needs to be kept in storage for as long as the rust ecosystem exists. That's decades, maybe even centuries of passing on the torch and hoping the next guy accepts the responsibility. Hoping you can find a next guy.
> I can grab the kernel sources from 1997 and build them today.
Can you? Do they still compile with a current compiler? You'll probably need to find a compiler of that time... and also all the interpreters for all the build scripts. Was that using bash or some old Perl? Maybe something more esoteric like m4 or Tcl?
The point is that the kernel has always had many external dependencies to bootstrap. Adding one is not such a big deal; it just adds another thing to archive among the many other things. The crates.io archive is probably not even that big.
I'm not sure why that would be a problem given most of these languages and standards are older than the Linux kernel. The thing about mature technology is specifically that it doesn't have breaking changes every couple of months. This is the way it used to be for a fairly long time.
But even if it has broken, I can just download an old linux distro. They effectively form a cohesive snapshot of the state of the toolchain whenever they were assembled. Slackware 3.1 from 1996 might be appropriate.
> But even if it has broken, I can just download an old linux distro. They effectively form a cohesive snapshot of the state of the toolchain whenever they were assembled. Slackware 3.1 from 1996 might be appropriate.
You will also need era-appropriate hardware to get that software to install.
I'd rather comment than downvote. Who cares about a kernel build from 1997 (25 years ago)? What was the hardware back then, Pentium 2? Sorry for the snark in advance but: Why make mountains out of molehills? Life is hard enough as it is.
You may not own a Pentium 2, but someone might. This is only hard if you make it hard. My point is that an old Linux kernel, by design, can be built today. This is a feature it has for free, a consequence of not relying on flimsy network-based dependency managers.
At any rate, we are indebted to the future to preserve the present, as our past has been preserved for our benefit.
"Never" is a long time, just saying. It'll be impossible to beat the "availability" guarantees of a local mirror (like a thumb drive) of a kernel source tarball.
What happens when a crate version has to be removed due to a critical CVE or court order (IP Law violation, perhaps)? There may come a day where crates.io becomes torn between not breaking Linux source and not hosting actively bad source code.
Note that some of those concerns do apply to vendoring source as well, but the additional download step also removes options that the kernel maintainers have as long as they ship all the source for the kernel in one tarball. Like more control over the timing of inevitable decisions.
> What happens when a crate version has to be removed due to a critical CVE or court order (IP Law violation, perhaps)?
CVE = The Yank flag. Cargo will refuse to add new yanked packages to a lock file, but if a yanked package is already in the lock file, it will still build. The package is not actually deleted. https://doc.rust-lang.org/cargo/commands/cargo-yank.html
Legal = Hard delete. Nobody will go to jail just to avoid breaking your build. Of course, since crates.io and kernel.org are in the same legal jurisdiction, is there any actual difference here?
What happens today when a kernel module has to be removed due to a critical CVE or court order?
That's not just a rhetorical flourish, I'm actually curious what the answer is. As far as I know, (1) it almost never happens and (2) when it does, the change is made in upstream repos and as a practical matter, everyone downloads those changes and their up-to-date local copies lose that code.
Fixing it in the future isn't the point. Breaking previous releases is.
The previous tarballs still work and contain the relevant code. Your build wouldn't rely on hosts complying with court orders in countries you might not live in.
If the code isn't vendored, just referenced with URLs, the old tarballs stop working.
This hypothetical court-order situation is quite far-fetched. If crates.io was ordered to take down some or all versions of a package, an alternative mirror could easily be created elsewhere and you could configure cargo to use it.
But I think the kernel would vendor crate dependencies, partly so that people can build without accessing the network, simply because that's policy in many places.
To the first question, obviously the sources of dependencies would be brought into the tree. This is easy and there's no reason I'm aware of not to do it for something like the Linux kernel.
To the second set of questions, how is this any different than any other dependency the kernel has? If the answer is "the kernel has no dependencies" then yeah, I'm very sympathetic to the argument that bringing in rust libraries is not a good reason to start having dependencies when none previously existed at all, but is that the case?
You're forgetting about custom build scripts. Thankfully most of the core ones have moved off cloning dependencies for ffi purposes (think cloning an alsa-lib version for ffi), but it used to be super common.
No, it is. Even without `--locked`, the Cargo.lock file is only updated when it no longer fulfills the Cargo.toml because the latter was edited (and then only making the minimal changes necessary), or explicitly using `cargo update`.
Yes, it's always read. If the file didn't require updating, a build with and without `--locked` will be identical. If it did require updating, `--locked` will make cargo exit with an error.
That's true when running `cargo install` to install an application directly from crates.io, but not when running `cargo build` in an already checked-out repository.
A cargo build there ends up calling into the resolver's resolve_ws_with_opts(), which would refresh the lockfile.
Not resolve_with_previous() which would use the lock file as-is.
The only reason this sticks in my mind is that I ran into an issue building bat after I made some changes. I obviously assumed it was my changes, so I went through the process of debugging and backing out my changes until I was finally back to a virgin branch and still failing; passing `--frozen --locked` fixed it.
If your project has a Cargo.lock file checked into its repo, then everyone checking that out will download the same code for all dependencies (unless someone manages to compromise the crates.io package archive). That is very far from "the least reproducible thing ever".
> The default behavior of cargo is to download stuff from the internet.
This is borderline inevitable for most modern development stacks, though .lock files can definitely help, even adding hashes to check against if you care about your dependencies being the same as when you first download/add them to the project and/or inspect the code.
As for worries about the things in those URLs disappearing, in most cases you should be using a proxy repository of some sort, which I've seen leveraged often in enterprise environments - something like JFrog Artifactory or Sonatype Nexus, with repositories either global or set up on a per-project basis.
The problem here is that all of these repositories kind of suck and that the ecosystem around them also does:
- for example, Nexus routinely fails to remove all of the proxied container images and their blobs that are older than a certain date, bloating disk space usage
- when proxying npm, Nexus needs additional reverse proxy configuration, since URL encoded slashes aren't typically allowed
- many popular formats, like Composer (or plenty more niche ones) are only community supported https://help.sonatype.com/repomanager3/nexus-repository-administration/formats (nobody will ever cover *all* of the formats you need, unless you limit yourself to very popular stacks)
- many of the tech stacks that have .lock files may also include URLs to the registry/repository from which they're acquired, so some patching might be necessary
- in technologies like Ruby, actually setting up the proxy isn't as easy as running something like "bundle install --registry=..." as it is in npm
- in other technologies, like Java, you get into the whole SNAPSHOT vs RELEASE issue, and even setting up publishing your own packages to something like Nexus can be a bit of work; the lack of proper code libraries for reuse and the abundance of copy-pasted code that I've been seeing are proof of this in my mind
Of course, I'm mentioning various tech stacks here, and I don't doubt that in the long term Rust and other technologies might also address their own individual shortcomings, but my point is that dependency management is just a hard problem in general.
So, for most people the approach that they'll take is to just install stuff from the Internet that other people trust and just hope that the toolchain works as expected, a black box of sorts. I've seen plenty of people just adding packages without auditing 100% of the source code which seems like the inevitable reality when you're just trying to build some software with time/resource constraints.
Downloading C++ dependencies during the build process is equally unacceptable for many situations. Existing C++ build systems and package managers can be configured to do that and those build systems and package managers would be inappropriate for supporting a kernel that values stability and long term support.
So it's a good thing that cargo can be used without downloading dependencies during the build! Just clone the repos of the dependencies (and transitive dependencies), just like you would for a C++ project. Then set up your cargo file to point at the location for your local copy instead of using the default download behavior.
There's even a tool called cargo-vendor that does this for you!
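A minimal sketch of that workflow: running `cargo vendor` copies the sources of every dependency (including transitive ones) into a local `vendor/` directory and prints a config snippet along these lines, which redirects crates.io lookups to the local copies:

```toml
# .cargo/config.toml -- snippet of the kind `cargo vendor` emits (sketch)
[source.crates-io]
replace-with = "vendored-sources"

[source.vendored-sources]
directory = "vendor"
```

Once the vendor directory and config are committed, `cargo build --offline` succeeds without any network access at all.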