The default behavior of cargo is to download stuff from the internet. This may be the least reproducible thing ever.
I'm honestly astonished that programmers of a language that is deemed to be "safe by default" thought this behavior was acceptable in any form, let alone as the default. If downloading things at build time is somehow necessary, it should be an obscure option behind a flag with a scary name, like --extremely-unsafe-i-know-what-i-am-doing, that prompts the user with a small Turing test every time it is run. Cargo is just bonkers; it doesn't matter at all whether it is "convenient" or not. Convenience before basic safety and reproducibility is contrary to the spirit of the language itself.
It's as if bounds checking in the language was deferred to a third party that you need to "trust" in order to believe that you won't have segmentation faults.
It doesn't just download random things. Cargo generates a Cargo.lock file with checksums and will make sure that those checksums match when building later on. It's about as safe as vendoring all dependencies while being far easier to work with (though tools like cargo-vendor do exist, of course).
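To make that concrete, here is roughly what a Cargo.lock entry looks like; the crate name and checksum below are made up for illustration:

```toml
# Hypothetical Cargo.lock entry -- name and checksum invented for illustration
[[package]]
name = "example-crate"
version = "1.2.3"
source = "registry+https://github.com/rust-lang/crates.io-index"
checksum = "9f1a0b..."  # SHA-256 of the published .crate archive (truncated here)
```

If the archive cargo downloads later doesn't hash to the recorded checksum, the build fails rather than silently using different code.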
Edit: for things like the kernel, vendoring dependencies is still probably not a bad idea, of course
What prevents a given URL from disappearing? Does that just break a particular source version of the Linux kernel?
What happens when a given dependency adds new kernel-inappropriate features? Are kernel devs going to act like distro maintainers and decide between forking, maintaining patch sets, etc.?
All crate sources are stored in the crates.io package archive, which never deletes packages.
A dependency veering off in a direction you don't like is one of the risks of using someone else's code instead of writing it yourself. Cargo makes it easy to use forked dependencies, and forking a dependency is almost always less work than if you'd never used it and written the code yourself from the beginning. (And to be clear this is only a problem for future evolution; a crate author cannot remove or modify an already-published version of their crate.)
This is still fairly short-sighted. Websites shut down, and large websites with big storage demands are especially vulnerable to attrition. Who wants to pay the mounting bill for keeping decades of revisions of historical Rust packages online?
I can grab the kernel sources from 1997 and build them today. Will I be able to build Rust code from 2022 in 2047? The 1997 kernel will still build at that date.
"I can grab the kernel sources from 1997 and build them today."
Where would you be grabbing it from? ...From a website? "Websites shut down, large websites with big storage demands are especially vulnerable to attrition. Who wants to pay the mounting bill for keeping decades of revisions of historical Linux kernels online?"
You make a copy, store it on your medium of choice, and put it in a filing cabinet. I gather that certain organizations use magnetic tape backups for especially important data. For some organizations and individuals, kernel source code could be that important.
There is a fairly large difference between archiving your own project's history for as long as you feel like, and archiving the complete history of every significant piece of code ever written in a particular programming language forever.
Who claims that archiving the complete history of every significant piece of code ever written in Rust is necessary? It is easy to archive only the code that your project depends upon. Rust code is no different from C code in this regard.
- crates.io is financed by the Rust Foundation and is at no risk of disappearing; it is a very well-funded effort.
- Using cargo with an alternative repo is not difficult; it requires some one-time configuration.
- Vendoring your dependencies is supported.
- cargo hits the network to look for semver-compatible updated versions of your dependencies at specific moments if you don't have a Cargo.lock file.
- Not updating your dependencies stops you from getting the rug pulled from under you if an unwanted change happens, but it also stops you from getting any desired changes including security vulnerability fixes.
- Even if you vendor all of your dependencies, you still have to audit them the first time and every time you update them. Are you? Most aren't. Code you haven't written yourself can't be assured not to be malicious, and code you've written yourself can still have exploitable mistakes.
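On the second point, the one-time configuration is a source-replacement entry in .cargo/config.toml. A sketch, where the mirror name and URL are hypothetical placeholders:

```toml
# .cargo/config.toml -- sketch; "internal-mirror" and its URL are made up
[source.crates-io]
replace-with = "internal-mirror"

[source.internal-mirror]
registry = "sparse+https://mirror.example.com/index/"
```

With this in place, anything cargo would have fetched from crates.io is fetched from the mirror instead, while Cargo.toml files stay unchanged.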
It's easy enough to keep your own website up as long as you want to, the liability is other projects and services, especially when the scope of those services is "archive everything for everyone forever".
So your argument is you think the people who run the crates site don't want to do a good job but the people running kernel.org do? What info are you basing this random-seeming decision on? Do you have any actual data suggesting that the crates site will just disappear like you say?
I'd like to see that data if so -- I have pretty big doubts that your statement has merit without some sort of evidence.
As I said in a parallel comment, there is a fairly large difference between archiving your own project's history for as long as you feel like, and archiving the complete history of every significant piece of code ever written in a particular programming language forever.
Kernel.org's repository is also of major versions, not every minor release and patch. That really wouldn't do for cargo. If it has ever been released, it needs to be kept in storage for as long as the rust ecosystem exists. That's decades, maybe even centuries of passing on the torch and hoping the next guy accepts the responsibility. Hoping you can find a next guy.
> I can grab the kernel sources from 1997 and build them today.
Can you? Do they still compile with a current compiler? You'll probably need to find a compiler of that time... and also all the interpreters for all the build scripts. Was that using bash or some old Perl? Maybe something more esoteric like m4 or Tcl?
The point is that the kernel has always had many external dependencies to bootstrap. Adding one is not such a big deal; it just adds another thing to archive among the many other things. The crates.io archive is probably not even that big.
I'm not sure why that would be a problem given most of these languages and standards are older than the Linux kernel. The thing about mature technology is specifically that it doesn't have breaking changes every couple of months. This is the way it used to be for a fairly long time.
But even if it has broken, I can just download an old linux distro. They effectively form a cohesive snapshot of the state of the toolchain whenever they were assembled. Slackware 3.1 from 1996 might be appropriate.
> But even if it has broken, I can just download an old linux distro. They effectively form a cohesive snapshot of the state of the toolchain whenever they were assembled. Slackware 3.1 from 1996 might be appropriate.
You will also need era-appropriate hardware to get that software to install.
I'd rather comment than downvote. Who cares about a kernel build from 1997 (25 years ago)? What was the hardware back then, Pentium 2? Sorry for the snark in advance but: Why make mountains out of molehills? Life is hard enough as it is.
You may not own a Pentium 2, but someone might. This is only hard if you make it hard. My point is that an old Linux kernel, by design, can be built today. This is a feature it has for free, a consequence of not relying on flimsy network-based dependency managers.
At any rate, we are indebted to the future to preserve the present, as our past has been preserved for our benefit.
"Never" is a long time, just saying. It'll be impossible to beat the "availability" guarantees of a local mirror (like a thumb drive) of a kernel source tarball.
What happens when a crate version has to be removed due to a critical CVE or court order (IP Law violation, perhaps)? There may come a day where crates.io becomes torn between not breaking Linux source and not hosting actively bad source code.
Note that some of those concerns do apply to vendoring source as well, but the additional download step also removes options that the kernel maintainers have as long as they ship all the source for the kernel in one tarball. Like more control over the timing of inevitable decisions.
> What happens when a crate version has to be removed due to a critical CVE or court order (IP Law violation, perhaps)?
CVE = The Yank flag. Cargo will refuse to add new yanked packages to a lock file, but if a yanked package is already in the lock file, it will still build. The package is not actually deleted. https://doc.rust-lang.org/cargo/commands/cargo-yank.html
Legal = Hard delete. Nobody will go to jail just to avoid breaking your build. Of course, since crates.io and kernel.org are in the same legal jurisdiction, is there any actual difference here?
What happens today when a kernel module has to be removed due to a critical CVE or court order?
That's not just a rhetorical flourish, I'm actually curious what the answer is. As far as I know, (1) it almost never happens and (2) when it does, the change is made in upstream repos and as a practical matter, everyone downloads those changes and their up-to-date local copies lose that code.
Fixing it in the future isn't the point. Breaking previous releases is.
The previous tarballs still work and contain the relevant code. Your build wouldn't rely on hosts complying with court orders in countries you might not live in.
If the code isn't vendored, just referenced with URLs, the old tarballs stop working.
This hypothetical court-order situation is quite far-fetched. If crates.io was ordered to take down some or all versions of a package, an alternative mirror could easily be created elsewhere and you could configure cargo to use it.
But I think the kernel would vendor crate dependencies, partly so that people can build without accessing the network, simply because that's policy in many places.
To the first question, obviously the sources of dependencies would be brought into the tree. This is easy and there's no reason I'm aware of not to do it for something like the Linux kernel.
To the second set of questions, how is this any different than any other dependency the kernel has? If the answer is "the kernel has no dependencies" then yeah, I'm very sympathetic to the argument that bringing in rust libraries is not a good reason to start having dependencies when none previously existed at all, but is that the case?
You're forgetting about custom build scripts. Thankfully most of the core ones have moved off cloning dependencies for ffi purposes (think cloning an alsa-lib version for ffi), but it used to be super common.
No, it is. Even without `--locked`, the Cargo.lock file is only updated when it no longer fulfills the Cargo.toml because the latter was edited (and then only making the minimal changes necessary), or explicitly using `cargo update`.
Yes, it's always read. If the file didn't require updating, a build with and without `--locked` will be identical. If it did require updating, `--locked` will make cargo exit with an error.
That's true when running `cargo install` to install an application directly from crates.io, but not when running `cargo build` in an already checked-out repository.
A cargo build there ends up calling into the resolver's resolve_ws_with_opts(), which would refresh the lockfile.
Not resolve_with_previous() which would use the lock file as-is.
The only reason this sticks in my mind is that I ran into an issue building bat after I made some changes. I obviously assumed it was my changes, so I went through the process of debugging and backing out my changes until I was finally back to a virgin branch and still failing; passing `--frozen --locked` fixed it.
If your project has a Cargo.lock file checked into its repo, then everyone checking that out will download the same code for all dependencies (unless someone manages to compromise the crates.io package archive). That is very far from "the least reproducible thing ever".
> The default behavior of cargo is to download stuff from the internet.
This is borderline inevitable for most modern development stacks, though .lock files can definitely help, even adding hashes to check against if you care about your dependencies being the same as when you first download/add them to the project and/or inspect the code.
As for worries about the things in those URLs disappearing, in most cases you should be using a proxy repository of some sort, which I've seen leveraged often in enterprise environments - something like JFrog Artifactory or Sonatype Nexus, with repositories either global or set up on a per-project basis.
The problem here is that all of these repositories kind of suck and that the ecosystem around them also does:
- for example, Nexus routinely fails to remove all of the proxied container images and their blobs that are older than a certain date, bloating disk space usage
- when proxying npm, Nexus needs additional reverse proxy configuration, since URL encoded slashes aren't typically allowed
- many popular formats, like Composer (or plenty more niche ones) are only community supported https://help.sonatype.com/repomanager3/nexus-repository-administration/formats (nobody will ever cover *all* of the formats you need, unless you limit yourself to very popular stacks)
- many of the tech stacks that have .lock files may also include URLs to the registry/repository from which they're acquired, so some patching might be necessary
- in technologies like Ruby, actually setting up the proxy isn't as easy as running something like "bundle install --registry=..." as it is in npm
- in other technologies, like Java, you get into the whole SNAPSHOT vs RELEASE issue, and even setting up publishing your own packages to something like Nexus can be a bit of work; the lack of proper code libraries for reuse and the abundance of copy-pasted code that I've been seeing are proof of this in my mind
Of course, I'm mentioning various tech stacks here, and I don't doubt that in the long term Rust and other technologies might also address their own individual shortcomings, but my point is that dependency management is just a hard problem in general.
So, for most people the approach that they'll take is to just install stuff from the Internet that other people trust and just hope that the toolchain works as expected, a black box of sorts. I've seen plenty of people just adding packages without auditing 100% of the source code which seems like the inevitable reality when you're just trying to build some software with time/resource constraints.
Downloading C++ dependencies during the build process is equally unacceptable for many situations. Existing C++ build systems and package managers can be configured to do that and those build systems and package managers would be inappropriate for supporting a kernel that values stability and long term support.
So it's a good thing that cargo can be used without downloading dependencies during the build! Just clone the repos of the dependencies (and transitive dependencies), just like you would for a C++ project. Then set up your cargo file to point at the location for your local copy instead of using the default download behavior.
There's even a tool called cargo-vendor that does this for you!
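A minimal sketch of that workflow: running `cargo vendor` copies the sources of every dependency (including transitive ones) into a local `vendor/` directory and prints a config snippet along these lines, which redirects crates.io lookups to the local copies:

```toml
# .cargo/config.toml -- snippet of the kind `cargo vendor` emits (sketch)
[source.crates-io]
replace-with = "vendored-sources"

[source.vendored-sources]
directory = "vendor"
```

Once the vendor directory and config are committed, `cargo build --offline` succeeds without any network access at all.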