Every time someone writes something like this, I just wish Google had open-sourced its Fibers - that thing is sublime. Unfortunately that's not going to happen, as the niceties that implementation affords require a modified kernel. I think I saw a presentation describing Fibers a few years back by one of its authors, but it's darn near impossible to find because there's also "Google Fiber". If someone has a link, please share. Fibers opened my mind to what's possible in this space if you go deeper than people usually go. They're also a source of dissatisfaction with how these things turn out when people don't go as deep as the authors of Fibers did.
It's hard to overstate just how amazing that fiber implementation really is. It completely transformed my view of concurrent programming. That it's been described in a public paper but can't be open-sourced is a damn shame.
Could you describe what makes the Google Fibers so nice?
I'm also really curious why they require modifications to the Linux kernel. My first guess would be stronger integration with the IO model at the syscall boundary (similar to io_uring).
You know how, the first time you learned about TCP sockets, you made a server that spawned a new thread to handle each incoming connection (or maybe not - people learn differently nowadays).
With the fibers implementation you can just do that. It doesn't kill your performance, and you don't need to go to a painful async model just for performance reasons.
Pretty much. You get to pretend inside your fibers that you're actually running threads. IIRC (it's been a while) you also get proper stack traces when something barfs, the importance of which cannot be overstated.
What are they though? Is this a library for an existing language? A runtime scheduler like the one that does goroutines in Go? If it were open sourced, how would I use it?
It’s just a library that allows easier development of C++ servers in the synchronous, thread-per-request style - similar to working in Go, but a bajillion times better because it’s not in Go.
Fibers should be appreciated in context. They are great at Google because, in general, the function running on a fiber never needs to enter a potentially blocking region. With fibers one has to manually annotate the fact that one is, for example, calling read on a pipe. This works within Google because nobody makes system calls directly from application code, and the relevant libraries that do need to block (mutexes, for example) handle the annotation themselves. As a xoogler, what I’d like to see opened up is not fibers, which I’d need a new kernel for, but EventManager. EM would be immediately useful to many people.
I don't think that's a huge deal. A preprocessor or a clang plugin (sort of like moc and moc-ng in Qt) could find those system calls and wrap or annotate them appropriately.
What advantages does this have over say, just using pthreads and the Linux kernel to handle the scheduling? I guess you can hit higher performance by not making extra system calls, right? What other benefits would there be?
Marl was originally written for SwiftShader¹, a pure-software implementation of the Vulkan graphics API, which needs to run on desktop and embedded devices with CPUs ranging from 2 to many cores. SwiftShader executes a number of parallel rasterization tasks which have complex blocking dependencies on one another.
Marl attempts to simplify the problem of running and synchronizing tasks. It shamelessly borrows quite a few concepts from golang, simplifying fan-out, fan-in style problems.
The main reason not to just use pthreads / std::thread really comes down to blocking tasks. Using regular threads, you either need to know your dependency graph ahead of time to ensure tasks are executed in dependency order (blockless), or you likely end up spinning up many more threads than you actually need so that things can block and wait on others to complete. SwiftShader has tight requirements on the number of threads we're allowed to create, so Marl was our solution. Marl is even capable of running single-threaded with no code changes to the tasks. It's worth mentioning that we also evaluated other scheduler solutions using completion callbacks, but "callback hell" was something we were keen to avoid.
I'm still looking into Marl optimizations, but it is already pretty fast. Fiber switching is notably faster than OS context switching for many of the benchmarks we've done.
It's a lot of things (tracing, defer, etc.), but for all practical purposes, the code is simple and straightforward to use. Thank you, author.
I wish you could create a version based on setjmp/longjmp for when intrinsics aren't available (say, on a different processor, like AVR). That could really help!
> I wish you could create a version based on setjmp/longjmp for when intrinsics aren't available (say, on a different processor, like AVR). That could really help!
There is an implementation that uses ucontext, which you can enable by defining MARL_FIBERS_USE_UCONTEXT. That said, ucontext is a bit broken on macOS, and I haven't attempted to maintain this codepath, so it may be removed in the near future.
Assuming you know a bit of assembly for your platform and the ABI (specifically caller vs callee saved registers), adding new platforms is not too difficult. For example: ppc64 and MIPS64 support were both added by external contributors.
We're always open to contributions, so if you write a nice generic implementation using setjmp/longjmp, I'm sure it would be accepted!
With cooperative multitasking (coroutines) you can get by with saving less CPU state. I do this for my C coroutines package [1] for x86 and x86-64 by only saving the registers required by the calling convention.
Marl does exactly this. The scheduler guarantees that once a task is started on a particular thread, it will only resume on that same thread. This allows us to save a bare minimum of registers and state.
Mostly that, and a lack of context switches. Async programming is really a fine-grained, hand-written way to do fibers. Goroutines are basically fibers too, and are incredibly cheap.
I’d not heard the term fiber before React reworked their framework around it. Is it a more general thing, and is there a good resource to learn about the theory?
I don't think "fiber" has a universally agreed definition, which makes these things super confusing to read without clarification. It's hard to come up with a good definition and stick to it because people keep coming up with new variations.
An interesting case is a different Google fiber library, not open-sourced but described in a public talk. [1] It uses real kernel threads, but they are (mostly) scheduled in userspace via new syscalls [2] that suspend the current thread and unsuspend a userspace-chosen other thread. The idea is that most of the overhead of kernel threads is in the scheduling, and userspace is able to make a quick, good choice. [3] Thus you get most of the CPU performance of an async implementation with the relatively easy-to-read, callback-free code of a threaded implementation. Anyway, people can disagree on what to call such a combination.
"Fibers" is a term for manually-scheduled threads that was used in Windows at least since Windows 2000. Niche feature, but it was fully documented and all.
https://github.com/abseil/abseil-cpp/issues/5