Every time someone writes something like this, I just wish Google had open-sourced its Fibers - that thing is sublime. Unfortunately that's not going to happen, as the niceties that implementation affords require a modified kernel. I think I saw a presentation describing Fibers a few years back by one of its authors, but it's darn near impossible to find because there's also "Google Fiber". If someone has a link, please share. Fibers opened my mind to what's possible in this space if you go deeper than people usually go. They're also a source of dissatisfaction with how these things turn out when people don't go as deep as the authors of Fibers did.
It's hard to overstate just how amazing that fiber implementation really is. It completely transformed my view of concurrent programming. That it's been described in a public paper but can't be open-sourced is a damn shame.
Could you describe what makes the Google Fibers so nice?
I'm also really curious why they require modifications to the Linux kernel. My first guess would be stronger integration with the IO model at the syscall boundary (similar to io_uring).
You know how, the first time you learned about TCP sockets, you made a server that spawned a new thread to handle each incoming connection (or maybe not - people learn differently nowadays).
With the fibers implementation you can just do that. It doesn't kill your performance, and you don't need to go to a painful async model just for performance reasons.
Pretty much. You get to pretend inside your fibers that you're actually running threads. IIRC (it's been a while) you also get proper stack traces when something barfs, the importance of which cannot be overstated.
What are they though? Is this a library for an existing language? A runtime scheduler like the one that does goroutines in Go? If it were open sourced, how would I use it?
It’s just a library that allows easier development of C++ servers in the synchronous, thread-per-request style - similar to working in Go, but a bajillion times better because it’s not in Go.
Fibers should be appreciated in context. They are great at Google because, in general, the function running on a fiber never needs to enter a potentially blocking region. With fibers one has to manually annotate the fact that one is, for example, calling read on a pipe. This works within Google because nobody makes system calls directly from application code, and the relevant libraries that do need to block (mutexes, for example) handle the annotation themselves. As a xoogler, what I’d like to see opened up is not fibers, which I’d need a new kernel for, but EventManager. EM would be immediately useful to many people.
I don't think that's a huge deal. A preprocessor or a clang plugin (sort of like moc and moc-ng in Qt) could find those system calls and wrap or annotate them appropriately.
What advantages does this have over say, just using pthreads and the Linux kernel to handle the scheduling? I guess you can hit higher performance by not making extra system calls, right? What other benefits would there be?
Marl was originally written for SwiftShader¹, a pure-software implementation of the Vulkan graphics API, which needs to run on desktop and embedded devices with CPUs ranging from 2 to many cores. SwiftShader executes a number of parallel rasterization tasks which have complex blocking dependencies on one another.
Marl attempts to simplify the problem of running and synchronizing tasks. It shamelessly borrows quite a few concepts from golang, simplifying fan-out, fan-in style problems.
The main reason not to just use pthreads / std::thread really comes down to blocking tasks. Using regular threads, you either need to know your dependency graph ahead of time to ensure tasks are executed in dependency order (blockless), or you likely end up spinning up many more threads than you actually need so that things can block and wait on others to complete. SwiftShader has tight requirements on the number of threads we're allowed to create, so Marl was our solution. Marl is even capable of running single-threaded with no code changes to the tasks. It's worth mentioning that we also evaluated other scheduler solutions using completion callbacks, but "callback hell" was something we were keen to avoid.
I'm still looking into Marl optimizations, but it is already pretty fast. Fiber switching is notably faster than OS context switching for many of the benchmarks we've done.
It's a lot of things (tracing, defer, etc.), but for all practical purposes, the code is simple and straightforward to use. Thank you, author.
I wish you could create a version based on setjmp/longjmp for when intrinsics aren't available (say, on a different processor, like AVR). That could really help!
> I wish you could create a version based on setjmp/longjmp for when intrinsics aren't available (say, on a different processor, like AVR). That could really help!
There is an implementation that uses ucontext, which you can enable by defining MARL_FIBERS_USE_UCONTEXT. That said, ucontext is a bit broken on macOS, and I haven't attempted to maintain this codepath, so it may be removed in the near future.
Assuming you know a bit of assembly for your platform and the ABI (specifically caller vs callee saved registers), adding new platforms is not too difficult. For example: ppc64 and MIPS64 support were both added by external contributors.
We're always open to contributions, so if you write a nice generic implementation using setjmp/longjmp, I'm sure it would be accepted!
With cooperative multitasking (coroutines) you can get by with saving less CPU state. I do this for my C coroutines package [1] for x86 and x86-64 by only saving the registers required by the calling convention.
Marl does exactly this. The scheduler guarantees that once a task is started on a particular thread, it will only resume on that same thread. This allows us to save a bare minimum of registers and state.
Mostly that, and a lack of context switches. Async programming is really a fine-grained, hand-written way to do fibers. Goroutines are basically fibers too, and are incredibly cheap.
I’d not heard the term fiber before React reworked their framework around it. Is it a more general thing, and is there a good resource to learn about the theory?
I don't think "fiber" has a universally agreed definition, which makes these things super confusing to read without clarification. It's hard to come up with a good definition and stick to it because people keep coming up with new variations.
An interesting case is a different Google fiber library, not open-sourced but described in a public talk. [1] It uses real kernel threads, but they are (mostly) scheduled in userspace via new syscalls [2] that suspend the current thread and unsuspend a userspace-chosen other thread. The idea is that most of the overhead of kernel threads is in the scheduling, and userspace is able to make a quick, good choice. [3] Thus you get most of the CPU performance of an async implementation with the relatively easy-to-read, callback-free code of a threaded implementation. Anyway, people can disagree on what to call such a combination.
"Fibers" is a term for manually-scheduled threads that was used in Windows at least since Windows 2000. Niche feature, but it was fully documented and all.
https://github.com/abseil/abseil-cpp/issues/5