When I started with Linux I was confused by the CPU/processor scheduler versus the I/O scheduler; it would be even better if the article briefly pointed out the difference.
In AI workloads these days, how to schedule thousands of parallel threads (SIMD-style) is becoming more and more interesting; I wish someone had a good write-up on that topic.
By SIMD-style do you mean actual SIMD (hardware threads) or some kind of programming construct? I don't think you need a software scheduler in the first case, and there shouldn't be thousands of threads on a single machine in the second case. (You could probably use a cluster, but the scheduler probably won't be scheduling those thousands of threads directly.)
I mean SIMT (CUDA-style), or SPMD, and also TensorFlow host/device runtime scheduling. It's a mystery to me how Nvidia et al. schedule AI workloads (along with intrinsics) to achieve such massive parallelism.
For CUDA, IIRC these tasks (kernel launches) are put into streams (similar to threads) and scheduled for execution by the driver. Within each kernel launch, every 32 threads, called a warp, execute together as a single unit, skipping some instructions when they have divergent control flow. I think the driver perhaps schedules at the warp level, and since these warps execute similar things they can be scheduled together. I'm not an expert in this, so I'm not sure if that's actually how they do it.
That's pretty much how it works, and they sync threads within a warp. I've been groping in the dark for a while, and always hoped someone would write up the details to help me get the idea straight.
Intel, AMD, and Google (TPU) all have their own ways of scheduling 'kernels' that are very different from CUDA's, and there are no details published about them. I was just curious how they work across the CPU/GPU boundary.
GPU "threads" aren't exactly CPU threads, and GPU "cores" aren't really CPU cores. It's better to think of threads as SIMD instructions and cores as ALUs.
GPU execution order is typically either "immediate mode" or "tile mode"; tile mode is more common on mobile GPUs, but Nvidia has also used it.
> In AI workloads these days, how to schedule thousands of parallel threads (SIMD-style) is becoming more and more interesting; I wish someone had a good write-up on that topic.
GPU shaders have a very simple scheduler. Blocks are scheduled one at a time by some very simple heuristic (likely from lowest index to highest, though nominally you don't know what order they're executed in).
That's good enough in 90% of cases. The last 10% can be served by a variety of techniques: long-running kernels, data-structure queues, dynamic parallelism, etc. But mastering the basic kernel invocation in the classic "matrix multiplication" scenario (roughly lining up with OpenMP "static" scheduling), and its benefits, is a good idea.
Good overview, but missing any discussion (that I could see) of how non-SCHED_OTHER scheduling classes (such as SCHED_FIFO and SCHED_RR) interact with "normal" scheduling. The rules described here (e.g. "the task that has run least runs next") do not apply to tasks in these scheduling classes.
My understanding is that they don't. Realtime tasks have strict priority over non-realtime tasks: as long as there is a runnable RT task, the kernel won't schedule any other task.
At least that's the theory. I think these days the kernel (optionally?) reserves 5% of CPU time for non-RT tasks (the sched_rt_runtime_us / sched_rt_period_us knobs), enough to run a shell and kill a runaway RT process.
Great article, very informative. If you do a follow-up, I'd love to read about how the different SCHED_* modes interact with each other and with default operation.
Thank you very much, I appreciate it. That's an interesting topic too. As others have said, they don't actually interact, but I'd like to dig into it.