> Inlining is actually non-trivial. OTOH, JIT runtimes have more input data than...

tachyonbeam · on March 4, 2021

> In C# you can make data structures which [...]

Yeah, but that completely validates my point. C# is not Python or JS. It's a (remote) cousin of C which tries to take some of the valuable performance tools from C and bring them to a managed runtime. Because it's statically typed, it's a lot easier for the compiler to optimize, and because you have all these tools to design compact objects without pointers, you can do that job yourself so the compiler doesn't have to.

And again, an experienced C# programmer can probably write code that runs circles performance-wise around code written by an experienced JS/Python developer in most cases.

Const-me · on March 4, 2021

> It's a (remote) cousin of C which tries to take some of the valuable performance tools from C and bring those to a managed runtime.

That’s correct. But at the same time, the language is much higher-level than C or C++.

> experienced C# programmer can probably write code that runs circles performance-wise around code written by an experienced JS/Python developer in most cases.

In some cases, an experienced C# programmer can even write code which approaches or outperforms C. My Linux video player library https://github.com/Const-me/Vrmac/tree/master/VrmacVideo#per... uses about as much CPU time as VLC and 20% less RAM.

pkolaczk · on March 4, 2021

> JIT runtimes have more input data than a C compiler

They have more data, but they are at a disadvantage because they are time-pressured. They can't apply costly analysis to these data because they need to compile fast. Therefore they typically limit themselves to local analysis, which may miss many inlining opportunities.

Const-me · on March 4, 2021

> They can't apply costly analysis to these data because they need to compile fast.

True in general, but they use quite a few tricks to minimize the consequences.

They use an interpreter, or a fast but poorly optimizing version of the JIT compiler, the first time a function is called. They replace it with a faster version once it’s clear the function is called a lot.

Unlike C compilers, they don’t need to do that analysis globally for the whole program. They only need to do it for the hot paths, which are often a small portion of the code.

They can offload that analysis and code generation to another CPU core, and replace the implementation once that background thread has produced a faster version.

pkolaczk · on March 4, 2021

> They only need to do that for the hot paths, that’s often a small portion of the code.

That's often correct; unfortunately, though, codebases today can be very, very large. It can take a lot of effort to optimize even just the hottest 10% of the code if the product is several hundred MB of compressed bytecode. There are also applications with no obvious hotspots but flat profiles, e.g. database systems, where most of the time is spent transferring data between the various layers of the system. A request may pass through most of those layers and be routed to different areas depending on the query type, which is only known at runtime. If clients are allowed to send queries of many types, they exercise many different submodules of the server, and there will be no hotspots. In these cases warmup can take an enormous amount of time.

Even for a server this can be a problem, because after restarting you get an immediate performance hit.

Also keep in mind that many software products are not long-lived backend processes that can warm up for hours or even minutes. Client apps need good responsiveness, and by the time the JIT even realizes which code to compile, it is already too late.

Const-me · on March 4, 2021

I think what you wrote largely applies to Java and especially JavaScript, much less to C#. Value types, real generics, and a native stack allow even the fast, less-optimizing version of the .NET JIT to produce native code that’s not too horrible performance-wise.

Good enough for desktop or embedded use cases, even on slow CPUs. I have three such devices on my desk: a Raspberry Pi 4, a dev board with a Rockchip RK3288, and a tablet with an Atom Z3735G. .NET is reasonably fast on all of them, without noticeable warmup issues at startup.

pkolaczk · on March 4, 2021

I was talking about JITs in general. Sure, you can AOT-compile .NET code quite efficiently, but that's a different story.

Here is a nice analysis of how various JITs warmup in practice:

https://tratt.net/laurie/blog/entries/why_arent_more_users_m...

TL;DR: often they don't!

andrekandre · on March 4, 2021

> unfortunately codebases today can be very, very huge

Why is that?

imtringued · on March 4, 2021

Actually, action for action, managed languages tend to be "faster" than C, in the sense that an equivalent program written in C would be slower. However, the additional freedom leads to increased program complexity, and that complexity eats into the performance budget enough that the result ends up slower than "idiomatically" written C, where the developer has distilled the solution to its bare essence.

JavaScript has the fastest general-purpose hashmap/dictionary implementation of all programming languages, but at the same time you are forced to use it all the time, which slows the language down overall. Writing exactly the same code in C would be even slower, since C hashmaps aren't as optimized. However, C code is rarely written like that, so it's usually faster.

jonathanstrange · on March 4, 2021

I don't know about LLVM, but GCC has had something like this for years as a standard optimization feature. You create a binary with special profiling code which writes to a file. After running the program a few times, you recompile with optimizations that use this file. I forgot the flags, though.

Personally, I'm more interested in optimizing executables: a tool decompiles an executable, performs whole-program optimization, and recompiles it. I'd love to tinker around with optimizing other people's binaries (e.g. games) for my machine. There is something like that for LLVM, but it's very experimental.

Const-me · on March 4, 2021

It's called profile-guided optimization.

And for C, it fails when the input data change significantly compared to the inputs the developers used when they were building the binary.

jonathanstrange · on March 5, 2021

Yes indeed.