Huh, I was always told that inheritance hurt performance as it requires addition...

andyferris · on May 8, 2025

As I understand it, back when Simula and LISP were invented it was generally the case that loads and stores took 1 cycle and there were no CPU caches. These pointer-chasing languages and techniques really weren't technically bad for the computers of the time - it's just that we have a larger relative penalty for randomly accessing our Random Access Memory these days so locallity is important (hence data-oriented design, ECS, etc).

I am kind of amused they _removed_ first-class functions though!

int_19h · on May 8, 2025

Function arguments weren't actually first-class to begin with. In Algol 60 (of which Simula started as a superset), you could pass functions as arguments to other functions, but that's it - it wasn't a proper type so you couldn't return it, shove it into a variable, have an array of functions etc. Basically, it had just enough restrictions that you would never get up in a situation where you could possibly call a function for which the corresponding activation frame (i.e. locals) could be gone. But when Simula added classes and objects, now you could suddenly capture arguments in a way that allows them to outlive the callee.

i_c_b · on May 8, 2025

I was in the game industry when we originally transitioned from C to C++, and here's my recollection of the conversations at the time, more or less.

In C++, inheritance of data is efficient because the memory layout of base class members stays the same in different derived classes, so fields don't cost any more to access.

And construction is (relatively fast, compared to alternatives) because setting a single vtable pointer is faster than filling in a bunch of variable fields.

And non-virtual functions were fast because, again, static memory layouts and access and inlining.

Virtual functions were a bit slower, but ultimately that just raised the larger question of when and where a codebase was using function pointers more broadly - virtual functions were just one way of corralling that issue.

And the fact that there were idiomatic ways to use classes in C++ without dynamically allocating memory was crucial to selling game developers on the idea, too.

So at least from my time when this was happening, the general sense was that, of all the ways OO could be implemented, C++ style OO seemed to be by far the most performant, for the concerns of game developers in the late 90's / early 2000's.

I've been out of the industry for a while, so I haven't followed the subsequent conversations since too closely. But I do think, even when I was there, the actual reality of OO class hierarchies were starting to rear their ugly heads. Giant base classes are indeed drastically bad for caches, for example, because they do tend to produce giant, bloated data structures. And deep class hierarchies turn out to be highly sub-optimal, in a lot of cases, for information hiding and evolving code bases (especially for game code, which was one of my specialties). As a practical matter, as you evolve code, you don't get the benefits of information hiding that were advertised on the tin (hence the current boosting of composition over inheritance). I think you can better, smart discussions about those issues in this thread, so I won't cover them.

But that was a snapshot of those early experiences - the specific ways C++ implemented inheritance for performance reasons were definitely, originally, much of the draw to game programmers.

kragen · on May 8, 2025

No, inheritance does not require additional address lookups. Single inheritance as discussed here doesn't even require additional address arithmetic; the address of the subclass instance is the same as the address of the superclass instance.

Yes, current GCs are very fast and do not suffer from the problems Simula's GC suffered from. Nevertheless, they do still have an easier time when you embed record A as a field of record B (roughly what inheritance achieves in this case) rather than putting a pointer to record A in record B. Allocation may not be any faster, because in either case the compiler can bump the nursery pointer just once (with a copying collector). Deallocation is maybe slightly faster, because with a copying collector, deallocation cost is sort of proportional to how much space you allocate, and the total size of record B is smaller with record A embedded in it than the total size of record A plus record B with a pointer linking them. (That's one pointer bigger.) But tracing gets much faster when there are no pointers to trace.

You will also notice from this example that it's failing to embed the superclass (or whatever) that requires an additional record lookup. And probably a cache miss, too.

I think the reason many game engines are moving away from inheritance is that they're moving away from OO in general, and more generally the Lisp model of memory as a directed graph of objects linked by pointers, because although inheritance reduces the number of cache misses in OO code, it doesn't reduce them enough.

I've written about this at greater length in http://canonical.org/~kragen/memory-models/, but I never really finished that essay.

josephg · on May 8, 2025

> No, inheritance does not require additional address lookups. Single inheritance as discussed here doesn't even require additional address arithmetic; the address of the subclass instance is the same as the address of the superclass instance.

Yes it does! Inheritance itself is fine, but inheritance almost always means virtual functions - which can have a significant performance cost because of vtable lookups. Using virtual functions also prevents inlining - which can have a big performance cost in critical code.

> Nevertheless, they do still have an easier time when you embed record A as a field of record B (roughly what inheritance achieves in this case) rather than putting a pointer to record A in record B.

Huh? No - if you put A and B in separate allocations, you get worse performance. Both because of pointer chasing (which matters a great deal for performance). And also because you're putting more pressure on the allocator / garbage collector. The best way to combine A and B is via simple composition:

    struct C { a: A, b: B }

In this case, there's a single allocation. (At least in languages with value types - like C, C++, C#, Rust, Swift, Zig, etc). In C++, the bytes in memory are actually identical to the case where B inherits from A. But you don't get any class entanglement, or any of the bugs that come along with that.

> I think the reason many game engines are moving away from inheritance is that they're moving away from OO in general

Games are moving away from OO because C++ style OO is a fundamentally bad way to structure software. Even if it wasn't, struct-of-arrays usually performs better than arrays-of-structs because of how caching works. And modern ECS (entity component systems) can take good advantage of SoA style memory layouts.

The performance gap between CPU cache and memory speed has been steadily growing over the last few decades. This means, relatively speaking, pointers are getting slower and big arrays are getting faster on modern computers.

kragen · on May 8, 2025

I agree with what you say about ECS and the memory hierarchy. But not much else.

> inheritance almost always means virtual functions

Inheritance and "virtual functions" (dynamic method dispatch) are almost, but not completely, unrelated. You can easily have either one without the other. Golang and Lua have dynamic method dispatch without inheritance; C++ bends over backwards so that you can use all the inheritance you want without incurring any of the costs of dynamic method dispatch, as long as you don't declare anything virtual. This is actually a practical thing to do with modern C++ with templates and type inference.

> No - if you put A and B in separate allocations, you get worse performance

Yes, that's what I was saying.

> you're putting more pressure on the allocator / garbage collector

Yes, I explained how that happens in greater detail in the comment you were replying to.

With your struct C, it's somewhat difficult to solve the problem catern was saying Simula invented inheritance to solve; if A is "list node" and B is "truck", when you navigate to a list node p of type A*, to get the truck, you have to do something like &((struct C *)p)->b, relying on the fact that the struct's first field address is the same as the struct's address and on the fact that the A is the first field. While this is certainly a workable thing to do, I don't think we can recommend it without reservation on the basis that "you don't get any class entanglement, or any of the bugs"! It's very error-prone.

> Games are moving away from OO because C++ style OO

There are a lot of things to criticize about C++, but I think one of its worst effects is that it has tricked people into thinking that C++ is OO. "C++ style OO" is a contradiction in terms. I mean, it's possible to do OO in C++, but the language fights you viciously every step of the way; the moment you make a concession to C++ style, OO collapses.

bitwize · on May 8, 2025

Simple inheritance makes the class hierarchy complicated through issues like the diamond inheritance problem, which C++ resolves in typical C++ fashion: attempt to satisfy everybody, actually satisfy nobody.

The designers of StarCraft ran into the pitfalls of designing a sensible inheritance hierarchy, as described here (C-f "Game engine architecture"): https://www.codeofhonor.com/blog/tough-times-on-the-road-to-...

kragen · on May 8, 2025

Simple inheritance doesn't have the diamond problem, because that requires multiple inheritance, which isn't simple. Smalltalk doesn't have multiple inheritance; I don't think SIMULA did either.

int_19h · on May 8, 2025

Simula is strictly single inheritance (and no interfaces).

kragen · on May 8, 2025

Thank you!

nine_k · on May 8, 2025

The best implementation inheritance hierarchy is none :)

If you must, you can use the implementation inheritance for mix-ins / cross-cutting concerns that are the same for all parties involved, e.g. access control. But even that may be better done with composition, especially when you have an injection framework that wires up certain constructor parameters for you.

Where inheritance (extension) properly belongs is the definition of interfaces.

virtue3 · on May 8, 2025

really amazing read thank you.

wbl · on May 8, 2025

Dynamic invocation, not strict inheritance is the issue here. Simply getting functions and fields from a superclass costs nothing if at each callsite the compiler knows enough to say where it is from.

nine_k · on May 8, 2025

But this may only happen when no virtual / overridden methods are involved, no VMT to look up in, no polymorphism at play. This is tanamount to composition, which should be preferred over inheritance anyway.

In this regard, Go and Rust do classes / objects right, Java provides the classical pitfalls, and C++ is the territory where unspeakable horrors can be freely implemented, as usual.

wbl · on May 8, 2025

Overriding is fine. The issue comes with polymorphism and would even without inheritance per se, as can be seen in Go where interfaces provide polymorphism without inheritance.

nine_k · on May 9, 2025

You're right, the issue comes from polymorphism, and from use of polymorphic values when the specific class cannot be determined statically. But this is sort of the point of polymorphism, can't do much about it, except maybe caching the looked-up value(s) when looping over a polymorphic collection.

Overriding may lead to other troubles though [1].

[1]: https://www.snopes.com/fact-check/shoot-me-kangaroo-down-spo...

vlovich123 · on May 8, 2025

Parent is correct - if the compiler has the information to devirtualize it becomes direct dispatch regardless of the mechanisms involved at the source level. This is also typically true for JITs.

nine_k · on May 8, 2025

This is true as long as the compiler actually can de-virtualize the method, that is, that it can prove that the objects it's handling at a particular call site do not override it. Quite often this is not the case, because of the pattern when the programmer is expected to override / implement an abstract method to plug in the desired functionality. The lookup can be fast if there are few classes involved, and their vtables get cached.

JITs can do many fascinating optimizations based on profiling the actual code. They must always be on guard though for a case when their profiling-based conclusions fail to hold with some new data, and they have to de-optimize. This being on guard also has its cost.

vlovich123 · on May 8, 2025

Even if you call a pure virtual function of an abstract base class, the compiler can devirtualize if it can determine the type anyway due to data flow analysis. There are limitations but with LTO you’d be surprised at how effective the compiler is here. Also in Rust you’re never going through the vtable unless it’s a &dyn Trait and the compiler fails to devirtualize.

Reason077 · on May 8, 2025

Nah. Classic C++/Java style inheritance with vtable dispatch is very fast. Generally no slower than a C-style function call, and actually sometimes faster depending on how the C code is linked, characteristics of the CPU, etc.

nine_k · on May 8, 2025

This assumes that the vtables stay in at least L2 cache, which may be a correct assumption for the few hot-path classes. In this regard, I remember how Facebook's android app once failed to build when the codebase exceeded the limit of 64k classes.

xxs · on May 8, 2025

No, Java does Class hierarchy analysis and has multiple way not to use v-table calls.

Single site (no class found overriding a method) are static and can be inlined directly. Dual call sites use a class check (which is a simple equality), can be inlined, no v-table. 3-5 call sites use inline caches (e.g. the compiler records what class have been used) that are similar and some can be inlined, usually plus a guard check.

Only high polymorphic calls use v-table and in practice is a very rare occasion, even with Java totally embracing inheritance (or polymorphic interfaces)

Note: CHA is dynamic and happens at runtime, depending which classes have been loaded. Loading new classes causes CHA to be performed again and if there are affected sites, the latter are to be deoptimized (and re-JIT again)

josephg · on May 8, 2025

Thats very cool.

Does C++ have any of these optimisations?

gpderetta · on May 8, 2025

Some compilers do some of these optimizations, at least those that not require a JIT and code patching at runtime.

But these optimizations are vastly less critical in C++ where only are minority of functions are virtual.

nine_k · on May 8, 2025

Thanks, TIL!

xxs · on May 8, 2025

if the dispatches do use vtable they won't be inline and won't be faster. The real deal is inlining when necessary, which inheritance doesn't really prevent.

jayd16 · on May 8, 2025

What game engines are moving away from inheritance? Composition is preferred, tho. It's just easier to refactor things that way.