These can be on the fast path, depending on the code, but that's not the point.
Coming from embedded development, I find the number of short-lived heap allocations that C++ so conveniently sweeps under the carpet ridiculous. Even with something like slab allocators and extremely cheap allocations, there are always cases where an alloc/free cycle falls on a slab boundary and becomes very expensive. The less you allocate at run time, the better.
Summed across the whole codebase, these allocations add up: optimizing them away _does_ give a noticeable speedup and makes the whole thing behave more predictably, regardless of how cheap each individual call is perceived to be.
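A minimal sketch of one common way to optimize such an allocation away, assuming a hot-path function that needs a temporary copy of its input. The per-call temporary is hoisted into a caller-owned scratch buffer whose capacity survives between calls. The `median_*` functions and the counting `operator new` are hypothetical illustrations, not taken from any real codebase, and frames are assumed to be non-empty.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical instrumentation: count every call to the global operator new.
// Note: optimizers are allowed to elide allocations, so counts are most
// reliable in an unoptimized build.
static std::size_t g_allocs = 0;

void* operator new(std::size_t n) {
    ++g_allocs;
    if (void* p = std::malloc(n ? n : 1)) return p;
    throw std::bad_alloc();
}
void operator delete(void* p) noexcept { std::free(p); }
void operator delete(void* p, std::size_t) noexcept { std::free(p); }

// Naive version: copies the frame into a fresh vector on every call,
// i.e. one short-lived heap allocation per call. Assumes a non-empty frame.
int median_naive(const std::vector<int>& frame) {
    std::vector<int> tmp(frame);
    auto mid = tmp.begin() + tmp.size() / 2;
    std::nth_element(tmp.begin(), mid, tmp.end());
    return *mid;
}

// Reusing version: assign() only reallocates when the scratch buffer's
// capacity is too small, so steady-state calls allocate nothing.
int median_reuse(const std::vector<int>& frame, std::vector<int>& scratch) {
    scratch.assign(frame.begin(), frame.end());
    auto mid = scratch.begin() + scratch.size() / 2;
    std::nth_element(scratch.begin(), mid, scratch.end());
    return *mid;
}
```

Called 1000 times on the same five-sample frame, the naive version performs 1000 allocations, while the reusing version performs one (on the first call). The trade-off is that the caller now owns the buffer's lifetime; C++17 `std::pmr` containers backed by a fixed buffer are another way to keep such temporaries off the general-purpose heap.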