I'm pretty sure Rust people invented a new meaning for the word "stackless": in Rust's context it means "does not depend on an underlying architectural stack." It does not mean "does not use the concept of a stack." That would be why you can have stack traces of async functions in Rust.
In practice it means allocating a stack for your function of the maximum size you would need.
Rust has stackless coroutines. Go has stackful coroutines. The literal distinction is that stackless coroutines exist independently of a stack: you have a state machine object that, when it's resumed into, is put onto the stack and calls normal functions through the run-time stack. Stackful coroutines are essentially lightweight threads (hence "green threads" or "virtual threads"): the coroutine comes with a miniature run-time stack.
In practice, stackless coroutines are more space efficient and have lower context-switch costs. A big factor for Rust specifically is interoperating with C. Stackful coroutines make writing "straight-line" code simpler (no function coloring) but introduce more overhead.
async is just a syntactic sugar to transform a function into the state machine you would have otherwise written. If you don't trust the compiler to perform this transform adequately, you can write a Future by hand. There are users using async/await syntax to write software for microcontrollers without dynamic memory allocation, runtimes like tokio are not a hard dependency of async/await syntax or the Future abstraction.
C++ coroutines perform the same transform, but they dynamically allocate that state machine for every coroutine (unless the optimizer can eliminate the allocation). In contrast, Rust inlines every coroutine into a single state machine until you explicitly tell it to allocate it. I don't know why C++ chose a less "zero cost" solution; I suspect the fact that they don't have memory safety and so letting you own a coroutine's state is very footgunny in C++ played a part in their choice.
In practice it means allocating a stack for your function of the maximum size you would need.