> Memory is a difficult one, especially in garbage-collected languages which have a habit of filling up the heap even when it's not used, so it's not always obvious how much memory is actually being used without having language/runtime specific signals.
I'm not sure what you mean here. Cloud Run uses Docker, which runs regular processes in a cgroup, so it's sufficient to check the cgroup memory usage, right? Yes, Java can always use large heaps, but we're running Python and C++, where a process's memory usage directly reflects what the program allocates (even PyPy, with its GC, has this property).
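To make "check the cgroup memory usage" concrete, here is a minimal sketch of reading it from inside a container. It assumes the usual Linux mount points (`memory.current` under cgroup v2, `memory/memory.usage_in_bytes` under v1); the function name `cgroup_memory_bytes` and the `root` parameter are made up for illustration, and whether a given managed platform exposes these files is not something I've verified.

```python
from pathlib import Path
from typing import Optional

# Relative locations of the usage counter. The v1 path assumes the memory
# controller is mounted at <root>/memory, which is common but not guaranteed.
CGROUP_V2_FILE = "memory.current"
CGROUP_V1_FILE = "memory/memory.usage_in_bytes"


def cgroup_memory_bytes(root: str = "/sys/fs/cgroup") -> Optional[int]:
    """Return this cgroup's memory usage in bytes, or None if it can't be read."""
    for relative in (CGROUP_V2_FILE, CGROUP_V1_FILE):
        path = Path(root) / relative
        try:
            return int(path.read_text().strip())
        except (OSError, ValueError):
            # File missing, unreadable, or not a plain integer: try the next layout.
            continue
    return None


if __name__ == "__main__":
    usage = cgroup_memory_bytes()
    print("unavailable" if usage is None else f"{usage / 2**20:.1f} MiB")
```

Note that the cgroup counter also includes page cache charged to the group, so it can read higher than what the process itself allocated.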
> The mitigation in Cloud Run is both concurrency, and that you're only billed while a request is active.
When there are memory peaks, larger deployments without container-level concurrency look better. In my example, 16GB of RAM is enough for 8 containers if each needs room for a 2GB task to complete, but on average 90% of the memory will be wasted. On a single 16GB server I can run 48 tasks with 40% wasted and a high chance of the 2GB tasks finishing. Yes, in this scenario I must handle tasks killed due to OOM, but the difference in throughput is so large that it's worth it.
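The numbers above can be checked with a few lines of arithmetic. The sketch below assumes an average task footprint of 0.2GB; that figure is not stated in the comment, it is just the value at which both the 90% and the 40% figures come out. The names are illustrative.

```python
TOTAL_GB = 16.0  # memory available in both setups
PEAK_GB = 2.0    # worst-case task size each isolated container must fit
AVG_GB = 0.2     # ASSUMPTION: average task footprint, inferred from the 90%/40% figures


def wasted_fraction(tasks: int, total_gb: float = TOTAL_GB, avg_gb: float = AVG_GB) -> float:
    """Fraction of total memory left idle when `tasks` average-sized tasks run."""
    return 1.0 - (tasks * avg_gb) / total_gb


# One task per container, each container sized for the 2GB peak.
isolated_tasks = int(TOTAL_GB // PEAK_GB)
# Tasks packed onto one shared server, as in the comment.
shared_tasks = 48

if __name__ == "__main__":
    print(f"isolated: {isolated_tasks} tasks, {wasted_fraction(isolated_tasks):.0%} wasted")
    print(f"shared:   {shared_tasks} tasks, {wasted_fraction(shared_tasks):.0%} wasted")
    print(f"throughput ratio: {shared_tasks / isolated_tasks:.0f}x")
```

Under that assumption the shared server runs 6 times as many tasks on the same memory, at the cost of occasionally losing one to the OOM killer when several peaks coincide.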
Cloud Run is built on Knative, so it runs inside k8s (which talks to a container backend, Docker or any alternative, through the CRI API), and k8s handles OOM through its own internal resource manager, not cgroups.