You are not wrong. For some "highly dynamic" applications, say, an optimizing compiler's IR where there are many different subclasses of IR nodes, dynamic dispatch is nice. But when you are running an ML model, or a scientific application where you already know which sparse matrix format you need, you can do all of that statically, with less overhead and more predictable performance. This is in no way an argument against Julia; the point is that you don't need dynamic dispatch if you can statically determine what needs to happen.
That is basically the core insight behind Julia. The really performance-sensitive parts of your application are already static just by the nature of that code, so we can extract that static information to make it really fast. We can also reuse the same static information for static error messages or static compilation and get the best of both worlds (dynamic during development, static when you're done), but the tooling for that is a bit less developed at the moment.
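As a sketch of what "extracting the static information" means in practice (a toy function of my own; the inspection macros are standard Julia tooling):

```julia
# A generic function with no type annotations at all.
axpy(a, x, y) = a .* x .+ y

xs = rand(Float64, 1000)
ys = rand(Float64, 1000)

# The source is "dynamic", but at this call site every type is
# concrete, so the compiler specializes axpy for
# (Float64, Vector{Float64}, Vector{Float64}) and emits static code
# with no runtime dispatch in the hot path.
axpy(2.0, xs, ys)

# You can inspect the statically inferred result yourself:
#   @code_typed axpy(2.0, xs, ys)   # fully concrete inferred types
#   @code_llvm  axpy(2.0, xs, ys)   # the optimized machine-level IR
```

The point is that the programmer never wrote the static information down; the compiler recovered it from the call site.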
Julia can and does make 'provably correct' decisions at compile time; it's just that the default type-system settings are not quite right for machine-learning apps.
Also, it's not a high-overhead runtime. The runtime itself is compiled to highly optimized machine code (it can even compile, say, the derivative of f(x) = 5x + 3 down to the machine immediate "5" at compile time).
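To make that concrete, here is a sketch using the ForwardDiff.jl package (exact codegen depends on your Julia version and optimization level, so treat the folded output as typical rather than guaranteed):

```julia
using ForwardDiff  # assumes ForwardDiff.jl is installed

f(x) = 5x + 3

# Automatic differentiation happens inside ordinary Julia code,
# so the compiler can optimize straight through it.
df(x) = ForwardDiff.derivative(f, x)

df(2.0)  # the derivative of 5x + 3 is the constant 5

# In my experience, inspecting the generated code shows the entire
# dual-number machinery folded away to a constant return, roughly:
#   @code_llvm df(1.0)
#   ...
#   ret double 5.000000e+00
```

There is no interpreter or boxing left at runtime; the "dynamic" AD abstraction is gone by the time machine code exists.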
There is a lot of heavy lifting required to get that compilation framework into place, though, so there is a load-time overhead.