This gets rid of the temporary arrays, but it still isn't optimal if z is large. Because of memory locality, it's faster to apply a scalar operation like z**2 + c in a single pass over the data than in two separate passes.
Writing the loop out explicitly (e.g., in PyPy or Numba) is one easy way to achieve this, but you have to write more code.
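To make the difference concrete, here is a rough sketch of both variants. The function names `step_two_pass` and `step_fused` are made up for illustration, and I'm assuming z and c are 1-D complex arrays of the same length. Numba is treated as optional: without it the explicit loop still runs, just slowly, so this shows the shape of the code rather than a benchmark.

```python
import numpy as np

try:
    from numba import njit  # optional: compiles the explicit loop
except ImportError:
    def njit(func):
        # Fallback (assumption for this sketch): run the loop as plain Python.
        return func


def step_two_pass(z, c):
    """In-place NumPy version: no temporaries, but two sweeps over z."""
    np.multiply(z, z, out=z)  # pass 1: z <- z * z
    np.add(z, c, out=z)       # pass 2: z <- z + c
    return z


@njit
def step_fused(z, c):
    """Explicit loop: each element is read, updated, and written once."""
    for i in range(z.size):
        z[i] = z[i] * z[i] + c[i]
    return z
```

Both functions overwrite z in place and should give the same values; the fused one only touches each element once, which is where the locality win would come from on large arrays.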
Gotcha. I feel like I remember some NumPy or SciPy way of creating complex ufunc ops and applying them simultaneously, but maybe I'm misremembering, or thinking that np.vectorize was fast?