I haven't used local models; I don't have the 60+ GB of VRAM to do so.
I've tested aider with Gemini 2.5 with prompts as basic as 'write a ts file with Puppeteer to load this url, click on the button identified by x, fill in input y, loop over these urls' and it performed remarkably well.
LLM performance is 100% dependent on the model you're using, so you can hardly generalize from a small model you run locally on a CPU.
Local models just aren't there yet: you can't host a capable one locally on your laptop without extra hardware.
We're hoping that one of the big labs will distill a ~8B to ~32B parameter model that scores at SOTA levels on benchmarks! That would be huge for cost and would probably make it reasonable for most people to code with agents in parallel.
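For a rough sense of why the parameter count matters for local hosting, here is a back-of-the-envelope sketch. It assumes memory is dominated by the weights (parameter count times bytes per parameter) and ignores KV cache, activations, and runtime overhead, so real usage will be higher. The function name `weightMemoryGB` and the chosen precisions are just illustrative.

```typescript
// Rough weights-only memory estimate: parameter count times bytes per parameter.
// Assumption: ignores KV cache, activations, and runtime overhead.
function weightMemoryGB(paramsBillions: number, bitsPerParam: number): number {
  // 1 billion parameters at 8 bits each is 1 GB (decimal), so the 1e9 factors cancel.
  return paramsBillions * (bitsPerParam / 8);
}

const sizes = [8, 32]; // model sizes in billions of parameters
const precisions = [16, 8, 4]; // bits per parameter (fp16/bf16, 8-bit, 4-bit quantization)

for (const size of sizes) {
  for (const bits of precisions) {
    console.log(`${size}B @ ${bits}-bit: ~${weightMemoryGB(size, bits)} GB for weights`);
  }
}
```

By this estimate a 32B model at 16-bit precision needs about 64 GB for weights alone, while an 8B model quantized to 4 bits needs about 4 GB, which is why a strong model in that size range would be within reach of ordinary hardware.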