I think that is an interesting observation and I generally agree.
Your point about prompting quality is very valid, and for larger features I always use PRDs that are 5-20x the length of the prompt.
The thing is, my "experiment" is one that represents a fairly common use case: this feature is actually pretty small and embeds into a pre-existing UI structure - in a larger codebase.
GPT-5-Codex allows me to write a pretty quick & dirty prompt, yet still get VERY good results. Not only does it work on the first try, Codex is also reliably better at understanding the context and doing the things that are common and best practice in professional SWE projects.
If I want to get something comparable out of Claude, I would have to spend at least 20 minutes preparing the prompt, if not more.
> The thing is my "experiment" is one that represents a fairly common use case
Valid as well. I guess I'm just nitpicking: seeing how often people say these models aren't useful, combined with this example, triggered my "you're doing it wrong" mode :D
> GPT-5-Codex allows me to write a pretty quick & dirty prompt, yet still get VERY good results.
I have a reputation with family and co-workers of being quite verbose - this might be why I prefer Claude (though I haven't tried Codex in the last month or so). I'm typically setting up context, spending a few minutes writing an initial prompt, and iterating/adjusting on the approach in planning mode so that I _can_ just walk away (or tab out) and let it do its thing, knowing that I've already reviewed its approach and have a reasonable amount of confidence that it's taking an approach that seems logical.
I should start playing with codex again on some new projects I have in mind where I have an initial planning document with my notes on what I want it to do but nothing super specific - just to see what it can "one shot".
Yeah, as someone who has been using Claude Code for about 4 months now, I’ve adopted a “be super specific by default”-workflow. It works very well.
I typically use zen-mcp-server’s planning mode to scope out these tasks, refine and iterate on a plan, clear context, and then trigger the implementation.
There’s no way I would have considered “implement fuzzy search” a small feature request. I’m also paranoid about introducing technical debt / crappy code, which in my experience is the #1 reason that LLMs typically work well for new projects but start to degrade after a while: there’s just a lot of spaghetti and debt built up over time.
I tend to tell Claude to research what is already there, and think hard, and that gives me much better per-prompt results.
But you are right that Codex does all that by default. I just get frustrated when I ask it something simple and it spends half an hour researching code first.
Some do this by using tools like RepoPrompt to read entire files into GPT-5 Pro, and then having GPT-5 Pro send the relevant context and work plan to Codex so that it can skip poking around files. If you give it the context, it won't spend that time looking for it. But then you spend that time with Pro instead (which can at least ingest entire files at once instead of searching through them, and tends to produce a better plan for Codex).
It worked on the first try, but did it work on the second?
I've noticed that in conversations with LLMs, much of what they come up with is non-deterministic: regenerate the message and it disappears.
That appears to be the basic operating principle of the current paradigm. And agentic programming repeats this dice roll dozens or hundreds of times.
I don't know enough about statistics to say if that makes it better (converging on the averages?) or worse (context pollution, hallucinating, focusing on noise?), but it seems worth considering.
I would think that to truly rank such things, you should run a few tests and look for a clear pattern. It's possible that something prompted Claude to take "the easy way" while ChatGPT didn't.
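To make that concrete, here's a minimal, hypothetical sketch of what "run a few tests" could look like: re-run the same request several times per model and tally how often the result passes your checks. run_agent and passes_checks are placeholders for however you actually drive the agent and verify its output, not real APIs.

    import random

    # Hypothetical harness: re-run the same prompt N times per model so a
    # pattern emerges instead of judging on a single dice roll.

    def run_agent(model: str, prompt: str) -> str:
        # Placeholder: invoke the coding agent however you normally do
        # (CLI, API, ...) and return the output/diff it produced.
        return f"{model} output for: {prompt}"

    def passes_checks(output: str) -> bool:
        # Placeholder: run your test suite / review criteria on the output.
        return random.random() < 0.8

    N = 10
    results = {}
    for model in ("claude", "codex"):
        passes = sum(passes_checks(run_agent(model, "implement fuzzy search")) for _ in range(N))
        results[model] = passes / N

    print(results)  # e.g. {'claude': 0.6, 'codex': 0.9}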