This sounds like a cheeky joke project, but assuming it's not, it got me thinking: I wonder if a coding AI can be effectively and reliably prompted into minimizing its own anguish. Like, "don't write code that is going to make you (or me) suffer." And along those lines, do we know if the things that make AIs suffer are the same things that make human developers suffer? Perhaps the least-agonizing code for an LLM to ingest looks radically different, and more (or less) verbose, than what we human developers would see as clean, beautiful code...
There is a ton of optimization possible once we can observe how LLMs and agents process and navigate our code under different prompts. For example, our MCP server was pulling down way too much data to resolve a simple "count rows" request. Once you see it, it's easy to fix, but I don't know of a good framework yet for walking through these patterns. A sketch of what I mean is below.
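To make the pattern concrete, here's a minimal sketch (the tool names, the in-memory sqlite table, and the data are all made up for illustration, not our actual server): if the only tool exposed is a broad "dump the table" call, the model has to haul every row into context just to answer "how many rows?", whereas a narrow count tool returns a single integer.

```python
# Hypothetical sketch of the over-fetch pattern. The tool names
# (get_table_rows, count_rows) and the demo table are invented here,
# not from any real MCP server.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO events (payload) VALUES (?)",
    [(f"event-{i}",) for i in range(10_000)],
)

def get_table_rows(table: str) -> list[tuple]:
    """Broad tool: dumps every row, so answering 'count rows' means
    pulling the whole table into the model's context."""
    # Interpolating the table name is demo-only; don't do this with
    # untrusted input.
    return conn.execute(f"SELECT * FROM {table}").fetchall()

def count_rows(table: str) -> int:
    """Narrow tool: the database does the aggregation and the model
    sees one integer instead of 10,000 rows."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

print(len(get_table_rows("events")))  # works, but floods the context window
print(count_rows("events"))           # same answer, one token-sized result
```

Once you look at tool traces this way, the fix is usually "expose a narrower tool", but spotting which requests are over-fetching is the part that wants a framework.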
I built an eval framework that looks just at tool calls given a static prompt, on the theory that an LLM should be able to deduce the best tool calls and arguments needed to get the requested data. Not as good as full observability, but helpful for complex tool interactions. Anyone have good tools for this problem?
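The core of it is roughly this shape; everything below (the case format, the run_model stub) is a simplified sketch rather than the actual framework, and you'd swap the stub for a real provider call that returns structured tool calls:

```python
# Minimal sketch of a static-prompt tool-call eval. The case schema and
# run_model stub are hypothetical stand-ins, not a real framework or API.
import json

# Each case pairs a fixed prompt with the tool call(s) we expect the
# model to emit for it.
TOOL_CALL_CASES = [
    {
        "prompt": "How many rows are in the events table?",
        "expected": [{"name": "count_rows", "args": {"table": "events"}}],
    },
]

def run_model(prompt: str) -> list[dict]:
    """Stand-in for a real LLM call that returns structured tool calls.
    Hardcoded here so the sketch runs on its own."""
    return [{"name": "count_rows", "args": {"table": "events"}}]

def score(case: dict) -> bool:
    """Exact match on tool name and arguments. A real eval might also
    accept equivalent argument orderings or alternate valid tool plans."""
    actual = run_model(case["prompt"])
    return json.dumps(actual, sort_keys=True) == json.dumps(
        case["expected"], sort_keys=True
    )

results = [score(case) for case in TOOL_CALL_CASES]
print(f"{sum(results)}/{len(results)} cases emitted the expected tool calls")
```

Exact-match scoring is deliberately strict; the interesting failures are the cases where the model reaches the right data through a wasteful sequence of calls, which is exactly the over-fetching pattern above.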
In the same way we mentally walk through deterministic logic, SWEs need to learn to anticipate an LLM's context and tool awareness, which is much trickier to reason through, especially since the various LLM IDEs manage context as a black box.
If you read the Anthropic paper on "functional" emotions in LLMs, you'd have a lot of fun. There's so much research that would be fun to do if we had the compute to spare.