This sounds like a cheeky joke project, but assuming it's not, it got me thinking: I wonder if a coding AI can be effectively and reliably prompted into minimizing its own anguish. Like, "don't write code that is going to make you (or me) suffer." And along those lines, do we know if the things that make AIs suffer are the same things that make human developers suffer? Perhaps the least-agonizing code for an LLM to ingest looks radically different, and more (or less) verbose, than what we human developers would see as clean, beautiful code...
There is a ton of optimization possible once we can observe how LLMs and agents process and navigate our code under different prompts. For example, our MCP server was pulling down way too much data to resolve a simple "count rows" request. Once you see it, it's easy to fix, but I don't know of a good framework yet for walking through these patterns. A sketch of what I mean is below.
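To make the pattern concrete, here's a minimal sketch (the tool names, the in-memory sqlite table, and the data are all made up for illustration, not our actual server): if the only tool exposed is a broad "dump the table" call, the model has to haul every row into context just to answer "how many rows?", whereas a narrow count tool returns a single integer.

```python
# Hypothetical sketch of the over-fetch pattern. The tool names
# (get_table_rows, count_rows) and the demo table are invented here,
# not from any real MCP server.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO events (payload) VALUES (?)",
    [(f"event-{i}",) for i in range(10_000)],
)

def get_table_rows(table: str) -> list[tuple]:
    """Broad tool: dumps every row, so answering 'count rows' means
    pulling the whole table into the model's context."""
    # Interpolating the table name is demo-only; don't do this with
    # untrusted input.
    return conn.execute(f"SELECT * FROM {table}").fetchall()

def count_rows(table: str) -> int:
    """Narrow tool: the database does the aggregation and the model
    sees one integer instead of 10,000 rows."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]

print(len(get_table_rows("events")))  # works, but floods the context window
print(count_rows("events"))           # same answer, one token-sized result
```

Once you look at tool traces this way, the fix is usually "expose a narrower tool", but spotting which requests are over-fetching is the part that wants a framework.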
I built an eval framework that looks just at tool calls given a static prompt, on the theory that an LLM should be able to deduce the best tool calls and arguments needed to get the requested data. Not as good as full observability, but helpful for complex tool interactions. Anyone have good tools for this problem?
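The core of it is roughly this shape; everything below (the case format, the run_model stub) is a simplified sketch rather than the actual framework, and you'd swap the stub for a real provider call that returns structured tool calls:

```python
# Minimal sketch of a static-prompt tool-call eval. The case schema and
# run_model stub are hypothetical stand-ins, not a real framework or API.
import json

# Each case pairs a fixed prompt with the tool call(s) we expect the
# model to emit for it.
TOOL_CALL_CASES = [
    {
        "prompt": "How many rows are in the events table?",
        "expected": [{"name": "count_rows", "args": {"table": "events"}}],
    },
]

def run_model(prompt: str) -> list[dict]:
    """Stand-in for a real LLM call that returns structured tool calls.
    Hardcoded here so the sketch runs on its own."""
    return [{"name": "count_rows", "args": {"table": "events"}}]

def score(case: dict) -> bool:
    """Exact match on tool name and arguments. A real eval might also
    accept equivalent argument orderings or alternate valid tool plans."""
    actual = run_model(case["prompt"])
    return json.dumps(actual, sort_keys=True) == json.dumps(
        case["expected"], sort_keys=True
    )

results = [score(case) for case in TOOL_CALL_CASES]
print(f"{sum(results)}/{len(results)} cases emitted the expected tool calls")
```

Exact-match scoring is deliberately strict; the interesting failures are the cases where the model reaches the right data through a wasteful sequence of calls, which is exactly the over-fetching pattern above.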
In the same way we mentally walk through deterministic logic, SWEs need to learn to anticipate an LLM's context and tool awareness, which is much trickier to reason through, especially since the various LLM IDEs manage context as a black box.
If you read the Anthropic paper on "functional" emotions in LLMs, you'd have a lot of fun. There's so much research that would be fun to do if we had the compute to spare.