> the idea that you can slap an appliance (that's sometimes its own LLM) onto another LLM and pray that this will prevent errors is lunacy
It usually works, though. There are no guarantees, of course, but sanity-checking an LLM's output with another instance of itself usually does work, because LLMs usually aren't reliably wrong in the same way. For instance, if you ask one something it doesn't know and it hallucinates a plausible answer, another instance of the same LLM is unlikely to hallucinate the exact same answer; it will probably give you a different one, which is your heads-up that both are probably wrong.
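A minimal sketch of that "ask twice and compare" check, assuming only a generic `ask_fn` wrapper around whatever chat API you use (the function name and exact-string comparison are illustrative, not any particular vendor's API):

```python
from collections import Counter
from typing import Callable, Tuple

def self_consistency_check(ask_fn: Callable[[str], str],
                           prompt: str,
                           n: int = 3) -> Tuple[str, bool]:
    """Sample the same model n times and flag divergence.

    ask_fn is your own wrapper around the provider's chat endpoint.
    Exact-string comparison after normalization is deliberately crude;
    in practice you'd compare normalized or embedded answers.
    """
    answers = [ask_fn(prompt).strip().lower() for _ in range(n)]
    top_answer, top_count = Counter(answers).most_common(1)[0]
    consistent = top_count == n  # every sample agreed
    return top_answer, consistent
```

If `consistent` comes back False, that's the heads-up: treat the answer as a likely hallucination and escalate to a human or a retrieval step rather than trusting any single sample.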
Sure, and then you can throw another LLM in and make them come to a consensus. Of course that could be wrong too, so have another three do the same and then compare, and then…
I have an ongoing and endless debate with a PhD who insists that consensus among multiple LLMs is a valid proof check. The guy is a neuroscientist, not a developer or tech head, and is just stubborn, continually projecting a sentient-being perspective onto his LLM usage.
This, but unironically. It's not much different from how human unreliability is accounted for: add more reviewers until you're satisfied that a suitable fraction of mistakes will be caught.
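A toy illustration of why stacking checks helps, under the (strong, and for LLM instances only partially true) assumption that each check misses an error independently:

```python
def residual_miss_rate(per_check_miss: float, n_checks: int) -> float:
    """Probability that every one of n independent checks misses the same error."""
    return per_check_miss ** n_checks

# If a single reviewer (human or LLM) misses 20% of mistakes, three
# independent reviewers all miss the same mistake only 0.2**3 = 0.8%
# of the time.
print(residual_miss_rate(0.2, 3))  # 0.008
```

The catch, and the point of the objection above, is independence: correlated failure modes (same training data, same prompt, same blind spot) shrink the benefit well below what this naive calculation suggests.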