Wouldn't that just accelerate collapse? How much do you trust the outputs of the...

rahen · 2026-05-26T17:05:45 1779815145

I was thinking of curated replay buffers, which would act like "dreams". To prevent collapse, the offline dataset would mix the new mid-term data with a baseline of anchor data (the original training distribution) so the model doesn't drift.

Also, we wouldn't train on the whole session. A separate critic module, like a reward model, would filter the KV cache to extract the high-value information, like a garbage collector before the LoRA.

That's just an idea though. Right now most research focuses on changing the architecture itself (TITAN, HOPE...) instead.

jack_pp · 2026-05-26T17:03:00 1779814980

We can trust the feedback we give it based on the output it provides.

ambicapter · 2026-05-26T17:16:37 1779815797

What kind of feedback are you giving? What's the reward function?

jack_pp · 2026-05-26T19:05:52 1779822352

Right now, no feedback since I don't run this system but our workflows could change to accommodate it