
Wouldn't that just accelerate collapse? How much do you trust the LLM's outputs to provide trustworthy and valuable new information? I mean, I understand that distillation works, but that's much more structured and thoughtful than my sessions, at least.



I was thinking of curated replay buffers, which would act like "dreams". To prevent collapse, the offline dataset would mix the new mid-term data with a baseline of anchor data (the original training distribution) so the model doesn't drift.
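A minimal sketch of that mixing step, assuming samples are plain Python objects and the ratio is a fixed hyperparameter. The function name `build_replay_batch` and the `anchor_fraction` parameter are hypothetical, made up for illustration; nothing here comes from an existing system.

```python
import random

def build_replay_batch(new_samples, anchor_samples, batch_size,
                       anchor_fraction=0.5, seed=None):
    """Mix fresh mid-term data with anchor data at a fixed ratio.

    anchor_fraction is an assumed knob: the share of each batch drawn
    from the original training distribution to limit drift.
    """
    if not 0.0 <= anchor_fraction <= 1.0:
        raise ValueError("anchor_fraction must be in [0, 1]")
    rng = random.Random(seed)
    n_anchor = int(round(batch_size * anchor_fraction))
    n_new = batch_size - n_anchor
    # Sample with replacement so small buffers can still fill a batch.
    batch = rng.choices(anchor_samples, k=n_anchor)
    batch += rng.choices(new_samples, k=n_new)
    rng.shuffle(batch)  # avoid ordering anchor and new data in blocks
    return batch

batch = build_replay_batch(["n1", "n2"], ["a1", "a2", "a3"],
                           batch_size=8, anchor_fraction=0.5, seed=0)
```

How large the anchor share needs to be is an open question; 0.5 is only a placeholder.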

Also, we wouldn't train on the whole session. A separate critic module, like a reward model, would filter the KV cache to extract the high-value information, like a garbage collector before the LoRA.
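Roughly, the filtering step could look like the sketch below. It is a simplification: it scores text segments of a session rather than actual KV cache entries, and `score_fn` stands in for the critic (a hypothetical reward model returning a value in [0, 1]). The names `filter_high_value`, `threshold`, and `max_keep` are assumptions for illustration.

```python
def filter_high_value(segments, score_fn, threshold=0.5, max_keep=None):
    """Keep only segments the critic scores at or above the threshold.

    score_fn is a placeholder for a learned critic / reward model.
    Results are returned from highest to lowest score.
    """
    scored = [(score_fn(seg), seg) for seg in segments]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    if max_keep is not None:
        kept = kept[:max_keep]  # cap what reaches the fine-tuning step
    return [seg for _, seg in kept]

# Toy critic: a lookup table of made-up scores instead of a real model.
toy_scores = {
    "ok": 0.1,
    "the bug was a race in the cache": 0.9,
    "use lock ordering A then B": 0.7,
}
kept = filter_high_value(list(toy_scores), toy_scores.get, threshold=0.5)
```

Only the surviving segments would then go into the replay buffer used for the LoRA update.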

That's just an idea though. Right now most research focuses on changing the architecture itself (TITAN, HOPE...) instead.


We can trust the feedback we give it based on the output it provides.

What kind of feedback are you giving? What's the reward function?

Right now, none, since I don't run this system, but our workflows could change to accommodate it.


