This can be dangerous, because Claude doesn't *truly* understand why it did some... | Hacker News

thegeomaster 8 months ago | on: AI promised efficiency. Instead, it's making us wo...

This can be dangerous, because Claude doesn't truly understand why it did something. Whatever it writes is a post-hoc justification, which may or may not be accurate to the "intent". This is because these are still autoregressive models: they have only the context to go on, not prior intent.
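
A toy sketch of the point about autoregressive models, in Python. Everything here is hypothetical (`toy_model`, `generate`, the token strings) and deliberately simplistic. A real model samples from a learned distribution rather than following hand-written rules. What the sketch keeps is the relevant property: each next token is computed from the visible context alone. So if an answer gets into the transcript by some other route (say, looked up rather than reasoned out), asking "why?" afterwards yields the same justification either way.

```python
from typing import Callable, List, Optional

# An autoregressive model reduced to its interface: the next token is a
# function of the visible context ONLY. No hidden "intent" is carried between steps.
Model = Callable[[List[str]], str]

EOS = "<eos>"


def generate(model: Model, context: List[str], max_new: int = 10) -> List[str]:
    """Greedy decoding: repeatedly feed the growing context back into the model."""
    tokens = list(context)
    for _ in range(max_new):
        nxt = model(tokens)
        if nxt == EOS:
            break
        tokens.append(nxt)
    return tokens[len(context):]  # return only the newly generated tokens


def toy_model(context: List[str]) -> str:
    """Hypothetical stand-in for an LLM: a pure function of the context."""
    last = context[-1]
    if last == "puzzle":
        return "answer=push-rock"
    if last == "why?":
        # Rationalize whatever answer is in the context, however it got there.
        answer: Optional[str] = next(
            (t for t in reversed(context) if t.startswith("answer=")), None
        )
        if answer is None:
            return EOS
        return "because " + answer[len("answer="):] + " opens the path"
    return EOS


# Transcript A: the model produced the answer itself.
a = ["puzzle"] + generate(toy_model, ["puzzle"])
# Transcript B: the same answer was spliced in from outside (e.g. looked up).
b = ["puzzle", "answer=push-rock"]

expl_a = generate(toy_model, a + ["why?"])
expl_b = generate(toy_model, b + ["why?"])
print(expl_a == expl_b)  # True: the explanation cannot tell the two histories apart
```

The design choice that matters is the `Model` type: a function of the context and nothing else. Any explanation it produces has to be reconstructed from the transcript, which is why it can only ever be post-hoc.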

zahlman 8 months ago

Indeed. Watching it (well, Anthropic, really) cheat at Baba Is You and then try to give a rationalization for how it came up with the solution (q.v. https://news.ycombinator.com/item?id=44473615) is quite instructive.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact