As far as the actual internals are concerned, yes. But as a model it's not wrong. It's like epicycles aren't wrong as a model either. GP's mental model of Git is mine as well, and it works very well for me.
As a model it's wrong, and useless, and possibly harmful.
Because the "states are captured as diffs" model doesn't tell you anything that is both true and useful. And at worst it gives you incorrect ideas about the performance characteristics of the system, e.g. that small changes are bad, because lots of changes means lots of diffs to apply, which is slow.
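You can see the snapshot behaviour directly in a scratch repo (a quick sketch; the identity config lines are only needed in a fresh environment):

```shell
# Throwaway demo: each commit points at a complete blob, not a patch.
git init --quiet demo && cd demo
git config user.email you@example.com && git config user.name you  # fresh-env identity
printf 'line one\n' > file.txt
git add file.txt && git commit --quiet -m 'first'
printf 'line one\nline two\n' > file.txt
git add file.txt && git commit --quiet -m 'second'
# Ask Git for file.txt as stored in each commit; both are full contents:
git cat-file -p HEAD~1:file.txt   # prints the complete first version
git cat-file -p HEAD:file.txt     # prints the complete second version
```

No diff is applied anywhere: `cat-file` just hands back the stored blob.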
By forgetting that model you're strictly less wrong, and no worse off when reasoning about Git.
> It's like epicycles aren't wrong as a model either.
But they were. "Adding epicycles" has literally become a saying about trying to pile on more crap to try and (fail to) make an incorrect model work.
And epicycles were at least trying to keep a simple model working; the "diffs model" is more complicated than Git's actual behaviour to start with.
> As a model it's wrong, and useless, and possibly harmful.
It's no more wrong than epicycles. It's not useless either: it works very well for me. For users (as opposed to Git developers), it's hard to see how it could be harmful.
The harmful part comes from assumptions based on it.
Systems like CVS or Subversion do store diffs of files, both conceptually and storage-wise. This has notable consequences: operations spanning a large range of revisions take long, because a pile of diffs has to be applied in sequence. That in turn leads to a reluctance to make small commits, for the wrong reasons.
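A toy illustration of that cost (not SVN's actual on-disk format, just the idea): with diff-based storage, materializing revision N means replaying N patches in order.

```shell
# Toy diff-based store: keep revision 0 plus one patch per revision.
mkdir deltas && printf 'v0\n' > file
for i in $(seq 1 100); do
  cp file old
  printf "v$i\n" > file
  diff -u old file > "deltas/$i.patch" || true  # diff exits 1 when files differ
done
# Reconstructing revision 100 requires applying all 100 patches in sequence:
printf 'v0\n' > rebuilt
for i in $(seq 1 100); do patch -s rebuilt "deltas/$i.patch"; done
cat rebuilt   # prints "v100"
```

The work grows linearly with the distance between revisions, which is exactly the intuition the diff model (wrongly) transplants onto Git.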
Wrong assumptions about the workings leading to wrong decisions is harmful.
> This has notable consequences: operations spanning a large range of revisions take long, because a pile of diffs has to be applied in sequence. That in turn leads to a reluctance to make small commits, for the wrong reasons.
This happens in Git as well though. If you have to rebase your commits across a few thousand upstream commits, you'll be in for some pain. (For example, I've a commit to PostgreSQL to add ALWAYS DEFERRED constraints, and I've had to rebase it across thousands of upstream commits because I've lacked the time to see that contribution across the finish line.)
You seem to be indirectly arguing for merge-based workflows. (At least that's what I'm reading between the lines. Perhaps my reading is wrong, but I'll run with it until you tell me otherwise.) However, a) merging is not always accepted by upstream, and b) merge-based workflows suck.
I grant that merging is very simple with Git's internals: Git does the three-way merge thing, you manually merge conflicts, you create a commit for the conflict-resolved checked out state, recording the two parents in that commit, and you're done. This works because Git simply stores the objects as they are, internally using deltas (not quite diffs) to compress the object store. This is very simple, yes, but it yields a workflow that doesn't work for many projects. Rebasing is much better, and it always works, especially with [0].
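That two-parent merge commit is easy to see for yourself (a sketch in a throwaway repo; the branch names are arbitrary):

```shell
# Throwaway demo: a merge commit simply records two parents, and Git
# stores the merged snapshot, not a patch.
git init --quiet demo && cd demo
git config user.email you@example.com && git config user.name you  # fresh-env identity
git commit --quiet --allow-empty -m base
git checkout --quiet -b feature
printf 'feature\n' > f.txt && git add f.txt && git commit --quiet -m feature
git checkout --quiet -            # back to the original branch
printf 'main\n' > m.txt && git add m.txt && git commit --quiet -m main
git merge --quiet --no-edit feature
git log -1 --pretty=%P            # prints the two parent hashes
```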
So now you might wonder how to deal with the rebase-across-thousands-of-commits problem. Well, I've a script[0] (not originally mine, but I've hacked on it) that does something like bisection to find upstream commits causing conflicts so as to greatly speed up the process.
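The script itself isn't reproduced here, but the core idea can be sketched as a bisection over the upstream range. Everything below is a hypothetical illustration, not the actual script from [0]: the function name `find_first_conflict` and the hard-coded `upstream` ref are made up, and it assumes conflicts are roughly monotone along the range, as any bisection must.

```shell
# Hypothetical sketch: binary-search branch..upstream for the first
# upstream commit our branch conflicts with when rebased onto it.
find_first_conflict () {
  branch=$1
  orig=$(git rev-parse "$branch")                        # remember the tip
  commits=$(git rev-list --reverse "$branch..upstream")  # oldest first
  lo=0
  hi=$(printf '%s\n' $commits | wc -l)
  while [ $((hi - lo)) -gt 1 ]; do
    mid=$(( (lo + hi) / 2 ))
    candidate=$(printf '%s\n' $commits | sed -n "${mid}p")
    if git rebase "$candidate" "$branch" >/dev/null 2>&1; then
      # Clean rebase: the conflict lies further up. Undo the rebase.
      git checkout --quiet "$branch" && git reset --hard --quiet "$orig"
      lo=$mid
    else
      # Conflict: the culprit is at or before this candidate.
      git rebase --abort >/dev/null 2>&1
      hi=$mid
    fi
  done
  printf '%s\n' $commits | sed -n "${hi}p"
}
```

Each probe does one trial rebase against the midpoint, so you touch O(log n) upstream commits instead of resolving conflicts against all n at once.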
If you have to rebase over many revisions you are likely in trouble anyway. There, small commits can be helpful to ease dealing with merge conflicts.
However, rebase is a single, special operation. Diff-based storage affects everything: updating a local repository from upstream, diffing across some distance, and so on.