Hey, pabs3! Actually, this does not use a rolling checksum for detection, but rather a combination of a language model, checksums, automata, bit vectors, inverted indexes and multiple sequence alignment (i.e. a specialized diff). I put some docs explaining the approach at https://github.com/nexB/scancode-toolkit/blob/develop/src/li...
You need a bit more than checksums for this. If anything, the FSF has published many different versions of the "official" GPL2 text, and that alone defeats checksums. See https://github.com/pombredanne/gpl-history ... So in the general case, hashing a text does not work consistently or safely.
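To make the hashing problem concrete, here is a minimal Python sketch. The three snippets are made-up illustrations (not taken from the gpl-history repo), and `digest` / `normalized_digest` are hypothetical helper names. Normalizing whitespace and case rescues the trivial variant, but a single changed word still produces a completely different hash:

```python
import hashlib


def digest(text: str) -> str:
    """Plain SHA-256 of the text, byte for byte."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def normalized_digest(text: str) -> str:
    """SHA-256 after lowercasing and collapsing runs of whitespace."""
    return digest(" ".join(text.lower().split()))


# Illustrative snippets only (assumption: stand-ins for real license variants).
a = "This program is free software; you can redistribute it"
b = "This program is free software;  you can redistribute it\n"  # spacing differs
c = "This library is free software; you can redistribute it"     # one word differs

print(digest(a) == digest(b))                        # exact hash vs. spacing change
print(normalized_digest(a) == normalized_digest(b))  # normalization handles spacing
print(normalized_digest(a) == normalized_digest(c))  # but not a changed word
```

So a checksum can only ever confirm an exact (or exactly normalized) copy; it says nothing about how close two texts are.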
Eventually you need a diff for pairwise comparison, and the only difficulty is making the diff fast, or finding ways to approximate it quickly enough to avoid doing a full diff.
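As a rough illustration of that idea (and not how ScanCode actually implements it), here is a small Python sketch that uses a cheap token-set overlap as the approximation and only runs the expensive diff on candidates that pass it. The function names, the `min_overlap` threshold and the use of `difflib.SequenceMatcher` as the "diff" are all assumptions made for this example:

```python
from difflib import SequenceMatcher


def tokens(text):
    """Lowercased whitespace-separated tokens."""
    return text.lower().split()


def jaccard(a, b):
    """Cheap approximation: overlap of the two token sets, ignoring order."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def best_match(query, candidates, min_overlap=0.5):
    """Return (name, score) of the closest candidate text.

    Candidates whose token-set overlap is below min_overlap are skipped,
    so the slower sequence diff only runs on plausible matches.
    """
    q = tokens(query)
    best_name, best_score = None, 0.0
    for name, text in candidates.items():
        c = tokens(text)
        if jaccard(q, c) < min_overlap:
            continue  # too different to be worth a full diff
        score = SequenceMatcher(None, q, c, autojunk=False).ratio()
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score
```

For example, `best_match("This program is free software and you can redistribute it", {"gpl": "this program is free software you can redistribute it", "mit": "permission is hereby granted free of charge to any person"})` only diffs against the "gpl" entry, because the "mit" entry shares too few tokens to pass the pre-filter. A real index would replace the linear scan with something like an inverted index, but the shape is the same: cheap filter first, diff last.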