It's only a copyright issue if I know what the original source code looks like. If I don't know what it looks like, and my autocomplete writes it, how could I possibly know it's stolen?
I think the line between derivative work and new work might be different.
I mean, if LLMs are trained on it (and a lot of other things) and an LLM can then output the source code from an input, wouldn't it then be open source / public domain?
I don't think that's true. When ChatGPT generates something that infringes (even on something not in the training data), it is still infringement, and the user cannot use the output for anything they couldn't use the original for.
But that's the point he's trying to make. When you "teach" an LLM some knowledge, you teach it a set of patterns. It won't necessarily output code that infringes copyright. Say you load the Gumroad code into Gemini Pro's context and say something like: "Check this app. Analyze the implementation of feature XY... I need you to help me implement feature XY... but in Go". Then you can recreate an entire platform that looks nothing like the original but has the same features, and open source it.