GitHub Copilot has, by their own admission, been trained on mountains of GPL-licensed code, so I'm unclear on how it's not a form of laundering open source code into commercial works. The handwave of "it usually doesn't reproduce exact chunks" is not very satisfying.
Copyright does not only cover copying and pasting; it covers derivative works. GitHub Copilot was trained on open source code, and the sum total of everything it knows was drawn from that code. There is no possible interpretation of "derivative" that does not include this.
I'm really tired of the tech industry treating neural networks like magic black boxes that spit out something completely novel, and taking free software for granted while paying out $150k salaries for writing ad delivery systems. The two have finally fused and it sucks.
Previous "AI" generation has been trained on public text and photos, which are harder to make copyright claims on, but this is drawn from large bodies of work with very explicit court-tested licenses, so I look forward to the inevitable massive class action suits over this.
"But eevee, humans also learn by reading open source code, so isn't that the same thing"
- Humans are capable of abstract understanding and have a breadth of other knowledge to draw from
- Statistical models do not
- You have fallen for marketing
Even MIT-licensed code still requires attribution — the license says the copyright notice and permission notice "shall be included in all copies or substantial portions of the Software" — and Copilot doesn't even know who to attribute its output to.
Microsoft's "Github" product now does auto-complete madlibs to write code for you. It's a marketing stunt. It's a gag. But it's also a massive license-violation framework.