While Meta cites safety concerns for restricting access to its training data, critics see a simpler motive: minimizing legal liability and safeguarding its competitive advantage. Many AI models are almost certainly trained on copyrighted material; in April, The New York Times reported that Meta internally acknowledged there was copyrighted content in its training data "because we have no way of not collecting that." A litany of lawsuits allege infringement by Meta, OpenAI, Perplexity, Anthropic, and others. But with rare exceptions — like Stable Diffusion, which discloses its training data — plaintiffs must currently rely on circumstantial evidence to demonstrate that their work has been scraped.