'Impossible' to train AI without copyrighted content says OpenAI

ChatGPT developer OpenAI has told the UK parliament that it is impossible to train its generative artificial intelligence (GenAI) services without access to copyrighted work.

The company, along with backer Microsoft, is facing a lawsuit from the New York Times, which has accused the AI tech company of “unlawful use” of its work to create its products.

Now in a submission to the House of Lords’ communications and digital select committee, the company appears to be angling for a relaxation of copyright laws.

The submission, first reported by the Telegraph, states: “Because copyright today covers virtually every sort of human expression – including blog posts, photographs, forum posts, scraps of software code, and government documents – it would be impossible to train today’s leading AI models without using copyrighted materials.

“Limiting training data to public domain books and drawings created more than a century ago might yield an interesting experiment, but would not provide AI systems that meet the needs of today’s citizens.”

In a separate blog post published to its website on Monday, OpenAI responded to the lawsuit, saying: “We support journalism, partner with news organisations, and believe the New York Times lawsuit is without merit.”

In addition to the NYT suit, a group of authors including Game of Thrones writer George RR Martin are suing OpenAI for what they describe as “systematic theft on a mass scale”.

OpenAI has previously argued that while it respects content creators and owners, it also subscribes to a doctrine of “fair use” and that it believes that “legally, copyright law does not forbid training”.

The blurred lines between GenAI training, copyright and plagiarism are increasingly central to the conversation around the technology.

Image generation company Midjourney recently saw a spreadsheet go viral that contains the names of thousands of artists whose work has allegedly been used to train its tech. The list includes the names of more than 4,700 artists whose works are said to have been ‘scraped’ to train the company’s tech, with thousands more listed under a ‘proposed additions’ tab.

The spreadsheet quickly spread across social media during the holiday period. One notable poster was Jon Lam, a senior storyboard artist at League of Legends-owner Riot Games, who posted screenshots from Discord where Midjourney developers, in his words, discuss “laundering” and creating a database from which they can train the software.

One of the messages reads: “All you have to do is just use those scraped datasets and then conveniently forget what you used to train the model. Boom legal problems solved forever.”


