© 2024 Blaze Media LLC. All rights reserved.
Report: AI chatbots respond with 'outright repetition of copied' news articles
Photo by LIONEL BONAVENTURE/AFP via Getty Images


A Tuesday report from the trade association News Media Alliance claimed that artificial intelligence chatbots are responding to users' questions with "paraphrasing or outright repetition of copied" news articles and other copyrighted work, the New York Post reported.

The NMA is a nonprofit that represents more than 2,200 news outlets globally.

The 77-page report stated that the large language models behind chatbots such as Microsoft's Bing Chat, Google's Bard, and OpenAI's ChatGPT are being trained on copyrighted news articles and, in some cases, repeat those pieces word for word in their responses to queries.

"As with past 'disruptive' Silicon Valley models, [generative artificial intelligence] investors are banking on forgiveness instead of asking permission. They depend on the claim that copying for training is a 'fair use' that they may continue with impunity, even as many of their products directly compete with and threaten the continued well-being of publishers," the NMA's report stated. "But fair use does not work this way."

The NMA's white paper noted that LLMs are trained by copying large amounts of others' creative works. This is often done without approval or compensation, the organization added. Additionally, the nonprofit's analysis of the AI systems showed that developers disproportionately train their LLMs using online news articles and similar content.

"In fact, our analysis of a representative sample of news, magazine, and digital media publications shows that the popular curated datasets underlying some of the most widely used LLMs significantly overweight publisher content by a factor ranging from over 5 to almost 100 as compared to the generic collection of content that the well-known entity Common Crawl has scraped from the web," the report said.

Half of the top ten websites used to train Google's Bard are news outlets, it added.

While GAI developers claim that their LLMs are "just 'learning' unprotectable facts from copyrighted training materials," the NMA says that is "technically inaccurate" because the systems retain the facts without actually understanding the underlying concepts. Additionally, the NMA argued, "It is beside the point because materials that are used for 'learning' are subject to copyright law." The report noted that libraries must legally acquire a book before lending it, and that borrowing a book does not grant the borrower the right to copy its material.

The NMA offered several recommendations to GAI developers, including informing publishers when their content has been copied and used to train AI chatbots. Developers who use content without authorization must be recognized as having violated the publishers' rights, the report stated.

The organization called on lawmakers and the United States Copyright Office to encourage content licensing for publishers. The NMA said it "advocates the passage of legislation it has proposed allowing news publishers to bargain collectively with certain dominant technology providers."

Google, OpenAI, and Microsoft did not respond to the Post's request for comment.

Candace Hathaway

Candace Hathaway is a staff writer for Blaze News.
@candace_phx