AIs can generate near-verbatim copies of novels from training data

LLMs memorize more training data than previously thought.

Melissa Heikkilä, Financial Times · February 24, 2026

#AI #Policy #AI jailbreak #copyright #LLM training #syndication

Executive Summary

The article reports that large language models (LLMs) can memorize their training data and reproduce near-verbatim copies of novels, and that they memorize more of that data than previously thought. The finding bears on copyright law, data privacy, and the development of AI systems. It also raises questions about the balance between innovation and intellectual property protection that call for further research and discussion.

Key Points

▸ LLMs can generate near-verbatim copies of novels from training data
▸ LLMs memorize more training data than previously thought
▸ Implications for copyright law and data privacy

Merits

Advancements in AI

The discovery showcases the capabilities of LLMs and their potential to transform various industries.

Demerits

Copyright Infringement

The ability of LLMs to generate near-verbatim copies of novels raises concerns that AI systems could infringe copyright and other intellectual property rights.

Expert Commentary

The article's findings underscore the complex interplay between technological advancements and legal frameworks. As LLMs continue to evolve, it is crucial to strike a balance between promoting innovation and protecting intellectual property rights. The development of effective measures to prevent copyright infringement and ensure data privacy will be essential in mitigating the risks associated with these powerful AI systems. Furthermore, policymakers must prioritize the creation of adaptable and responsive regulatory frameworks that can address the emerging challenges posed by LLMs.

Recommendations

✓ Developers should implement robust measures to detect and prevent copyright infringement
✓ Policymakers should establish clear guidelines and regulations for the development and deployment of LLMs
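
As a rough illustration of what a detection measure might look like, the sketch below scores how much of a model's output repeats word sequences from a reference text. It is not the method used in the research the article describes; the function names `word_ngrams` and `overlap_ratio`, the default window of 5 words, and the whitespace tokenization are all assumptions chosen for simplicity.

```python
def word_ngrams(text, n):
    """Return the set of lowercase word n-grams in text (whitespace tokenization)."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(generated, reference, n=5):
    """Fraction of the generated text's distinct n-grams that also occur in the reference.

    Hypothetical helper: a value near 1.0 suggests near-verbatim reproduction,
    a value near 0.0 suggests little shared phrasing.
    """
    gen = word_ngrams(generated, n)
    if not gen:
        # Generated text is shorter than n words, so there is nothing to compare.
        return 0.0
    ref = word_ngrams(reference, n)
    return len(gen & ref) / len(gen)
```

A real system would likely need more than this, for example normalizing punctuation and choosing a threshold empirically, but the ratio gives a simple first signal for flagging outputs that warrant review.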

Sources

Ars Technica - Tech Policy
