Copyright and AI training data - transparency to the rescue?

Research output: Contribution to journal › Article › peer-review

Abstract

Generative AI models must be trained on vast quantities of data, much of which is composed of copyrighted material. However, AI developers frequently use such content without seeking permission from rightsholders, leading to calls for requirements to disclose information on the contents of AI training data. These demands have won an early success through the inclusion of such requirements in the EU’s AI Act.

This paper argues that such transparency requirements alone cannot rescue us from the difficult question of how best to respond to the fundamental challenges generative AI poses to copyright law. This is because the impact of transparency requirements is contingent on existing copyright laws; if these do not adequately address the issues raised by generative AI, transparency will not provide a solution. This is exemplified by the transparency requirements of the AI Act, which are explicitly designed to facilitate the enforcement of the right to opt out of text and data mining under the CDSM Directive. Because the transparency requirements do not sufficiently address the underlying flaws of this opt-out, they are unlikely to provide any meaningful improvement to the position of individual rightsholders.

Transparency requirements are thus a necessary but not sufficient measure to achieve a fair and equitable balance between innovation and protection for rightsholders. Policymakers must therefore look beyond such requirements and consider further action to address the complex challenge presented to copyright law by generative AI.
Original language: English
Journal: Journal of Intellectual Property Law & Practice
Early online date: 12 Dec 2024
DOIs
Publication status: Published online - 12 Dec 2024
