AI Training in a Legal Patchwork - Examining Generative AI Training on Images Under EU Copyright, Data Protection, and Personality Rights
(2025) JURM02 20252
Department of Law
Faculty of Law
- Abstract
- This thesis examines the scope and practical functioning of the EU legal framework governing the use of image datasets for the training of generative artificial intelligence (GenAI) models, with particular focus on copyright law and its interaction with privacy and personality rights. The rapid development of GenAI systems capable of generating high-quality images has intensified legal uncertainty surrounding large-scale data-driven training practices, which frequently rely on reproductions of copyrighted works and the processing of personal data.
The analysis demonstrates that the training of GenAI models on images falls within the text and data mining (TDM) exceptions set out in Articles 3 and 4 of the Directive on Copyright in the Digital Single Market (CDSM Directive). This conclusion is supported by the broad definition of TDM in the directive and reinforced by the Artificial Intelligence Act’s explicit reference to the opt-out mechanism in Article 4. However, the thesis shows that significant interpretative and practical uncertainties remain. In particular, unresolved questions concerning lawful access, data retention, and the technical feasibility of effective opt-outs undermine the predictability and workability of the framework, especially in the context of large-scale AI training.
The thesis further assesses the compatibility of the TDM exceptions with the international three-step test. While Articles 3 and 4 CDSM are, in abstract terms, capable of satisfying the test’s requirements, their application to GenAI training raises concerns where training practices risk interfering with the normal exploitation of works or disproportionately affecting rightsholders’ legitimate interests. The lack of judicial guidance on how the purpose of GenAI models should factor into this assessment contributes to ongoing legal uncertainty.
Beyond copyright, the thesis analyses the role of personality and privacy rights in regulating AI training. It concludes that personality rights offer limited and uneven protection at the training stage, largely due to their fragmented national regulation and frequent reliance on publication or commercial use thresholds. As a result, the protection of individuals whose images are included in training datasets is primarily mediated through EU data protection law. The General Data Protection Regulation (GDPR) imposes an additional, cumulative layer of constraints, requiring lawful processing, necessity, and proportionality, most often assessed through the legitimate interest ground. While GenAI training may, in many cases, rely on legitimate interests, the technical characteristics of AI systems complicate compliance with data subject rights such as erasure, objection, and access.
Taken together, the thesis shows that the EU currently regulates GenAI training through the parallel application of legal frameworks developed for different purposes and without GenAI in mind. Although both copyright law and data protection law contain internal balancing mechanisms, their cumulative operation produces fragmentation, legal uncertainty, and uneven protection at the training stage. The thesis concludes that further judicial clarification or targeted regulatory development is necessary to ensure a coherent and workable balance between innovation, copyright protection, and fundamental rights in the context of generative AI training.
Please use this url to cite or link to this publication:
http://lup.lub.lu.se/student-papers/record/9217041
- author
- Axelsson, Sara LU
- supervisor
- Ana Nordberg LU
- organization
- course
- JURM02 20252
- year
- 2025
- type
- H3 - Professional qualifications (4 Years - )
- subject
- keywords
- EU law, copyright, AI, privacy rights, personality rights
- language
- English
- id
- 9217041
- date added to LUP
- 2026-01-21 10:50:22
- date last changed
- 2026-01-21 10:50:22
@misc{9217041,
abstract = {{This thesis examines the scope and practical functioning of the EU legal framework governing the use of image datasets for the training of generative artificial intelligence (GenAI) models, with particular focus on copyright law and its interaction with privacy and personality rights. The rapid development of GenAI systems capable of generating high-quality images has intensified legal uncertainty surrounding large-scale data-driven training practices, which frequently rely on reproductions of copyrighted works and the processing of personal data.
The analysis demonstrates that the training of GenAI models on images falls within the text and data mining (TDM) exceptions set out in Articles 3 and 4 of the Directive on Copyright in the Digital Single Market (CDSM Directive). This conclusion is supported by the broad definition of TDM in the directive and reinforced by the Artificial Intelligence Act’s explicit reference to the opt-out mechanism in Article 4. However, the thesis shows that significant interpretative and practical uncertainties remain. In particular, unresolved questions concerning lawful access, data retention, and the technical feasibility of effective opt-outs undermine the predictability and workability of the framework, especially in the context of large-scale AI training.
The thesis further assesses the compatibility of the TDM exceptions with the international three-step test. While Articles 3 and 4 CDSM are, in abstract terms, capable of satisfying the test’s requirements, their application to GenAI training raises concerns where training practices risk interfering with the normal exploitation of works or disproportionately affecting rightsholders’ legitimate interests. The lack of judicial guidance on how the purpose of GenAI models should factor into this assessment contributes to ongoing legal uncertainty.
Beyond copyright, the thesis analyses the role of personality and privacy rights in regulating AI training. It concludes that personality rights offer limited and uneven protection at the training stage, largely due to their fragmented national regulation and frequent reliance on publication or commercial use thresholds. As a result, the protection of individuals whose images are included in training datasets is primarily mediated through EU data protection law. The General Data Protection Regulation (GDPR) imposes an additional, cumulative layer of constraints, requiring lawful processing, necessity, and proportionality, most often assessed through the legitimate interest ground. While GenAI training may, in many cases, rely on legitimate interests, the technical characteristics of AI systems complicate compliance with data subject rights such as erasure, objection, and access.
Taken together, the thesis shows that the EU currently regulates GenAI training through the parallel application of legal frameworks developed for different purposes and without GenAI in mind. Although both copyright law and data protection law contain internal balancing mechanisms, their cumulative operation produces fragmentation, legal uncertainty, and uneven protection at the training stage. The thesis concludes that further judicial clarification or targeted regulatory development is necessary to ensure a coherent and workable balance between innovation, copyright protection, and fundamental rights in the context of generative AI training.}},
author = {{Axelsson, Sara}},
language = {{eng}},
note = {{Student Paper}},
title = {{AI Training in a Legal Patchwork - Examining Generative AI Training on Images Under EU Copyright, Data Protection, and Personality Rights}},
year = {{2025}},
}