Instead of promoting an exception for Text and Data Mining, misinterpreted as “AI use” and paired with a risky TDM opt-out regime, the UK Government should address the lack of transparency and licensing by AI providers, the EWC pleads.
In its response to the UK government’s extensive consultation questionnaire, the EWC urged the UK not to follow the disastrous example of the European Union by creating a new exception in the current law for the benefit of AI companies and to the detriment of UK authors.
“Balancing” copyright has always meant finding a compromise between copyright and the public interest for a common good, as laid down e.g. in the Berne Convention, the EWC said in the introduction to its overall 47-page paper to the UK Government.
Now, the implementation of a commercial exception is not meant for the common good, but solely for the profit of a few businesses. What rightsholders need is to always first give explicit authorisation, negotiate remuneration, and receive full transparency on the use of their work.
Instead of promoting a complicated exception, the Government should address the lack of transparency and licensing by AI providers. This would serve the real needs of authors, who should not be stripped of their rights twice in a row: first by AI developers, who have used their works without permission for several years, and second by their own Government, making it even easier for AI developers to help themselves for their disproportionate financial benefit.
As Nina George, Political Commissioner of the EWC and responsible for the Copyright & AI dossiers within the federation, already pointed out in a related Guardian article:
“These (misinterpreted) AI exceptions for commercial use mean that business interest will be served for the first time,” she said. “This is a shift of paradigms [and] a perverted way to bend copyrights and authors’ rights to serve the interest of a few businesses.”
This was echoed by MEP Axel Voss in the same article by Jennifer Rankin for The Guardian: (…) this exemption from copyright law was intended to have a limited private use, rather than allow the world’s largest companies to harvest vast amounts of intellectual property. The introduction of the TDM exemption in the AI Act was “a misunderstanding”, he said. This view was reinforced by a significant academic study last year by the legal scholar Tim Dornis and the computer scientist Sebastian Stober, which concluded that the training of generative AI models on published materials could not be considered “a case of text and data mining” but “copyright infringement”.
A misinterpretation of the TDM exception that costs authors and their publishing partners millions – only to serve mostly non-European AI oligopolies
As the EWC stated in its consultation response to the UK:
The scope of TDM, as laid down in the summary assessment of options as well as in the introductory executive summary, is fundamentally wrong and misinterpreted.
TDM is in no way to be equated with the production methods of GPAI or generative AI, neither technically nor legally. Recent studies (e.g., but not limited to, the Dornis and Stober paper on AI in the European Intellectual Property Review) and litigation, as well as legislative developments worldwide (e.g. Australia), have shown that the procedures used within TDM are, technically and legally, not the same as the processes used to collect, prepare, reproduce, store and memorise works within (Gen)AI development. During the training of a generative AI model, a number of acts of copying and reproduction of copyright-protected works occur:
(1) It starts with the collection, preparation, and storage of copyrighted works used for the AI training process.
(2) In addition, during pre-training and fine-tuning, copyright-relevant reproductions of copyrighted works materialise “inside” the AI model. This also constitutes a copy and replication in the legal sense.
(3) Furthermore, during the application of generative AI models, particularly by the end users of the fully trained AI systems (e.g., ChatGPT via the OpenAI website), works that have been used for training the AI model may be copied and replicated as part of the systems’ output.
(4) This leads to the making available of works by generative AI models and of works that have been implemented in AI systems. Making the works replicated “inside” generative AI models available to the public without the original author’s consent is an infringement.
(5) The training of generative AI models does not limit the use of the training data to a simple analysis of the semantic information contained in the works. It extracts the syntactic information in the works, including the elements of copyright-protected expression. This comprehensive utilisation results in a representation of the training data in the vector space of the AI models and thus in a copying and reproduction in the legal sense. Consequently, the training of generative AI models does not fall under TDM (text and data mining).
Gen-AI training and TDM are distinct activities, as Canadian courts have also confirmed: courts considering copyright infringement claims should avoid conflating these different activities. The separability of TDM from actual gen-AI model training is key to determining whether copyright infringement arises from the use of copyright-protected materials at any stage during the AI development process. Although TDM can be an essential component of developing a gen-AI training dataset, that is not its sole purpose. Rather, humans have employed TDM since the 1980s as a method of gathering and analysing large datasets to seek out patterns, trends, and insights, without any integration into AI models. In other words, TDM techniques are a practice, not a technology.
More: CIPPIC article on the difference between TDM and training
The European Union’s CDSM Directive does not serve as an argument to mix up TDM and AI development or training.
European lawmakers did not mean to cover GPAI or generative AI use with the TDM exception in Article 4(3) of EU Directive 2019/790: the European Parliament, when enacting the 2019 CDSM Directive, did not foresee the technological development of creative-productive or “generative” AI models and their disruptive socioeconomic effects. The text and data mining exception(s) were specifically designed ONLY for the analysis of semantic information, under the above-outlined premises of what TDM is. By IBM’s definition, TDM is the process of “transforming unstructured text into a structured format to identify meaningful patterns and new insights.” This generally involves the scraping and combining of data from digital sources into a tabular dataset format, and thereby focuses on extracting and analysing existing information. A developer may also implement TDM techniques to clean and pre-process the dataset prior to the training process. This definition starkly contrasts with that of AI training, which IBM describes as the application of a “model’s mathematical framework to a sample dataset whose data points serve as the basis for the model’s future real-world predictions.”
Therefore, it cannot be extended to the comprehensive syntax-extracting functionality of generative AI models. The latest quote by MEP Axel Voss, rapporteur for the AI Act and shadow rapporteur for the 2019 CDSM Directive, in The Guardian underlines that this huge misinterpretation leaves a loophole in copyright and enforcement, putting authors under fire from the AI industry: The Guardian: Devastating copyright loophole in AI Act
Both Articles 3 and 4 of the CDSM Directive solely encompass acts of extraction and reproduction. This means that subsequent acts restricted by copyright and/or related rights do not fall within the scope of either E&L. To exemplify: if a generative AI model lawfully trained on third-party protected content (e.g., because the conditions of the TDM provisions are complied with) subsequently reproduced and/or communicated/made available to the public third-party protected content, such acts would not be covered by either E&L. In fact, all ongoing AI training since 2021 with European works and data is a breach of the purpose of the two TDM exceptions of the EU Directive. (see also: (2024) 46(5) European Intellectual Property Review 262-274, Faculty of Law, Stockholm University Research Paper No. 123)
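The distinction the EWC draws here can be made concrete with a toy illustration. The sketch below is entirely hypothetical (the corpus, function names, and the miniature bigram model are invented stand-ins, not any real AI pipeline): it contrasts TDM in the IBM sense, which produces a structured analysis of existing text, with model training, which fits parameters that encode a representation of the training works themselves and can echo them back.

```python
# Hypothetical sketch: TDM (structured analysis) vs. model training
# (fitting parameters that represent the works). All names are invented
# for illustration only.
from collections import Counter

corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

def text_and_data_mining(docs):
    """TDM in the IBM sense: transform unstructured text into a
    structured, tabular summary (here: a word-frequency table) and
    analyse existing information. No model is produced."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.split())
    return dict(counts)

def train_generative_model(docs):
    """Training in the IBM sense: fit a model's parameters on sample
    data so it can make future predictions. Here, a tiny bigram
    next-word model whose parameters are derived directly from the
    training texts."""
    model = {}
    for doc in docs:
        words = doc.split()
        for prev, nxt in zip(words, words[1:]):
            model.setdefault(prev, Counter())[nxt] += 1
    return model

def generate(model, start, length=5):
    """The trained model can regenerate sequences drawn from its
    training data -- the point about works materialising 'inside'
    the model."""
    word, out = start, [start]
    for _ in range(length):
        if word not in model:
            break
        word = model[word].most_common(1)[0][0]
        out.append(word)
    return " ".join(out)

table = text_and_data_mining(corpus)    # analysis only: a frequency table
model = train_generative_model(corpus)  # parameters derived from the works
print(generate(model, "the"))           # output echoes the training text
```

Even in this miniature example, the “trained” model’s parameters come directly from the works and can reproduce sequences from them, whereas the TDM step yields only aggregate statistics about the texts.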
It is not “balancing interests”; it is stripping individuals of their fundamental rights to serve the profits of a few, and furthermore of non-UK and non-European companies.
The objectives seek ‘the right balance between encouraging innovation in AI in the UK while ensuring protection for creators and our vibrant industries’. This approach is repeated under (9) of the executive summary of this consultation. However, that goal would not be met by Option 3, which would in fact have the opposite effect: it would irremediably and significantly damage the creative industries and their individual actors, the authors, artists and performers, for the sole benefit of a tech sector dominated by non-UK oligopolies.
For the first time in the history of copyright law’s amendments, an exception cutting off authors’ genuine rights would be created not for a common good or to serve societal interests, as provided for example by Public Lending Right, by the Marrakesh Treaty for the visually impaired, or by certain provisions on private copying and use for education or research, but solely for the financial benefit of a few companies. It would allow them to easily drain the sources of the entire UK CCS sector’s value chain while avoiding appropriate and proportionate remuneration, and to produce outputs similar to the inputs, thereby pushing into the very market of authors and the CCS and replacing humans with machine artefacts and plagiarism products. It is not “balancing interests” or “balancing needs” to install an exception that makes things easy for business and profit while cutting off a human right not only from creators but from all citizens, since not only published cultural or media works are ingested, but also social and private texts, pictures, reviews, private videos, etc.
Transparency can be provided already – no exception is needed
The assumption (Point 11) that transparency on the training data used, i.e. content and works, could only be created through a limitation paired with a rights-reservation system is substantially misguided.
Even now, any AI provider, developer, collector, curator, or dataset fine-tuner could provide concrete, title-specific information on which works are used. Individual or collective licences, e.g. supported by a direct remuneration claim right for authors, could have been acquired long ago (these generative models were built and tested years before being launched on the market in November 2022), and without the diversion via a legal system that offers no guarantee that the intentional lack of transparency on the part of AI developers will change – as can currently be seen in Europe in the articulated reluctance of AI companies involved in the CoP, who shy away from the “administrative effort”.

In addition, numerous AI developers openly admit that they have no interest in licences as long as the court disputes in which they invoke, for example, the US fair-use doctrine remain open; any licensing effort would destroy their argumentation. At the same time, the consultations of the US Copyright Office have made apparent that the major AI developers, i.e. OpenAI, Meta, Anthropic, and Alphabet/Google, see licence fees, which can easily sum up to billions of dollars and pounds, as an obstacle to their business model. The newly released court documents in the Kadrey v. Meta Platforms case have brought to light previously unknown details about Meta’s book-licensing programme, but also confirmed that Meta, like other big players, helped themselves to pirated books from illegal portals, e.g. Anna’s Archive or Library Genesis.
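To illustrate how feasible such title-specific disclosure would be, here is a hypothetical sketch of a per-work provenance record that a provider could keep at ingestion time. Every field name and value below is invented for illustration; no real provider’s schema is implied.

```python
# Hypothetical per-work provenance record for title-specific disclosure.
# All fields and values are illustrative inventions, not a real schema.
import json

record = {
    "title": "Example Novel",                # hypothetical work
    "author": "Jane Doe",                    # hypothetical author
    "identifier": "ISBN (example)",          # placeholder identifier
    "source": "publisher-licensed feed",     # as opposed to a pirate archive
    "licence": "individually negotiated, remunerated",
    "ingested": "2025-01-15",                # date of ingestion into the dataset
}
print(json.dumps(record, indent=2))
```

Keeping such a record per ingested work is an ordinary engineering task, which is the point of the argument above: the barrier to transparency is willingness, not feasibility.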
- The UK consultation was open for comments until 25 February 2025 under UK consultations: copyright and artificial intelligence
- EWC response