Fair framework conditions for text and data mining
German-speaking writers and translators demand an opt-out and a remuneration obligation for text and data mining.
It is unacceptable that authors, whose work makes it possible to develop “competitive” AI products in the first place, are placed in a significantly worse legal and economic position than the distributors and developers of the products that are created from authors’ works.
Berlin, 26 February 2021
With regard to the planned regulatory proposals on text and data mining (TDM), the Authors’ Rights Network calls for two urgently needed corrections: a remuneration obligation, which has so far not been provided for, and an easier way to object to the use of one’s own data (opt-out). Otherwise, the political will deliberately places human authors, who provide the basic services for imitative programmes that do not reflect on content (so-called weak artificial intelligence, AI), in a worse position than those who are supposed to profit from these services.
Section 44b (1) UrhG (new) currently defines TDM as the automated analysis of works in order to “gain information especially about patterns, trends and correlations”.
TDM is a relatively young concept and originates from market research strategies of 1989/1990 that aimed to establish a “one-to-one” relationship with a potential customer. Accordingly, Section 44b (1) facilitates AI-based market research for targeted advertising. This may include, for example, the automatic content-based and thus unlawful evaluation of publicly available copyright-protected texts in order to produce economic and advertising products based on their semantic and content pattern repetitions (such as Read-O).
The possibilities for using copyright-protected material within the framework of TDM, proposed as implementation of Directive (EU) 2019/790 (“DSM Directive”), have long since gone far beyond mere information gain. In addition, even for the areas in which the DSM Directive (recital 17) permits a remuneration obligation, the current cabinet draft provides for none.
Section 44b (2) UrhG (new) also states: “Reproductions of lawfully accessible works for TDM are permissible. The reproductions shall be deleted when they are no longer required for text and data mining.” However, in order to analyse text works, a machine-readable corpus must be produced. This is done by digitising and reproducing the works. Mass use of text works increases the risk of illegal acquisition. The storage of copies is unlikely to be controlled and deletion cannot be guaranteed.
In the field of academic publishing, the Federal Association of Digital Publishers, among others, has already noted a significant increase in online piracy, as well as the increased emergence of so-called shadow libraries. For this copying process, too, there is therefore an absolute need for (a) a remuneration obligation, (b) an opt-out/use reservation and (c) a comprehensible possibility of control over the whereabouts of the reproductions as well as clear rules of “necessity”, even if these are made available to libraries, museums and archives for storage.
The new version demonstrates the political will of the German legislator to place authors at a continuing and, in the long run, highly damaging disadvantage by means of a remuneration-free exception for the supposed “good of the community” and of research. At the same time, the legislator fails to recognise the economic consequences of already active areas of application of TDM products and their destructive effects on the future of authors of all kinds.
Explanations and background
For years, lucrative economic products have been in use that have “learned” from copyrighted works and are now gradually replacing certain copyright services:
(a) the automated production of texts (stock market news, sports and weather news, journalistic texts, automated comments and chat bots that simulate customer conversations or opinions via fake accounts in social media, police news, corporate communications);
(b) the automated production of translations (machine translation MT; search results, books, legal texts, product descriptions, translations on Google, Facebook, Twitter);
(c) imitative image and video works (deepfakes);
(d) the automated creation of melodies and shorter musical works (advertising jingles);
(e) as well as text analysis tools (“dictionaries”), which are provided for a fee by universities to business companies, for example, to improve style, value expression, text and rhetoric for product descriptions, marketing information, public speeches, etc., or even to pre-sort and screen literature (QualiFiction).
Copyrighted works made usable for TDM consequently serve as “learning” material for the development of artificial intelligence products. They are thus the most important basis for AI applications and software in the text, image and music sectors, which are not actually particularly intelligent but are computationally strong. Research consistently works to the benefit of the economy or is financed by it anyway. At the same time, the effects on the book, image and music sectors are foreseeable: AI products are already growing into commercial competition there, competition for the very people whose creative services are supposed to educate them for free, according to the current wishes of the legislator.
In its current draft, the German Commission of Inquiry on Artificial Intelligence has clearly formulated that copyright-protected works are the basis of significant economic exploitation: “Access to the raw material data for the application of AI thus influences the competitive situation in digital markets” and emphasises: “Extensive regulations and associated legal ambiguities can make access to data for the purpose of scientific and economic use more difficult, but these are a prerequisite for a competitive application of machine learning.”
In the almost 800-page draft report, we read how authors, translators and creators are, in a sense, becoming organ donors for the economy: “The data generated in the field and the models created from it with machine learning are then in turn part of the data economy, and can thus in turn represent further sources of income. AI systems such as machine translation are already an important component of globalisation today, as can be seen with the large internet platforms. Language barriers are nevertheless still noticeable in the European single market, and multilingualism also still poses a challenge for AI and data processing, such as when generating user models in e-commerce. Language barriers are sometimes described as the biggest barriers to trade. Here, the economy can benefit greatly from consistent digitalisation. Measures to reduce language barriers are already being tested and should also be promoted at the political level or by the state, e.g. translation of customer enquiries and product descriptions into easily understandable language.” 
Current efforts by the Committee on Legal Affairs towards a European Parliament resolution aim to ensure that “technical creations generated by AI technology must be protected under the intellectual property legal framework in order to encourage investment in this form of creation”. This allows for the interpretation that those products that arise from legally uncompensated TDM should enjoy a protective right, while the human “mines” from which the gold of tomorrow’s data economy will be mined are practically expropriated by the legislator in advance.
Authors: Second-class citizens?
The answer is: no! Accordingly, it is unacceptable in this cabinet draft that the authors, through whose efforts “competitive” AI products can be developed in the first place, are placed in a legally and economically significantly worse position than the users and distributors as well as those products that are created from their works.
Weak artificial intelligences, their analyses and automated programmes are forward-looking technologies. Despite their benefits and advantages, it is not enough to screen and monitor the further development of the technology from an ethical point of view alone; it must also be ensured that this development does not serve only the economic interests of a few. In a liberal market economy, it should not, indeed must not, be permitted to make the products of a certain group (here authors, translators and other creators) available to others free of charge so that they can make a profit from them.
This situation is tantamount to expropriation and destroys the basis of the system of a fairly conceived society that develops ethical and democratic standards of value under the aspect of equal treatment.
This is the moment for you, as decision-makers, to set the course for the future with integrity and fairness towards the performance that makes artificial intelligence possible:
Accordingly, we writers and translators demand an opt-out and a remuneration obligation for TDM.
We remain at your disposal for questions and explanations.
- https://read-o.com, retrieved 24.2.2021
- https://www.bmjv.de/SharedDocs/Gesetzgebungsverfahren/Stellungnahmen/2020/Downloads/110620_Stellungnahme_BDZV-VDZ_RefE_Urheberrecht-ges.pdf?__blob=publicationFile&v=2, retrieved 24.2.2021
- https://mixed.de/geschichte-der-deepfakes-so-rasant-geht-es-mit-ki-fakes-voran/, retrieved 26.2.2021
- https://www.qualifiction.info, retrieved 26.2.2021
- https://dip21.bundestag.de/dip21/btd/19/237/1923700.pdf, page 71, retrieved 12.2.2021
- https://dip21.bundestag.de/dip21/btd/19/237/1923700.pdf
- https://dip21.bundestag.de/dip21/btd/19/237/1923700.pdf, page 225, retrieved 16.2.2021
- https://www.europarl.europa.eu/doceo/document/A-9-2020-0176_EN.pdf