LIBRARY OF CONGRESS
U.S. Copyright Office
101 Independence Ave. S.E.,
Washington, D.C. 20559-6000
[Docket No. 2023–6; COLC–2023–0006]
Artificial Intelligence and Copyright: Call for Comments
Submitted via: https://www.regulations.gov/commenton/COLC-2023-0006-0001
With reference to https://www.govinfo.gov/content/pkg/FR-2023-08-30/pdf/2023-18624.pdf as well as
the extension of the deadline https://www.govinfo.gov/content/pkg/FR-2023-09-21/pdf/2023-20480.pdf
Brussels, 30 October 2023 | The European Writers’ Council (EWC) is the world’s only, and largest, representation of writers in the book sector across all genres (fiction, non-fiction, academic, children’s books, poetry, etc.). With 49 organisations and professional guilds from 31 countries of the EU, the EEA and non-EU areas, the EWC represents 220,000 writers and translators.
These individuals write and publish in 34 languages, including in the original or as a translation in the US territories and corresponding legal frameworks.
On their behalf, and on behalf of book writers in Europe and worldwide who are directly and immediately affected by the consequences of so-called “AI” in the book sector, in particular by the production and use of generative informatics, we consider it our duty to respond to this important initiative of the U.S. Copyright Office.
To the full EWC contribution (21 pages, pdf)
Introduction
Generative, analytical and assistive informatics, sub-areas of so-called artificial “intelligence”, threaten numerous jobs and fields of application in the book sector. Synthetic text and image generators, synthetic robot voice cloning and algorithmic informatics for analysing text and data will, in the medium term, replace some professions with machines, be it in the areas of writing, editing, proofreading, production, cover design, illustration, translation, the selection and editing of original and translated manuscripts, audiobook production, or the promotion and distribution of books.
Already, numerous criminal and damaging “AI business models” have threatened the book sector – with fake authors, fake books and also fake readers. It has been demonstrated that the foundations for large language models such as GPT, Meta’s LLaMA, StableLM and BERT were generated from book works sourced from shadow libraries such as Library Genesis (LibGen), Z-Library (B-ok), Sci-Hub and Bibliotik – BitTorrent piracy sites.
Without legal regulation, generative, algorithmic, synthetic, and reproducing technologies accelerate and enable the expansion of exploitation, legitimisation of copyright infringement, climate harm, discrimination, information and communication distortion, identity theft, reputational damage, blacklisting, royalty fraud and collective licensing remuneration fraud.
At the same time, a close look and assessment is needed to categorise and regulate the individual aspects of advanced informatics, because not all smart software is “AI”, and not every application is equally risky. Assistive and analytical software of advanced informatics is already widely used within the book sector – for accounting, logistics, metadata management, citation glossaries, anti-plagiarism checks and editing workflows.
(1b) For the public or the users of applications: generative text informatics is a high-risk communicator and unreliable source of information.
“Hallucinating” is the term used to describe the behaviour of generative text systems[1] that completely invent or incorrectly combine data, events[2], court decisions[3] or biographies, contradict themselves when questioned, or need to be constantly corrected by users through reinforcement learning from human feedback (RLHF)[4].
In the process, users teach the system what its developers did not. At the same time, generative text software makes it easier for actors such as propaganda farms to spread disinformation and hate speech rapidly and cheaply, and it creates fake authors who flood social networking platforms and marketplaces such as Amazon[5] with GPT output and artificial communication[6]. Inadequate or absent safety checks, skipped to save costs[7], and the lack of test and correction runs prior to publication mean that generative text informatics must be assessed as fundamentally untruthful. At the same time, however, the credulity and lack of sensitivity towards artificial content among many of the over 100 million users are so high that they do not recognise these hallucinations – or do not even suspect that the output is false or fictitious. Fundamentally, GAI needs original, “fresh”, human-written texts in order not to go crazy, as a study from Stanford University found: if synthetic content (AI output) is used as training data[8], the system collapses.
(1c) “GAI” (re)produces bias and reinforces intersectional discrimination [9], [10].
Stable Diffusion, an image-generating (“text-to-image”) system, knows no Black members of parliament and no female doctors, and depicts cleaners almost exclusively as Asian women. Text (re-)generators reproduce sexist and gendered stereotypes, as they draw on material from a predominantly Western, male, white-oriented canon[11] or have “learned” misogyny from the comment sections of the internet. A bias (a false prejudice) can refer not only to gender or skin colour, but also to places, ages, social classes, professions, medical conditions, cultures, and the classification of facts and of concepts such as “success” or “happiness” – and to political opinion.
Effect: Users of synthetic, algorithmic, (re)generating text software adopt the bias[12] and reinforce it. As a result, people are pigeonholed even more quickly and, above all, unquestioningly, which can affect social and professional access, education, housing, health care and creditworthiness.
[1] https://www.beamex.com/resources/for-a-safer-and-less-uncertain-world/generative-ai/
[2] https://www.nytimes.com/2023/05/01/business/ai-chatbots-hallucination.html
[3] https://www.morningbrew.com/daily/stories/2023/05/29/chatgpt-not-lawyer?mbcid=31642653.1628960&mblid=407edcf12ec0&mid=964088404848b7c2f4a8ea179e251bd1&utm_campaign=mb&utm_medium=newsletter&utm_source=morning_brew
[4] https://www.telusinternational.com/insights/ai-data/article/rlhf-advancing-large-language-models
[5] https://www.vice.com/en/article/v7b774/ai-generated-books-of-nonsense-are-all-over-amazons-bestseller-lists
[6] https://www.independent.co.uk/tech/ai-author-books-amazon-chatgpt-b2287111.html
[7] https://www.nytimes.com/2023/04/07/technology/ai-chatbots-google-microsoft.html
[8] https://futurism.com/ai-trained-ai-generated-data
[9] https://fra.europa.eu/sites/default/files/fra_uploads/fra-2022-bias-in-algorithms_en.pdf
[10] https://www.bloomberg.com/graphics/2023-generative-ai-bias/
[11] https://crfm.stanford.edu/2023/06/15/eu-ai-act.html
[12] https://www.nyu.edu/about/news-publications/news/2022/july/gender-bias-in-search-algorithms-has-effect-on-users–new-study-.html