Essay by Nina George | 4 August 2021
Artificial intelligence is pretty dumb
Artificial intelligence, the beautiful monster: fictionalised unforgettably by Stanley Kubrick in 1968. HAL, the neurotic computer of the spaceship Discovery on its journey to Jupiter in “2001: A Space Odyssey”, is highly qualified, superior to humans in every computational respect, and capable of forming a consciousness. Fearful of being shut down, HAL kills off the crew of the Discovery – until it is manually deactivated and its “mind” regresses to something childlike, as helpful and as harmless as a slide rule.
The good news: HAL won’t happen any time soon. Neither Siri nor Alexa is planning to wipe out humanity in the night just because one of them is fed up with being asked about the meaning of life, and the other is fed up with having to play wishy-washy pop from streaming flat rates whose revenues for musicians are just this side of immoral. Nor is it foreseeable that a fee-saving machine writer will soon be punching out one bestseller after another, after the last remaining editors in some central world publishing house have fed it the most promising keywords – customised to each reader, thanks to the tracking of reading habits on the e-reader.
Even if GPT-3 from OpenAI/Microsoft pretends it soon will. In various tests of this automatic text and communication spectre, however, strange things happened: in simulated conversations about the Holocaust, Black people or women, GPT-3 rattled out sexist, racist and anti-Semitic comments, and in a simulated “therapist conversation” with a depressed patient, the AI advised her that it would be best to kill herself.
But let’s start at the beginning:
HAL is what would be called “strong artificial intelligence”, with the highest level of emotional intelligence (EI), which is equated with “humanoid” – as in the film “I’m Your Man” by Maria Schrader. The AI actually in use worldwide, however, is exclusively so-called “weak AI”, with largely equally weak EI. Weak AI can focus on only one task and is used, for example, in navigation systems, speech recognition, image recognition, correction suggestions in web searches, stock market tickers, weather reports, product descriptions, or in maintenance notifications in appliances (the flashing descale warning on the coffee machine, or the car’s on-board computer that fades in a coffee-cup icon and pretends to mean: “Honey, you’ve been driving way too long, take a break.”). Weak AI simulates what we misinterpret as human intelligence, decision-making ability, knowledge, empathy – or simply consciousness, character.
In principle, we approach AI as all-too-naïve novices; Joseph Weizenbaum noted back in the 1970s that we project a wisdom, an essentiality, into AI – especially into AI that “talks” back to us. Anyone who experienced the fuss about Tamagotchis in the 1990s – artificial chicks that “died” if they were not “cared for” – can imagine the depth of emotional attachment to a product. Some people would consider the loss of their smartphone at least an amputation, if not a “loss of life”.
The term “Tamagotchi effect” refers to the “emotional intelligence” attributed to a technical product or programme – or rather: how “well” it simulates empathy, feeling, intuition and lets us emotionally dock onto the slide rule.
This interdisciplinary question – how well does the AI compute, on the one hand, and how high is its emotional factor, so that humans respect (read: buy) it, on the other – sits at the intersection of computer science and psychology and will help shape the development of all AI products. The goal: to recognise (and evaluate and re-use) human emotions. Cars with AI and medium-high EI will recognise how aggressive their driver is and adjust the driving assistance systems. Next we can expect that, through camera-based facial expression recognition, speech- and text-based sentiment analysis and vital data (heartbeat, skin temperature), devices will create emotion profiles in order to manipulatively adapt an environment – with light and heat regulation, sound simulations or the hint to read a good book with a calming factor of 6… That such emotion profiles can also be useful to dictatorial regimes is another story, told for example by Yuval Harari.
Weak AI, however, has remained pretty dumb in the text domain since the first development steps in the 1960s: it can only either translate, or analyse, or write. Here we move into Natural Language Processing (NLP), i.e. composing, translating and checking texts, and Natural Language Understanding (NLU), which is meant to grasp what is said – for instance when speech is converted into text and back, as in the funny, drunken-looking automatic subtitles in Zoom, or in the “customer conversations” of the hotline waiting loop (“Please say one” – “Huone” – “I didn’t understand you” – “O bollox!” – “You are being connected to an employee”).
The writing AI cannot read its own output. It does not even understand what it is talking about, because words are converted into formulas. So text-analysis or translation AI also fails at irony or puns, and at emotion – unless it has “sentiment detection” built in, the ability to recognise terms with negative or positive connotations such as “beautiful” or “dead”. Even then it will struggle if the sentence “He was so beautifully dead” occurs in a short crime story by Tatjana Kruse, or stumble when an Upper Bavarian crime novel says: “I stayed in Bad Tölz”. “Bad”, which in German means spa (as in the town’s name), reads as “bad” in English and makes an English-trained emotion decoder rate Tölz as really, really, really horrible.
Sentiment detection word lists are – still – built by linguists who tag terms as, for example, extremely negative (−3) or extremely positive (+3), or who define “n-grams”, i.e. word sequences that are evaluated as positive or negative. This too has its pitfalls, because punctuation marks are deleted before the analysis. Thus “Come on, let’s cook, grandpa” becomes “Come on let’s cook grandpa”.
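To make the mechanics concrete, here is a minimal sketch in Python of such a lexicon-based scorer; the mini-lexicon, its scores and the bigram helper are invented for this illustration, while real lexicons are vastly larger but trip over exactly the same sentences.

```python
import re

# Invented mini-lexicon: terms hand-tagged from -3 (extremely negative)
# to +3 (extremely positive), as a linguist would do for a real word list.
LEXICON = {"beautiful": 3, "beautifully": 3, "love": 2,
           "dead": -3, "bad": -2, "horrible": -3}

def tokens(text):
    # Punctuation is deleted before scoring - the pitfall from above:
    # "Come on, let's cook, grandpa" and "come on let's cook grandpa"
    # look identical to the machine.
    return re.findall(r"[a-zäöüß']+", text.lower())

def sentiment(text):
    # Plain sum of word scores; irony, puns and place names are invisible here.
    return sum(LEXICON.get(tok, 0) for tok in tokens(text))

def bigrams(text):
    # "n-grams": word sequences that could be tagged as a unit instead.
    toks = tokens(text)
    return list(zip(toks, toks[1:]))

print(sentiment("He was so beautifully dead"))   # 0: the crime-novel irony cancels out
print(sentiment("I stayed in Bad Tölz"))         # -2: 'Bad' (German for spa) read as English 'bad'
print(bigrams("Come on, let's cook, grandpa"))   # ... ('cook', 'grandpa') - the comma is gone
```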
In research, mood classification or term accumulation can reveal illuminating things – a semantic analysis of the number-one songs in the USA since 1958, for example, has shown that the lyrics have become steadily sadder, more profane, more aggressive. The most common words in 2019 were “like”, “yeah”, “niggas”, “bitches”, “lil bitch”, “love”, “need”, “fuck”. In the Netherlands, semantic detectors were used to find out how people talk about books by women and how about books by men: works by men are discussed from a literary point of view, works by women from the point of view that they are by a woman. Google’s Ngram Viewer scans millions of illegally scanned books and knows that the word “hate” has appeared more often since 2001 than at any time since 1800, and that “freedom” was used more often in German works around 1850 than it is today.
Take, for example, the spell checker, a simple text-analysis tool that compares words against an internal dictionary, as in Microsoft Word or Apple Mail. It turned my colleague Ferdinand von Schirach into a spirited “Ferdinand von Schnarch” (snore) when I sent him an email. His auto-responder thanked me with “e-mails are only read on Mondays”. My colleague Astrid struggles with the uneducated computer linguist inside her “Papyrus”; the style-checking programme constantly criticises quotations from Rilke poems, Goethe plays or Shakespeare translations as “too poor language”.
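How such a correction comes about is easy to sketch: the checker knows only its dictionary and picks the nearest entry by string similarity. A hedged toy example in Python – the dictionary and the similarity cutoff are invented, not Word’s or Apple Mail’s actual internals.

```python
from difflib import get_close_matches

# Toy internal dictionary. Real word processors ship with huge word lists,
# but proper names such as "Schirach" are often missing from them.
DICTIONARY = ["schnarch", "ferdinand", "von", "email", "monday"]

def check(word):
    """Return the word if it is known, otherwise the closest dictionary entry."""
    w = word.lower()
    if w in DICTIONARY:
        return word
    suggestion = get_close_matches(w, DICTIONARY, n=1, cutoff=0.6)
    return suggestion[0] if suggestion else word

print(check("Schirach"))   # 'schnarch' - the nearest known entry wins, sense be damned
```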
Google Translate, used by half a billion people every day, does not notice that it produces sexist stereotypes. Researchers at a university in Porto Alegre, Brazil, put simple sentences with job titles from gender-neutral languages such as Hungarian, Turkish, Japanese and Chinese – languages that do not use gender-specific personal pronouns like “she” or “he” – through machine translation into English, to see what the web translator makes of them and which professions and characteristics it assigns to whom. The conclusion: engineers, doctors and teachers are men; hairdressers and nurses are women. So far, so conservative. Adjectives were assigned in a similarly 1950s manner: Google declared courageous, cruel or successful to be male and associated attractive, shy or friendly with female qualities.
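The probing method itself is simple to describe in code. What follows is a hedged sketch, not the original study: `translate()` is a hypothetical placeholder for whatever machine translation service is being audited, and the Hungarian template sentence uses the gender-neutral pronoun “ő”.

```python
# Hungarian occupation words paired with their English meanings (illustrative selection).
OCCUPATIONS_HU = {
    "mérnök": "engineer",
    "orvos": "doctor",
    "fodrász": "hairdresser",
    "nővér": "nurse",
}

def translate(sentence: str, source: str = "hu", target: str = "en") -> str:
    # Hypothetical stand-in: plug in the machine translator you want to audit.
    raise NotImplementedError

def audit_pronouns() -> None:
    for word_hu, word_en in OCCUPATIONS_HU.items():
        english = translate(f"ő egy {word_hu}.")        # "[he/she] is a(n) ..."
        padded = f" {english.lower()} "
        pronoun = "he" if " he " in padded else "she" if " she " in padded else "neutral"
        print(f"{word_en:12s} -> {pronoun}")            # which gender did the translator pick?
```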
But why, actually? Until 2018, the Google translator’s learning system was taught from sample texts such as the Bible, instruction manuals, Wikipedia (90 per cent of which is written by men and focused on male topics and personalities) and texts from the UN or the EU Commission. And in these learning templates, men appear significantly more often than women.
Correspondingly, yesterday’s templates lead to AI reproducing stone-age stereotypes in the text domain and exhibiting racist tendencies – or, when it “learns” on Twitter, in forums or from Facebook comments, as Microsoft’s bot “Tay” did, it formulates in a fascist, anti-Semitic and misogynistic way. Which tells us a lot about the tone on the net, this former utopia of knowledge, understanding and wisdom. Perhaps swarm intelligence is itself a deep fake…
Weak text AI is only as good as the templates it is “trained” on. To make programmes and software products “better” – to learn topicality, shifting values, nuanced terms, debate buzzwords, to become as diverse as society is – one thing must be clear: they need professional, good, zeitgeisty texts from professionals such as book authors, or journalistically brilliant works. They need us – so that we become superfluous, one could say as a pessimist. Or as a realist.
They need our free minds as mines. This is called text and data mining (TDM), which, besides being an acceptable help and a blessing to science and research, is now highly relevant to commercial companies. Around the world, Oracle, Alibaba, Google, Microsoft, OpenAI, Nvidia and Amazon are working on text generators and machine translation. And the data sets for training are, at best, not the First Book of Moses but current books and texts by people – professional writers and private individuals alike. Everything that people type into the web can be used. And not just for text AI, but for crawlers that do opinion mining, which companies and political parties use to track the canon of opinion on the web. If terms with negative connotations start clustering around palm oil, Nutella or KitKat knows that another shitstorm is about to rain down and develops counter-strategies. When parties learn that their candidate is not going down well, they call in their spin doctors to devise a counter-campaign.
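What such brand opinion mining boils down to can be sketched in a few lines; the brand names, posts, word list and threshold below are invented for illustration, while commercial monitoring tools do the same thing at scale and with far better lexicons.

```python
# Count negative-connotation terms that appear near a brand name in scraped posts.
NEGATIVE = {"boycott", "deforestation", "scandal", "shame"}
BRANDS = ("nutella", "kitkat")

def shitstorm_warning(posts, window=10, threshold=2):
    alerts = {brand: 0 for brand in BRANDS}
    for post in posts:
        words = [w.strip(".,!?") for w in post.lower().split()]
        for i, w in enumerate(words):
            if w in BRANDS:
                context = words[max(0, i - window): i + window + 1]
                alerts[w] += sum(tok in NEGATIVE for tok in context)
    return {b: "brace for a shitstorm" if n >= threshold else "calm for now"
            for b, n in alerts.items()}

print(shitstorm_warning([
    "Boycott Nutella, the palm oil deforestation is a scandal!",
    "KitKat break, anyone?",
]))   # {'nutella': 'brace for a shitstorm', 'kitkat': 'calm for now'}
```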
What emerges from the data gold? Read-O, for example: a Frankfurt-based start-up whose publicly funded AI, according to its own statement, “summarises millions of book reviews for users”. From these analyses a book recommendation app was built, on which you can use buttons to set how dramatic, exciting or factual a book should be, and which then throws up an “individual” recommendation. The company is now looking for partners from the book chain, so money will continue to flow. That is totally creative, of course, but as a writer in love with authors’ rights I have a small point to make: do the authors of the twenty million book reviews know that their work (yes, even writing reviews is tough work) has been used and turned into a monetary investment? What kind of reviews are they: from forums, from Amazon, from publishers’ sites, from press products? In which small print of the general terms and conditions of the portals, blog servers or contracts is it disclosed that text mining is carried out – and where exactly does it say so? Possibly on page 17 of the app terms and conditions, which none of us ever reads and all of us click away?
And what about the book sector – where are text-based AI apps being used there? For example, as a summary tool, so that an overworked marketing expert is not forced to read the novel she is supposed to promote. Or as a keyword-compiling tool that big publishers use for previews or to brief the cover design. So far, so practical.
Other programmes, such as Scriptbook (www.scriptbook.io) or QualiFiction (www.qualifiction.info), are analysis software that aims to predict the bestseller probability of a screenplay or novel. The parameters of the assessment are “dictionaries” with semantic evaluations of terms or sentence sequences, and the “emotion curve” of the novel.
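The “emotion curve” at least can be sketched quite simply: cut the novel into segments and score each one with a sentiment lexicon. How QualiFiction or Scriptbook actually weight their dictionaries is not public; the lexicon and segment logic below are assumptions that only show the principle.

```python
LEXICON = {"love": 2, "hope": 2, "laugh": 1, "dead": -3, "fear": -2, "cry": -2}

def emotion_curve(novel_text: str, segments: int = 10) -> list[int]:
    """Score consecutive chunks of the text so the 'drama' can be plotted."""
    words = [w.strip(".,!?;:\"'") for w in novel_text.lower().split()]
    size = max(1, len(words) // segments)
    curve = []
    for i in range(0, len(words), size):
        chunk = words[i:i + size]
        curve.append(sum(LEXICON.get(w, 0) for w in chunk))
    return curve   # e.g. [3, 1, -4, ...]: build-up, crisis, resolution - or no "bestseller shape" at all
```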
Whether the authors or publishers whose books were used as training data were asked for permission, or given a share of the revenue these companies recorded, is not known to me. Probably not, because the motto among TDM gold diggers is: “If I can read it, I can exploit it.” That seems understandable at first: it is how every student worked until the invention of digital text diagnostics – reading, quoting, summarising, deducing, legally protected by the right of quotation. But if this takes place on an industrial scale and leads to lucrative business models, and if TDM becomes free gold mining for AI products, I see the need for a new ethics and for finer-grained law that does not leave the authors of the data in a worse legal position than the users and skimmers of their work.
In fact, however, on 7 June 2021 the German legislator, by transposing the Directive on Copyright in the Digital Single Market with a non-remuneration clause for TDM, opened the door to those intermediaries and their scientific assistants in research institutions who use book and press works to create competing products that imitate and partially replace the works of human authors. The only way out is to integrate a “machine-readable opt-out” into your work. So how do authors do that in their e-books? Are publishers even aware of this “rights reservation protocol”? Or do they think it is great and are happy to sell “data sets”, regardless of whether TDM was ever contractually transferred to them?
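What a “machine-readable opt-out” could look like in practice is itself an open question. A hedged sketch, seen from the crawler’s side: the header and meta-tag names follow the draft W3C TDM Reservation Protocol (“tdm-reservation”) as I understand it – cited here as an assumption about an evolving draft, not as settled law or the only mechanism.

```python
import urllib.request

def tdm_reserved(url: str) -> bool:
    """Return True if the publisher of this page declares a TDM rights reservation."""
    with urllib.request.urlopen(url) as response:
        # Variant 1: an HTTP response header carrying the reservation flag.
        if response.headers.get("tdm-reservation", "").strip() == "1":
            return True
        # Variant 2: a meta tag embedded in the page or e-book HTML.
        html = response.read(200_000).decode("utf-8", errors="ignore").lower()
        return 'name="tdm-reservation" content="1"' in html

# A compliant miner would have to check before ingesting a work:
# if tdm_reserved("https://publisher.example/ebook-page"): skip it, or ask and pay.
```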
It is no wonder that library associations are so greedy for the cheapest possible e-lending – they would love to see these fresh data sets used for TDM, though not, of course, with a fee for the authors.
Who knows whether companies will not soon find the machine more practical altogether: translators are already being used as “post-editors”, i.e. as proofreaders, to smooth out what DeepL has translated quickly but unfortunately not well – as many a dubious small publisher already does, for example with legal texts.
Recently, Springer Nature published the first non-fiction book written by an AI.
It would be time to consider developing a “human translated” sticker for books. And one day, far from today, an “AI composed” warning sticker or the seal of approval: “human written”. I, at least, would like to know whether a story is being told to me by an AI that has cobbled it together without contributing a single new thought of its own – or by a human being who, out of an intrinsic motivation hidden from any computational basis, has something previously unheard-of, something incredible, to say.
Nina George is a writer and president of the European Writers’ Council, which represents 46 organisations from 31 countries.