This map of what happens when you interact with a digital assistant such as the Amazon Echo is incredible. The image is taken from a length piece of work which is trying to bring attention towards the hidden costs of using such devices.

With each interaction, Alexa is training to hear better, to interpret more precisely, to trigger actions that map to the user’s commands more accurately, and to build a more complete model of their preferences, habits and desires. What is required to make this possible? Put simply: each small moment of convenience – be it answering a question, turning on a light, or playing a song – requires a vast planetary network, fueled by the extraction of non-renewable materials, labor, and data. The scale of resources required is many magnitudes greater than the energy and labor it would take a human to operate a household appliance or flick a switch. A full accounting for these costs is almost impossible, but it is increasingly important that we grasp the scale and scope if we are to understand and govern the technical infrastructures that thread through our lives.

It’s a tour de force. Here’s another extract:

When a human engages with an Echo, or another voice-enabled AI device, they are acting as much more than just an end-product consumer. It is difficult to place the human user of an AI system into a single category: rather, they deserve to be considered as a hybrid case. Just as the Greek chimera was a mythological animal that was part lion, goat, snake and monster, the Echo user is simultaneously a consumer, a resource, a worker, and a product. This multiple identity recurs for human users in many technological systems. In the specific case of the Amazon Echo, the user has purchased a consumer device for which they receive a set of convenient affordances. But they are also a resource, as their voice commands are collected, analyzed and retained for the purposes of building an ever-larger corpus of human voices and instructions. And they provide labor, as they continually perform the valuable service of contributing feedback mechanisms regarding the accuracy, usefulness, and overall quality of Alexa’s replies. They are, in essence, helping to train the neural networks within Amazon’s infrastructural stack.

Well worth a read, especially alongside another article in Bloomberg about what they call ‘oral literacy’ but which I referred to in my thesis as ‘oracy’:

Should the connection between the spoken word and literacy really be so alien to us? After all, starting in the 1950s, basic literacy training in elementary schools in the United States has involved ‘phonics.’ And what is phonics but a way of attaching written words to the sounds they had been or could become? The theory grew out of the belief that all those lines of text on the pages of schoolbooks had become too divorced from their sounds; phonics was intended to give new readers a chance to recognize written language as part of the world of language they already knew.

The technological landscape is reforming what it means to be literate in the 21st century. Interestingly, some of that is a kind of a return to previous forms of human interaction that we used to value a lot more.

Sources: Anatomy of AI and Bloomberg