A technology by itself is never enough. In order for it to be of use, it needs to be accompanied by other elements, such as popular understanding, good habits, and acceptance of shared responsibility for its consequences. Without that kind of societal halo, technologies tend to be used ineffectively or incompletely. A good example of this might be the mRNA vaccines created during the COVID pandemic. They were an amazing medical achievement—and yet, because of widespread incomprehension, they didn’t land as well as they might have. It might not even be proper to call a technology a technology absent the elements needed to bring it usefully into the human world; if we can’t understand how a technology works, we risk succumbing to magical thinking.
Another way of saying this is that we need cartoons in our heads about how technologies work. I don’t know enough about vaccines to make one myself, but I have a vaccine cartoon, and it gives me an approximate understanding; it’s good enough to help me follow news about vaccines, and grasp the development process, the risks, and the likely future of the technology. I have similar cartoons in my head about rockets, financial regulation, and nuclear power. They aren’t perfect, but they give me good-enough intuitions. Even experts use cartoons to talk to one another: sometimes a simplified view of things helps them see the forest for the trees.
On this point, I experience some tension with many in my community of computer scientists. I believe that the cartoons we have broadcast about A.I. are counterproductive. We have brought artificial intelligence into the world accompanied by ideas that are unhelpful and befuddling. The worst of it is probably the sense of human obsolescence and doom that many of us convey. I have trouble understanding why some of my colleagues say that what they are doing might lead to human extinction, and yet argue that it is still worth doing. It is hard to comprehend this way of talking without wondering whether A.I. is becoming a new kind of religion.
In addition to the apocalyptic atmosphere, we don’t do a good job of explaining what the stuff is and how it works. Most non-technical people can comprehend a thorny abstraction better once it’s been broken into concrete pieces you can tell stories about, but that can be a hard sell in the computer-science world. We usually prefer to treat A.I. systems as giant impenetrable continuities. Perhaps, to some degree, there’s a resistance to demystifying what we do because we want to approach it mystically. The usual terminology, starting with the phrase “artificial intelligence” itself, is all about the idea that we are making new creatures instead of new tools. This notion is furthered by biological terms like “neurons” and “neural networks,” and by anthropomorphizing ones like “learning” or “training,” which computer scientists use all the time. It’s also a problem that “A.I.” has no fixed definition. It’s always possible to dismiss any specific commentary about A.I. for not addressing some other potential definition of it. The lack of mooring for the term coincides with a metaphysical sensibility according to which the human framework will soon be transcended.
Is there a way to explain A.I. that isn’t in terms suggesting human obsolescence or replacement? If we can talk about our technology in a different way, maybe a better path to bringing it into society will appear. In “There Is No A.I.,” an earlier essay I wrote for this magazine, I discussed reconsidering large-model A.I. as a form of human collaboration instead of as a new creature on the scene. In this piece, I hope to explain how such A.I. works in a way that floats above the often mystifying technical details and instead emphasizes how the technology modifies—and depends on—human input. This isn’t a primer in computer science but a story about cute objects in time and space that serve as metaphors for how we have learned to manipulate information in new ways. I find that most people cannot follow the usual stories about how A.I. works as well as they can follow stories about other technologies. I hope the alternative I present here will be of use.
We can draw our human-centered cartoon about large-model A.I. in four steps. Each step is simple. But they’ll add up to something easy to picture—and to use as a tool for thinking.
I. Trees
The very first step, and in some sense the simplest one, might also be the hardest to explain. We can start with a question: How can you use a computer to find out whether a photograph shows a cat or a dog? The problem is that cats and dogs look broadly similar. Both have eyes and snouts, tails and paws, four legs and fur. It’s easy for a computer to take measurements of an image—to determine whether it’s light or dark, or more blue or red. But those kinds of measurements won’t distinguish a cat from a dog. We can ask the same type of question about other examples. For instance, how can a program determine whether a passage is likely to have been written by William Shakespeare?
On a technical level, the basic answer is a glommed-together tangle of statistics which we call a neural network. But the first thing to understand about this answer is that we are dealing with a technology of complexity. The neural network, the most basic entry point into A.I., is like a folk technology. When researchers say that an A.I. has “emergent properties”—and we say that a lot—it’s another way of saying that we didn’t know what the network would do until we tried building it. A.I. isn’t the only field that’s like this; medicine and economics are similar. In such fields, we try things, and try again, and find techniques that work better. We don’t start with a master theory and then use it to calculate an ideal outcome. All the same, we can work with complexity, even if we can’t predict it perfectly.
Let’s try thinking, in a fanciful way, about distinguishing a picture of a cat from one of a dog. Digital images are made of pixels, and we need to do something to get beyond just a list of them. One approach is to lay a grid over the picture that measures something a little more than mere color. For example, we could start by measuring the degree to which colors change in each grid square—now we have a number in each square that might represent the prominence of sharp edges in that patch of the image. A single layer of such measurements still won’t distinguish cats from dogs. But we can lay down a second grid over the first, measuring something about the first grid, and then another, and another. We can build a tower of layers, the bottommost measuring patches of the image, and each subsequent layer measuring the layer beneath it. This basic idea has been around for half a century, but only recently have we found the right tweaks to get it to work well. No one really knows whether there might be a better way still.
Here I will make our cartoon almost like an illustration in a children’s book. You can think of a tall structure of these grids as a great tree trunk growing out of the image. (The trunk is probably rectangular instead of round, since most pictures are rectangular.) Inside the tree, each little square on each grid is adorned with a number. Picture yourself climbing the tree and looking inside with an X-ray as you ascend: numbers that you find at the highest reaches depend on numbers lower down.
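If it helps to see the stacking idea somewhere other than prose, here is a toy sketch in Python. The eight-pixel patches, the max-minus-min “edge” measurement, and the averaging in the upper grids are all stand-ins I’ve chosen for illustration, not how a real vision system measures anything; the point is only the shape of the thing, a stack of grids in which each layer measures the layer beneath it.

```python
import numpy as np

def bottom_layer(image, patch=8):
    """The first grid: for each patch of the image, measure how sharply
    the pixel values change (a crude stand-in for "prominence of edges")."""
    rows, cols = image.shape[0] // patch, image.shape[1] // patch
    grid = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            block = image[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch]
            grid[r, c] = block.max() - block.min()   # big where edges cross, small where flat
    return grid

def next_layer(grid):
    """Each higher grid measures something about the grid beneath it; here,
    the average of every two-by-two neighborhood. The particular measurement
    matters less than the fact that every layer summarizes the one below."""
    rows, cols = grid.shape[0] // 2, grid.shape[1] // 2
    out = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            out[r, c] = grid[2 * r:2 * r + 2, 2 * c:2 * c + 2].mean()
    return out

# Build the "tree trunk": a stack of grids growing out of the picture.
image = np.random.rand(64, 64)       # a stand-in for a photo's brightness values
layers = [bottom_layer(image)]       # an 8-by-8 grid of edge measurements
while min(layers[-1].shape) >= 2:
    layers.append(next_layer(layers[-1]))

for i, layer in enumerate(layers):
    print(f"layer {i}: a grid of shape {layer.shape}")
```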
Alas, what we have so far still won’t be able to tell cats from dogs. But now we can start “training” our tree. (As you know, I dislike the anthropomorphic term “training,” but we’ll let it go.) Imagine that the bottom of our tree is flat, and that you can slide pictures under it. Now take a collection of cat and dog pictures that are clearly and correctly labelled “cat” and “dog,” and slide them, one by one, beneath its lowest layer. Measurements will cascade upward toward the top layer of the tree—the canopy layer, if you like, which might be seen by people in helicopters. At first, the results displayed by the canopy won’t be coherent. But we can dive into the tree—with a magic laser, let’s say—to adjust the numbers in its various layers to get a better result. We can boost the numbers that turn out to be most helpful in distinguishing cats from dogs. The process is not straightforward, since changing a number on one layer might cause a ripple of changes on other layers. Eventually, if we succeed, the numbers on the leaves of the canopy will all be ones when there’s a dog in the photo, and they will all be twos when there’s a cat.
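For the same kind of reader, here is a rough sketch, again in Python, of the adjust-the-numbers idea. The fake photos, the shapes of the layers, and the nudge-a-number-and-keep-it-if-it-helps rule are all inventions for the sake of illustration; real systems do the adjusting far more efficiently, with calculus, under the name backpropagation. But the outline is the one just described: slide a labelled example under the tree, let measurements cascade to the canopy, compare the canopy’s answer with the label, and keep the tweaks that bring the two closer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "photos": tiny four-by-four grids of numbers standing in for pixels.
# Following the canopy convention above, the label 1.0 means "dog" and 2.0 means "cat";
# in this fake data, cat pictures are simply a little brighter overall.
def fake_photo(label):
    pixels = rng.random((4, 4)) + (0.5 if label == 2.0 else 0.0)
    return pixels, label

training_set = [fake_photo(rng.choice([1.0, 2.0])) for _ in range(200)]

# The "tree": two small layers of adjustable numbers that turn a photo into one canopy number.
weights = [rng.random((4, 4)), rng.random(2)]

def canopy(photo, weights):
    level1 = (photo * weights[0]).sum()                  # the bottom layer measures the photo
    level2 = level1 * weights[1][0] + weights[1][1]      # the next layer measures the layer below
    return level2

def error(weights):
    # How far the canopy's answers are, on average, from the labels.
    return sum((canopy(photo, weights) - label) ** 2
               for photo, label in training_set) / len(training_set)

# "Magic laser" training: nudge one number at a time, and keep the nudge only if it helps.
best = error(weights)
for step in range(3000):
    layer = rng.integers(len(weights))
    index = tuple(rng.integers(size) for size in weights[layer].shape)
    old_value = weights[layer][index]
    weights[layer][index] = old_value + rng.normal(scale=0.05)
    new = error(weights)
    if new < best:
        best = new                          # the nudge helped: keep it
    else:
        weights[layer][index] = old_value   # the nudge hurt: undo it

print(f"average error after training: {best:.3f}")
```

In a real system there would be billions of adjustable numbers rather than eighteen, and the ripple effects mentioned above are part of why the adjusting is done with calculus rather than by trial and error.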
Now, amazingly, we have created a tool—a trained tree—that distinguishes cats from dogs. Computer scientists call the grid elements found at each level “neurons,” in order to suggest a connection with biological brains, but the similarity is limited. While biological neurons are sometimes organized in “layers,” as they are in the cortex, they aren’t always; in fact, the cortex has fewer layers than a typical artificial neural network. With A.I., however, it’s turned out that adding a lot of layers vastly improves performance, which is why you see the term “deep” so often, as in “deep learning”—it means a lot of layers.