The essay below is adapted from remarks I recently delivered to an executive gathering of a Swiss media company. The topic at hand was artificial intelligence, whose intellectual roots reach far further back in time than many would assume.
Artificial intelligence—in some form—has been the subject of story and myth for thousands of years. Perhaps the earliest example is Talos, a bronze humanoid robot of ancient Greek mythology, built at the request of Zeus to protect the princess Europa and patrol the island of Crete. Since that time, A.I. has frequently appeared in fiction either as a source of wonder or as an existential threat to humanity. But we’re not here today to discuss A.I. in fiction but rather A.I. in fact.
The path leading to the current state of A.I. research also began in antiquity with the early study of logic and reason by Aristotle and his contemporaries. Over the ensuing centuries, philosophers, scientists, and even artists wrestled with the machinery of reasoning and developed the concept of thinking machines as “artificial animals.”
Mathematical formalism helped make the jump from philosophy to science and laid the groundwork for modern computer science. In the 19th century, George Boole pioneered the mathematical study of formal logic, which Gottlob Frege extended into the system of first-order logic still in use today. In such a system, you start with a small set of basic axioms as building blocks and then follow certain rules for manipulating symbols to compute other statements that are true within that system.
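The mechanics of such a system can be sketched in a few lines of code. The toy propositional system below is purely illustrative (the axioms and the single inference rule, modus ponens, are my own example, not Frege's actual system): we start from a handful of axioms and mechanically derive every statement the rules can reach.

```python
# A toy formal system: start with axioms and mechanically derive
# new true statements with one inference rule, modus ponens
# (from "A" and "A -> B", conclude "B").
axioms = {"P", "P -> Q", "Q -> R"}

def derive(statements):
    """Apply modus ponens repeatedly until no new statements appear."""
    known = set(statements)
    changed = True
    while changed:
        changed = False
        for s in list(known):
            if " -> " in s:
                premise, conclusion = s.split(" -> ", 1)
                if premise in known and conclusion not in known:
                    known.add(conclusion)
                    changed = True
    return known

# Q follows from P and "P -> Q"; then R follows from Q and "Q -> R".
print(sorted(derive(axioms)))
```

Everything the procedure emits is guaranteed true within the system; Gödel's point, discussed next, is that no such procedure can emit *every* truth.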
But formal logic has its limitations. Kurt Gödel showed that no matter how many new statements you compute by following these rules with a given set of axioms, you’ll always leave out some other true statements. In other words, the system is incomplete. And if you try to patch that hole with additional axioms, you’ll either make the system inconsistent—meaning you can derive a contradiction—or you’ll still miss something and remain incomplete.
This motivated Alan Turing to contemplate exactly what computability is, using an imaginary device that we now call a Turing machine. A Turing machine is quite simple, having only a few components, but its design is general enough to implement any possible computer program, no matter how complex.
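That simplicity is easy to demonstrate. The sketch below is a minimal Turing machine simulator; the particular machine (its states and rules) is a toy example of my own that flips every bit on the tape and halts, but the same simulator can run any transition table you give it.

```python
# A minimal Turing machine: a tape, a read/write head, and a
# transition table mapping (state, symbol) -> (write, move, next state).
def run(tape, rules, state="start", blank="_"):
    cells = dict(enumerate(tape))  # sparse tape
    head = 0
    while state != "halt":
        symbol = cells.get(head, blank)
        write, move, state = rules[(state, symbol)]
        cells[head] = write
        head += 1 if move == "R" else -1
    return "".join(cells[i] for i in sorted(cells)).strip(blank)

# Toy machine: flip every bit, halt at the first blank cell.
flip = {
    ("start", "0"): ("1", "R", "start"),
    ("start", "1"): ("0", "R", "start"),
    ("start", "_"): ("_", "R", "halt"),
}

print(run("0110", flip))  # -> 1001
```

Only the transition table changes from machine to machine; the simulator itself is the whole "hardware."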
But Turing showed there are some functions that are uncomputable by any Turing machine, which means that there is a whole world of truth out there that cannot be touched by any algorithm that we can create. In other words, computation is also “incomplete.” And since we still don’t know whether the brain is a Turing machine, we don’t know whether our intelligence and experience can be reduced to a program or if we instead exist in a realm outside of computation.
But that hasn’t stopped A.I. researchers from trying to approximate the behavior of brains and beings computationally. Early efforts starting in the mid-20th century were largely based on symbolic logic, if-then statements, and long lists of formal procedures that researchers had to craft by hand. And this symbolic approach had enough success for researchers to become overconfident and believe that general artificial intelligence was very nearly a solved problem. In 1965, for instance, Herbert Simon, one of the early pioneers of A.I., said that within 20 years, machines would be capable of doing any work a man can do. Well, 1985 came and went, and that statement certainly wasn’t true.
In fact, between 1974 and 1980, not long after Simon’s statement, the hype bubble burst, and progress in A.I. slowed down so much that people began losing interest in A.I. research altogether, and those who did maintain an interest had a hard time finding funding. This period was known as an “A.I. winter,” and it was the first of two major slowdowns in A.I. research in the late 20th century, the second coming between 1987 and 1993.
We are currently in an “A.I. spring,” a boomtime of rapid development in the field, whose achievements have spurred interest throughout academia, industry, and the general public. The current boom was made possible by the advent of deep learning, which refers to training networks with many layers of artificial neurons. The theoretical foundations of deep learning—originally inspired by research into the structure and function of biological neurons—had been in place for decades. But algorithmic advances and huge increases in computing power finally made it a practical technique, one that took off around 2012 and became the dominant machine-learning method not long after.
The idea behind machine learning—deep learning included—is that you can train a system to do something—classify images by what they depict, drive a car without human intervention, or even write an essay (but not this one)—without having to explicitly program it to do so. The assumption is that the secret recipe for performing these tasks is hidden in data recorded from other times those tasks were performed. If you have enough data, a machine-learning algorithm can recognize and exploit the patterns the data contains, effectively creating its own program for performing the task. In almost every case it does this by mathematically optimizing some measure of performance, comparing what the machine produces in a given scenario against examples of similar scenarios in the data.
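Stripped to its bones, that recipe fits in a few lines. In this deliberately tiny sketch the "task" is the rule y = 2x + 1, which we never program directly: the system sees only example data and adjusts two numbers by gradient descent to minimize its squared error.

```python
# Toy machine learning: learn y = 2x + 1 from examples alone.
# The "recordings of the task" are (x, y) pairs; the "program" the
# machine creates is just two learned numbers, a slope and an intercept.
data = [(x, 2 * x + 1) for x in range(10)]

w, b, lr = 0.0, 0.0, 0.01  # start ignorant; lr is the learning rate
for _ in range(2000):
    # Gradient of mean squared error with respect to w and b.
    grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
    grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 2), round(b, 2))  # close to 2 and 1
```

Deep learning follows the same loop, with millions or billions of adjustable numbers instead of two.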
And the remarkable thing is that it seems to work in a lot of cases: Using this training philosophy, machines now match or outperform humans at a number of tasks, including image classification and certain types of gameplay. But there are plenty of things that machines can’t do as well as humans, at least not yet. Driving is one of them, although having lived in both Miami and Los Angeles, I am almost ready to call it a draw.
Throughout the history of A.I., before it was possible for machines to perform a task that humans could do, it was common to think that true intelligence was required to perform that task. And then, once a machine was able to perform that task better than any human—such as when IBM’s supercomputer Deep Blue beat grandmaster Garry Kasparov in chess—it was just as common to make an exception and say, “OK, well obviously not that task.” This moving of the goalposts defining the threshold of intelligence to just past machines’ current capabilities led to the following half-serious definition by computer scientist Larry Tesler: “A.I. is whatever hasn’t been done yet.”
One human capability that seemed safe from machines almost indefinitely was creativity. Of course machines can’t carry on a conversation, write poetry or music, paint pictures, or make scientific discoveries. Right? But obviously we are here today discussing this topic because generative A.I. has made major inroads in these directions.
By generative A.I.—or genAI—I mean artificial systems that can create novel media—text, audio, images, video—appearing to be genuine or human-produced, especially in response to a prompt. There are two prominent examples of genAI currently in the public’s attention.
The first is large language models such as GPT—which stands for generative pre-trained transformer. GPT can produce remarkably coherent text in response to the simplest nudge from a human user. This model is trained on a huge data set—billions of lines of text—with the sole task of predicting what will come next given what it has seen so far. This can take the form of predicting the next letter, the next word, or, in the case of GPT, a token, a chunk of text that averages about four characters. This is a task that is easy to describe but hard to perform well. Our mobile phones try to do it by suggesting the next word when we are texting someone. Google’s Gmail also does it when we are writing emails, going so far as to try to complete our sentences. GPT does essentially the same thing, but it is a much more sophisticated model trained on much more data.
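The crudest possible version of next-token prediction is a bigram model: count what followed each character in some training text, then predict the most frequent follower. The snippet below (the training sentence is my own toy example) shares only the task with GPT, not the sophistication, but it makes the objective concrete.

```python
# Toy next-token prediction: a character bigram model.
# It predicts the next character purely from counts of what
# followed each character in its (tiny) training text.
from collections import Counter, defaultdict

text = "the cat sat on the mat and the cat sat still"
counts = defaultdict(Counter)
for prev, nxt in zip(text, text[1:]):
    counts[prev][nxt] += 1

def predict(prev):
    """Most likely next character given the previous one."""
    return counts[prev].most_common(1)[0][0]

print(predict("h"))  # -> 'e' ("h" is always followed by "e" here)
```

GPT replaces single-character counts with a transformer conditioned on thousands of preceding tokens, but the training objective is the same: predict what comes next.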
The truly surprising thing about large language models is their emergent capabilities. This means that once they are exposed to huge amounts of data, they are able to perform tasks that they weren’t specifically trained to do. The emergence of such capabilities is not very well understood, but it is a signature phenomenon of a number of complex systems, including biological brains.
The second example of genAI that has captured the public’s attention is image generation, most notably with diffusion models such as Stable Diffusion. Diffusion models are deep neural networks that learn to undo a process of adding noise to data. That is, we take clean data that we would like to replicate, and we progressively add noise to it. This is the diffusion process, much like when a spray of perfume diffuses into the air through random interactions or a drop of ink diffuses through a tank of water. After a certain number of steps, the data is completely destroyed; it’s been spread around so that it looks just like random noise. That’s the easy part, destroying data.
The hard part, which is given to the neural network, is to learn the reverse process of making noisy data cleaner. This is a really hard task; technically it’s an impossible one. But the forward process leaves us tons of examples of what a denoising solution looks like if we just play it in reverse, and the model learns to hallucinate a solution that works, specially tuned to the data it was trained on and helped along by the prompt it was given as side information. The amazing part comes when we show the model true random noise: It still tries to clean it up, hallucinating patterns and detail where there weren’t any before, until a clean, never-before-seen image emerges at the end.
The thing that distinguishes the generative models of today from those of even the very recent past is quality. It is now nearly impossible to distinguish GPT-generated text from text written by a human. In fact, making this distinction is the basis of something Alan Turing called the imitation game, which is now more commonly referred to as the Turing test. If you read a chat log between a machine and a human, can you reliably tell which is which? Turing proposed that if you can’t tell the difference between the machine and the human, then the machine is exhibiting intelligent behavior. And even though some of what GPT says is bizarre—such as when it tried to break up a New York Times reporter’s marriage over the course of a two-hour chat—so much of what it says is plausibly human that GPT’s passing the Turing test, either now or in the very near future, seems guaranteed.
Generated imagery has also become truly convincing. In fact, I was recently taken in by one example: Back in March, photos of Pope Francis wearing a puffy white jacket made the rounds online. If you don’t know what I’m talking about, I won’t be offended if you take a moment to Google “pope in puffy jacket.” I remember seeing these pictures and thinking, “OK, so the pope’s changing up his fashion a bit.” What I didn’t think was, “Hey, these photos are fake!” But they were. I was fooled, because I was prepared to accept that Pope Francis might actually own such a jacket. My skeptical defenses were down.
The plausibility of fake media has already put us in a bind, and it will only get worse as bad actors fully leverage this rapidly evolving technology. Even before genAI took off, news organizations were having difficulty counteracting misinformation and disinformation, with some people refusing to believe that thoroughly debunked stories were false. Now we are entering an era in which fake news can be accompanied by convincing audio, images, or video. And as genAI continues to develop, it will be easier to produce fake media on almost any reasonably equipped computer anywhere in the world. Media companies will need to devote additional effort and resources to counteract what could be a coming flood of bad information.
This will be painstaking work. For example, the New York Times recently reported on African asylum seekers who were rounded up in Greece and abandoned on an inflatable raft in the sea, in violation of both Greek and international law. Greece denied doing it, but someone had secretly shot video catching them in the act. But was the video real? Quoting from the story, “The Times verified the footage by doing a frame-by-frame analysis to identify the people in the video, geolocating key events and confirming the time and day using maritime traffic data, as well as an analysis of the position of the sun and visible shadows.” This was a major undertaking for a major story, but it reflects a standard of skepticism that we must maintain going forward when seeing is no longer necessarily believing.
It’s difficult to predict what the A.I. of even the very near future might look like. It is almost the definition of emergence that the new properties that arise in systems of increasing complexity can’t be predicted from what came before. And indeed, some people are starting to get concerned. An open letter with over 27,000 signatures so far, including those of some of the tech pioneers who made the current state of A.I. possible, calls on all A.I. labs “to immediately pause for at least six months the training of A.I. systems more powerful than GPT-4.” And in a surprise move, Geoff Hinton, often called the “godfather of A.I.,” resigned his position at Google and expressed both worry and regret at the rapid progress A.I. has made.
Even so, the Terminator- and Matrix-style nightmare of A.I. taking over the world is not a foregone conclusion. This nightmare only happens when you put A.I. in decision-making positions previously occupied by humans, whether it’s at the trigger of an armed drone flying over a remote land, behind the controls of a nuclear missile silo … or possibly even at the resume-screening stage at a company or at the admissions board of a university.
The real risk comes not just from human intelligence using artificial intelligence as a weapon but also from humans being too eager to give A.I. responsibilities it’s not worthy of. A.I. is simply not ready to take the wheel—not in our cars, and not in our lives. But it is a powerful tool, with potentially revolutionary applications in medicine, education, entertainment, and beyond if it can be used responsibly. The lingering question is: Will it be?