Generative AI models can carry on conversations, answer questions, write stories, produce source code, and create images and videos of almost any description. Here’s how generative AI works, how it’s being used, and why it’s more limited than you might think.
Generative AI is a kind of artificial intelligence that creates new content, including text, images, audio, and video, based on patterns it has learned from existing content. Today’s generative AI models have been trained on enormous volumes of data using deep learning, or deep neural networks, and they can carry on conversations, answer questions, write stories, produce source code, and create images and videos of any description, all based on brief text inputs or “prompts.”
Generative AI is called generative because the AI creates something that didn’t previously exist. That’s what makes it different from discriminative AI, which draws distinctions between different kinds of input. To say it differently, discriminative AI tries to answer a question like “Is this image a drawing of a rabbit or a lion?” whereas generative AI responds to prompts like “Draw me a picture of a lion and a rabbit sitting next to each other.”
This article introduces you to generative AI and its uses with popular models like ChatGPT and DALL-E. We’ll also consider the limitations of the technology, including why “too many fingers” has become a dead giveaway for artificially generated art.
The emergence of generative AI
Generative AI has been around for years, arguably since ELIZA, a chatbot that simulates talking to a therapist, was developed at MIT in 1966. But years of work on AI and machine learning have recently come to fruition with the release of new generative AI systems. You’ve almost certainly heard about ChatGPT, a text-based AI chatbot that produces remarkably human-like prose. DALL-E and Stable Diffusion have also drawn attention for their ability to create vibrant and realistic images based on text prompts.
Output from these systems is so uncanny that it has many people asking philosophical questions about the nature of consciousness—and worrying about the economic impact of generative AI on human jobs. But while all of these artificial intelligence creations are undeniably big news, there is arguably less going on beneath the surface than some may assume. We’ll get to some of those big-picture questions in a moment. First, let’s look at what’s going on under the hood.
How does generative AI work?
Generative AI uses machine learning to process a huge amount of visual or textual data, much of which is scraped from the internet, and then determines what things are most likely to appear near other things. Much of the programming work of generative AI goes into creating algorithms that can distinguish the “things” of interest to the AI’s creators—words and sentences in the case of chatbots like ChatGPT, or visual elements for DALL-E. But fundamentally, generative AI creates its output by assessing an enormous corpus of data, then responding to prompts with something that falls within the realm of probability as determined by that corpus.
Autocomplete—when your cell phone or Gmail suggests what the remainder of the word or sentence you’re typing might be—is a low-level form of generative AI. ChatGPT and DALL-E just take the idea to significantly more advanced heights.
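The next-word idea behind autocomplete can be sketched in a few lines. The toy corpus and the single-previous-word (bigram) model below are illustrative stand-ins: real generative models learn from billions of words and condition on far longer contexts, but the core move is the same—count what tends to follow what, then predict the most likely continuation.

```python
from collections import Counter, defaultdict

# Toy corpus standing in for the billions of words real models train on.
corpus = (
    "good morning everyone . good morning team . "
    "good night moon . thank you very much"
).split()

# Count which word follows which: the core of a bigram "autocomplete".
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def suggest(word):
    """Return the most frequently observed next word, or None if unseen."""
    counts = following[word]
    return counts.most_common(1)[0][0] if counts else None

print(suggest("good"))   # "morning" follows "good" twice, "night" only once
```

A phone keyboard does essentially this with a much larger corpus and smarter context; an LLM replaces the raw counts with a learned probability distribution over every token in its vocabulary.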
What is an AI model?
ChatGPT and DALL-E are interfaces to underlying AI functionality that is known in AI terms as a model. An AI model is a mathematical representation—implemented, in practice, as an algorithm—that generates new data that will (hopefully) resemble a set of data you already have on hand. You’ll sometimes see ChatGPT and DALL-E themselves referred to as models; strictly speaking this is incorrect, as ChatGPT is a chatbot that gives users access to several different versions of the underlying GPT model. But in practice, these interfaces are how most people will interact with the models, so don’t be surprised to see the terms used interchangeably.
AI developers assemble a corpus of data of the type that they want their models to generate. This corpus is known as the model’s training set, and the process of developing the model is called training. The GPT models, for instance, were trained on a huge corpus of text scraped from the internet, and the result is that you can feed them natural language queries and they will respond in idiomatic English (or any number of other languages, depending on the input).
AI models treat different characteristics of the data in their training sets as vectors—mathematical structures made up of multiple numbers. Much of the secret sauce underlying these models is their ability to translate real-world information into vectors in a meaningful way, and to determine which vectors are similar to one another in a way that will allow the model to generate output that is similar to, but not identical to, its training set.
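Vector similarity is usually measured by the angle between vectors rather than their raw values. The three-dimensional "embeddings" below are hand-made for illustration (real models learn hundreds or thousands of dimensions from data), but the cosine-similarity calculation is the standard one:

```python
import math

# Hypothetical hand-made embeddings; real models learn these from data,
# in far higher dimensions.
vectors = {
    "king":  [0.90, 0.80, 0.10],
    "queen": [0.85, 0.82, 0.15],
    "apple": [0.10, 0.20, 0.90],
}

def cosine(a, b):
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# "king" and "queen" point in nearly the same direction; "apple" does not.
print(cosine(vectors["king"], vectors["queen"]))
print(cosine(vectors["king"], vectors["apple"]))
```

The "meaningful way" mentioned above is exactly the training process: adjusting these numbers so that similar real-world things end up pointing in similar directions.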
There are a number of different types of AI models out there, but keep in mind that the various categories are not necessarily mutually exclusive. Some models can fit into more than one category.
Probably the AI model type receiving the most public attention today is the large language model, or LLM. LLMs are based on the concept of a transformer, first introduced in “Attention Is All You Need,” a 2017 paper from Google researchers. A transformer derives meaning from long sequences of text to understand how different words or semantic components might be related to one another, then determines how likely they are to occur in proximity to one another. The GPT models are LLMs, and the T stands for transformer. These transformers are run unsupervised on a vast corpus of natural language text in a process called pretraining (that’s the P in GPT), before being fine-tuned by human beings interacting with the model.
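At the heart of the transformer is an attention step that scores how relevant every position in a sequence is to the one currently being processed, then blends their representations accordingly. Here is a minimal sketch of scaled dot-product attention for a single query, with tiny hand-made vectors standing in for the learned representations a real model would use:

```python
import math

def softmax(xs):
    """Turn raw scores into probabilities that sum to 1."""
    exps = [math.exp(x - max(xs)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for one query over a short sequence."""
    d = len(query)
    # Score each key by its similarity to the query (scaled dot product).
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    # Output is a weighted blend of the values, dominated by similar keys.
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# The query matches the first key, so the output leans toward the first value.
out = attention([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [[10.0, 0.0], [0.0, 10.0]])
print(out)
```

A real transformer computes query, key, and value vectors from learned weight matrices, runs many attention "heads" in parallel, and stacks dozens of such layers—but each layer is doing this same score-then-blend operation.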
Diffusion is commonly used in generative AI models that produce images or video. In the diffusion process, the model adds noise—randomness, basically—to an image, then slowly removes it iteratively, all the while checking against its training set to attempt to match semantically similar images. Diffusion is at the core of AI models that perform text-to-image magic like Stable Diffusion and DALL-E.
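The forward half of that process—progressively drowning an image in noise—can be sketched directly. The hard part, which this toy deliberately omits, is training a neural network to predict and subtract the noise at each step so that the whole process can be run in reverse, turning pure noise into a new image:

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

def add_noise(pixels, steps, noise_scale=0.1):
    """Forward diffusion: repeatedly blend Gaussian noise into an 'image'.

    A real diffusion model is trained to predict this added noise so it
    can remove it step by step, running the process in reverse to
    generate an image from scratch.
    """
    for _ in range(steps):
        pixels = [p + random.gauss(0, noise_scale) for p in pixels]
    return pixels

image = [0.2, 0.8, 0.5, 0.1]          # toy 4-pixel "image"
noisy = add_noise(image, steps=50)    # after enough steps, mostly noise
```

The "checking against its training set" happens during that omitted training phase: the denoising network learns what plausible images look like, so each reverse step nudges the noise toward something image-like (and, for text-to-image systems, toward something matching the prompt).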
A generative adversarial network, or GAN, is based on a type of reinforcement learning, in which two algorithms compete against one another. One generates text or images based on probabilities derived from a big data set. The other—a discriminative AI—assesses whether that output is real or AI-generated. The generative AI repeatedly tries to “trick” the discriminative AI, automatically adapting to favor outcomes that are successful. Once the generative AI consistently “wins” this competition, the discriminative AI gets fine-tuned by humans and the process begins anew.
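The adversarial dynamic can be caricatured in a few lines. Everything here is a deliberate simplification: the "discriminator" is a fixed closeness test rather than a trained network, and the "generator" is a single number nudged toward samples that pass, but the feedback loop has the same shape as a real GAN's:

```python
import random

random.seed(1)

# The "real" data the generator is trying to imitate: values near 5.0.
real_mean = 5.0
generator_mean = 0.0   # the generator's current (bad) guess

def discriminator(sample):
    """Stand-in for a trained classifier: does the sample look 'real'?"""
    return abs(sample - real_mean) < 0.5

for _ in range(1000):
    sample = random.gauss(generator_mean, 1.0)   # generator's attempt
    if not discriminator(sample):
        # Caught as fake: adjust the generator toward samples that pass.
        generator_mean += 0.01 * (real_mean - generator_mean)

print(generator_mean)   # has drifted from 0.0 toward the real distribution
```

In an actual GAN, both sides are neural networks trained by gradient descent, and crucially the discriminator improves alongside the generator—which is what forces the generator's output to become increasingly convincing.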
One of the most important things to keep in mind here is that, while there is human intervention in the training process, most of the learning and adapting happens automatically. Many, many iterations are required to get the models to the point where they produce interesting results, so automation is essential. The process is quite computationally intensive, and much of the recent explosion in AI capabilities has been driven by advances in GPU computing power and techniques for implementing parallel processing on these chips.
Is generative AI sentient?
The mathematics and coding that go into creating and training generative AI models are quite complex, and well beyond the scope of this article. But if you interact with the models that are the end result of this process, the experience can be decidedly uncanny. You can get DALL-E to produce things that look like real works of art. You can have conversations with ChatGPT that feel like a conversation with another human. Have researchers truly created a thinking machine?
Chris Phipps, a former IBM natural language processing lead who worked on Watson AI products, says no. He describes ChatGPT as a “very good prediction machine.”
It’s very good at predicting what humans will find coherent. It’s not always coherent (it mostly is) but that’s not because ChatGPT “understands.” It’s the opposite: humans who consume the output are really good at making any implicit assumption we need in order to make the output make sense.
Phipps, who’s also a comedy performer, draws a comparison to a common improv game called Mind Meld.
Two people each think of a word, then say it aloud simultaneously—you might say “boot” and I say “tree.” We came up with those words completely independently and at first, they had nothing to do with each other. The next two participants take those two words and try to come up with something they have in common and say that aloud at the same time. The game continues until two participants say the same word.
Maybe two people both say “lumberjack.” It seems like magic, but really it’s that we use our human brains to reason about the input (“boot” and “tree”) and find a connection. We do the work of understanding, not the machine. There’s a lot more of that going on with ChatGPT and DALL-E than people are admitting. ChatGPT can write a story, but we humans do a lot of work to make it make sense.
Testing the limits of computer intelligence
Certain prompts that we can give to these AI models will make Phipps’ point fairly evident. For instance, consider the riddle “What weighs more, a pound of lead or a pound of feathers?” The answer, of course, is that they weigh the same (one pound), even though our instinct or common sense might tell us that the feathers are lighter.
ChatGPT will answer this riddle correctly, and you might assume it does so because it is a coldly logical computer that doesn’t have any “common sense” to trip it up. But that’s not what’s going on under the hood. ChatGPT isn’t logically reasoning out the answer; it’s just generating output based on its predictions of what should follow a question about a pound of feathers and a pound of lead. Since its training set includes a bunch of text explaining the riddle, it assembles a version of that correct answer.
However, if you ask ChatGPT whether two pounds of feathers are heavier than a pound of lead, it will confidently tell you they weigh the same amount, because that’s still the most likely output to a prompt about feathers and lead, based on its training set. It can be fun to tell the AI that it’s wrong and watch it flounder in response; I got it to apologize to me for its mistake and then suggest that two pounds of feathers weigh four times as much as a pound of lead.
Why does AI art have too many fingers?
A notable quirk of AI art is that it often represents people with profoundly weird hands, which has become a common tell that an image was artificially generated. This oddity offers more insight into how generative AI does (and doesn’t) work. Start with the corpus that DALL-E and similar visual generative AI tools are pulling from: pictures of people usually provide a good look at their face, but their hands are often partially obscured or shown at odd angles, so you can’t see all the fingers at once. Add to that the fact that hands are structurally complex—they’re notoriously difficult for people, even trained artists, to draw. And one thing that DALL-E isn’t doing is assembling an elaborate 3D model of hands based on the various 2D depictions in its training set. That’s not how it works. DALL-E doesn’t even necessarily know that “hands” is a coherent category of thing to be reasoned about. All it can do is try to predict, based on the images it has, what a similar image might look like. Despite huge amounts of training data, those predictions often fall short.
Phipps speculates that one factor is a lack of negative input.
It mostly trains on positive examples, as far as I know. They didn’t give it a picture of a seven-fingered hand and tell it “NO! Bad example of a hand. Don’t do this.” So it predicts the space of the possible, not the space of the impossible. Basically, it was never told to not create a seven-fingered hand.
There’s also the factor that these models don’t think of the drawings they’re making as a coherent whole; rather, they assemble a series of components that are likely to be in proximity to one another, as shown by the training data. DALL-E may not know that a hand is supposed to have five fingers, but it does know that a finger is likely to be immediately adjacent to another finger. So, sometimes, it just keeps adding fingers. (You can get the same results with teeth.) In fact, even this description of DALL-E’s process is probably anthropomorphizing it too much; as Phipps says, “I doubt it has even the understanding of a finger. More likely, it is predicting pixel color, and finger-colored pixels tend to be next to other finger-colored pixels.”
Potential negative impacts of generative AI
These examples show you one of the major limitations of generative AI: what those in the industry call hallucinations, which is a perhaps misleading term for output that is, by the standards of humans who use it, false or incorrect. All computer systems occasionally produce mistakes, of course, but these errors are particularly problematic because end users are unlikely to spot them easily: If you are asking a production AI chatbot a question, you generally won’t know the answer yourself. You are also more likely to accept an answer delivered in the confident, fully idiomatic prose that ChatGPT and other models like it produce, even if the information is incorrect.
Even if a generative AI could produce output that’s hallucination-free, there are various potential negative impacts:
- Cheap and easy content creation: Hopefully it’s clear by now that ChatGPT and other generative AIs are not real minds capable of creative output or insight. But the truth is that not everything that’s written or drawn needs to be particularly creative. Many research papers at the high school or college undergraduate level only aim to synthesize publicly available data, which makes them a perfect target for generative AI. And the fact that synthetic prose or art can now be produced automatically, at a superhuman scale, may have weird or unforeseen results. Spam artists are already using ChatGPT to write phishing emails, for instance.
- Intellectual property: Who owns an AI-generated image or text? If a copyrighted work forms part of an AI’s training set, is the AI “plagiarizing” that work when it generates synthetic data, even if it doesn’t copy it word for word? These are thorny, untested legal questions.
- Bias: The content produced by generative AI is entirely determined by the underlying data on which it’s trained. Because that data is produced by humans with all their flaws and biases, the generated results can also be flawed and biased, especially if the models operate without human guardrails. OpenAI, the company that created ChatGPT, put safeguards in the model before opening it to public use that prevent it from doing things like using racial slurs; however, others have claimed that these sorts of safety measures represent their own kind of bias.
- Power consumption: In addition to heady philosophical questions, generative AI raises some very practical issues: for one thing, training a generative AI model is hugely compute intensive. This can result in big cloud computing bills for companies trying to get into this space, and ultimately raises the question of whether the increased power consumption—and, ultimately, greenhouse gas emissions—is worth the final result. (We also see this question come up regarding cryptocurrencies and blockchain technology.)
Use cases for generative AI
Despite these potential problems, the promise of generative AI is hard to miss. ChatGPT’s ability to extract useful information from huge data sets in response to natural language queries has search giants salivating. Microsoft is testing its own AI chatbot, dubbed “Sydney,” though it’s still in beta and the results have been decidedly mixed.
But Phipps thinks that more specialized types of search are a perfect fit for this technology. “One of my last customers at IBM was a large international shipping company that also had a billion-dollar supply chain consulting side business,” he says.
Their problem was that they couldn’t hire and train entry level supply chain consultants fast enough—they were losing out on business because they couldn’t get simple customer questions answered quickly. We built a chatbot to help entry level consultants search the company’s extensive library of supply chain manuals and presentations that they could turn around to the customer.
If I were to build a solution for that same customer today, just a year after I built the first one, I would 100% use ChatGPT and it would likely be far superior to the one I built. What’s nice about that use case is that there is still an expert human-in-the-loop double-checking the answer. That mitigates a lot of the ethical issues. There is a huge market for those kinds of intelligent search tools meant for experts.
Other potential use cases include:
- Code generation: The idea that generative AI might write computer code for us has been bubbling around for years now. It turns out that large language models like ChatGPT can handle programming languages as readily as natural human languages, and while generative AI probably isn’t going to replace programmers in the immediate future, it can help increase their productivity.
- Cheap and easy content creation: As much as this one is a concern (listed above), it’s also an opportunity. The same AI that writes spam emails can write legitimate marketing emails, and there’s been an explosion of AI copywriting startups. Generative AI thrives when it comes to highly structured forms of prose that don’t require much creativity, like resumes and cover letters.
- Engineering design: Visual art and natural language have gotten a lot of attention in the generative AI space because they’re easy for ordinary people to grasp. But similar techniques are being used to design everything from microchips to new drugs—and will almost certainly enter the IT architecture design space soon enough.
Conclusion
Generative AI will surely disrupt some industries and will alter—or eliminate—many jobs. Articles like this one will continue to be written by human beings, however, at least for now. CNET recently tried putting generative AI to work writing articles but the effort foundered on a wave of hallucinations. If you’re worried, you may want to get in on the hot new job of tomorrow: AI prompt engineering.