ChatGPT has sparked many people’s imaginations and fears. But what is it, really? It’s an example of a “large language model”—a computer program trained to recognize patterns in everyday writing and replicate them.

How to build a brain

Imagine you go to work for a self-driving car company that wants its cars to understand stoplights. So you set up a simple version: a camera attached to a computer, pointed at a stoplight. You program one “cell” per pixel in the image with this simple question: If I give you a color, does it mean “stop” (0) or “go” (1)?

Now you change the stoplight thousands of times, checking the answers as you go. When a cell gets it right, you increase the “weight” of that answer, just as you would encourage a learner who gets a correct answer. If it’s wrong, you shift the weight toward the other answer. Over time, the cells will collectively return the correct answer. 
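Here is what that nudge looks like in code, as a minimal sketch in Python. The colors, the clamping, and the learning rate are all invented for illustration; real networks adjust their weights with gradient descent, a calculus-based version of the same idea.

    # A single "cell" learning stop (0) vs. go (1) from a pixel's
    # red, green, and blue values.

    def cell_output(weights, pixel):
        # Weighted sum of the color channels, squashed to the range 0..1.
        total = sum(w * p for w, p in zip(weights, pixel))
        return max(0.0, min(1.0, total))

    weights = [0.0, 0.0, 0.0]            # one weight per color channel
    examples = [((1.0, 0.1, 0.1), 0),    # reddish pixel  -> stop (0)
                ((0.1, 1.0, 0.1), 1)]    # greenish pixel -> go (1)

    for _ in range(1000):                # "change the stoplight" many times
        for pixel, answer in examples:
            error = answer - cell_output(weights, pixel)
            # Shift each weight toward the correct answer.
            weights = [w + 0.01 * error * p for w, p in zip(weights, pixel)]

    print(cell_output(weights, (1.0, 0.1, 0.1)))   # close to 0: stop
    print(cell_output(weights, (0.1, 1.0, 0.1)))   # close to 1: go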

Now build a second, smaller group of cells, and connect each one to many cells in the first group. These new cells also return “stop” or “go” by a consensus of the first “layer” of cells. You may need additional layers, but eventually you get to a few numbers between 0 and 1 that express the network’s confidence in each answer (e.g., 1% stop, 95% go, 0.125% other).
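A similarly simplified sketch of the layering, with made-up weights standing in for trained ones, shows how those final confidence numbers emerge:

    # Stacking layers: each layer's outputs become the next layer's inputs.

    def layer(weights, inputs):
        # One output per cell: a weighted sum of all inputs, squashed to 0..1.
        return [max(0.0, min(1.0, sum(w * x for w, x in zip(row, inputs))))
                for row in weights]

    pixels = [0.1, 0.9, 0.1, 0.2, 0.8, 0.1]       # six pixels, mostly green

    layer1 = [[0.2, 0.2, 0.2, 0.2, 0.2, 0.2],     # three cells, each reading
              [0.5, -0.5, 0.5, 0.5, -0.5, 0.5],   # every pixel with its own
              [-0.3, 0.6, -0.3, -0.3, 0.6, -0.3]] # weights
    layer2 = [[0.1, 1.0, -0.5],                   # "stop" cell
              [0.2, -0.5, 1.0],                   # "go" cell
              [0.05, 0.05, 0.05]]                 # "other" cell

    hidden = layer(layer1, pixels)                # first layer's consensus
    stop, go, other = layer(layer2, hidden)       # final answer
    print(f"stop: {stop:.0%}  go: {go:.0%}  other: {other:.0%}")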

And that’s the secret. Each cell looks at its little part of the world, learns to respond with an output, and shares that with other cells. Eventually this all reaches a conclusion and you decide whether to move the car.

Once you have trained these cells, you can extract each one’s “weight” and connections to form a “model.” And while models require significant computing power to train, they require very little to use. The code that recognizes faces on your phone is one such model. So is the part that converts images or speech to text. They don’t “understand” anything; they just match patterns.

Teaching it to write

Now, let’s build our own language model. It needs a vocabulary, so take the million most common words from a dictionary, skipping words so common they carry little meaning, such as “a”, “an”, and “the”. Give each word a unique number.
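That vocabulary is nothing more than a lookup table. A toy version in Python, with a six-word list standing in for the million-word one:

    # Toy vocabulary: each word gets a unique number. A real model's
    # word list would be vastly larger; this one is for illustration.
    words = ["cats", "dogs", "chase", "mice", "cars", "sleep"]
    word_to_id = {word: i for i, word in enumerate(words)}
    id_to_word = {i: word for word, i in word_to_id.items()}

    print(word_to_id["chase"])   # 2
    print(id_to_word[2])         # chase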

Now build a layer of cells, one cell per word; each cell’s input is 1 for “this is my word” and 0 otherwise. Add a few more layers, each cross-connected with the previous. At the end, add a layer that indicates which word should come next.

Now feed it an enormous amount of text, one word at a time, and update the weights based on how well it predicts the next word. Once you have trained it, feed in other text and measure the accuracy of its predictions—that’s testing the model. You may discover you have to change its design and run the training-testing cycle repeatedly until the model performs as desired.
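Here is that whole cycle as a toy in Python. To keep it tiny, the middle layers are collapsed into a single table of weights, where weights[current][candidate] scores how likely “candidate” is to follow “current”; indexing into the table by the word’s number plays the role of the one-hot input layer described above. The word list, training text, and learning rate are invented for illustration.

    words = ["cats", "dogs", "chase", "mice", "cars"]
    word_to_id = {w: i for i, w in enumerate(words)}
    n = len(words)
    weights = [[0.0] * n for _ in range(n)]

    text = "cats chase mice dogs chase cars cats chase mice".split()

    # Training: for every pair of adjacent words, nudge the weight for
    # the word that actually came next toward 1 and the rest toward 0.
    for _ in range(50):
        for current, nxt in zip(text, text[1:]):
            row = weights[word_to_id[current]]
            hit = word_to_id[nxt]
            for j in range(n):
                target = 1.0 if j == hit else 0.0
                row[j] += 0.1 * (target - row[j])

    # Testing: ask for the most likely next word.
    def predict(word):
        row = weights[word_to_id[word]]
        return words[row.index(max(row))]

    print(predict("cats"))    # "chase"
    print(predict("chase"))   # "mice" (it follows "chase" most often)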

Training a real-world model is more complicated. But the essence is still a cell making a decision based on its little part of the picture.

Large language models are trained to perform many tasks—summarizing text, generating text, understanding instructions, and even detecting plagiarism; each of these applications is a “module.” Tools such as ChatGPT combine several of these modules to understand, summarize, and generate text. With some exploration, you may find ways to use these modules individually or collectively in your own work.
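As a sketch of what combining modules might look like, here is a toy pipeline in Python. The function names and their one-line bodies are placeholders invented for illustration, not any real product’s API; in a real tool, each stage would be a separately trained model.

    def understand(prompt):
        return {"task": "summary"}                 # stand-in "intent"

    def summarize(text):
        return text.split(".")[0] + "."            # stand-in summary

    def generate(intent, notes):
        return f"Here is a {intent['task']}: {notes}"

    def answer(prompt, source_text):
        intent = understand(prompt)        # instruction-following module
        notes = summarize(source_text)     # summarization module
        return generate(intent, notes)     # text-generation module

    print(answer("Sum this up.", "Cells vote in layers. Layers form models."))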

Train one yourself

There’s a simple demonstration at https://www.tensorflowtictactoe.co/. It has a game of tic tac toe and an untrained model. You play against the computer—which plays terribly at first—and after each game you tell it whether X or O played better. After a half-dozen rounds, it plays a good game.

Implications for instructional design

The quality of a large language model's outputs depends on the quality of its inputs. This is the source of many ethical and legal questions about bias in the results and the use of others’ work. Publishers and the law are catching up, slowly: the journal Nature won’t accept papers that credit AI tools as authors, and the US Copyright Office is considering rejecting a book with AI-generated images.

GPT and its like are fast but literal, constrained to the content supplied during training. They might help you produce content for well-known topics but can “hallucinate” plausible-sounding text when asked about topics outside that training. You’ll need to rephrase questions and try different approaches, just as you would with a subject matter expert (SME), but without the benefit of a SME’s self-awareness.

We have just begun to explore what this technology can do. It is already being tried as a teaching assistant in computer science courses, with promising early results. A Wharton business professor now requires its use, comparing it to the introduction of the calculator and saying it has improved the quality and range of ideas. It could be a powerful tool to use in conversation with a SME, enabling you both to guide the results onto useful ground. Specialized language models such as GitHub Copilot, which helps programmers write code, can speed up content production.

Think of this not as a rival but as an aid to your well-trained mind. I’m excited to see what you’ll create with these new tools, and I hope you are, too.