
Training/Pre-training
To train an AI model, especially a large language model (LLM), researchers feed it huge datasets of text drawn from books, websites, conversations, articles, and more; sometimes audio, video, or code is included as well. Training a state-of-the-art model can take weeks or months, involve processing terabytes of data, and cost hundreds of millions of dollars in compute.
The core technique behind LLM training is called next-token prediction. The model sees billions of text snippets with the final “token” (usually a word or a chunk of a word) hidden, and it learns to guess what comes next. Think of it like a supercharged autocomplete.
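To make the idea concrete, here is a toy sketch of next-token prediction. It simply counts which word tends to follow which in a tiny made-up corpus; real LLMs use neural networks over subword tokens rather than word counts, so treat this only as an illustration of the "guess what comes next" objective.

```python
# Toy sketch of next-token prediction (illustrative only; real LLMs use
# neural networks over subword tokens, not whole-word counts).
from collections import Counter, defaultdict

corpus = [
    "the cat sat on the mat",
    "the cat sat on the rug",
    "the cat chased the dog",
]

# For each word, count which word tends to follow it.
next_counts = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, following in zip(words, words[1:]):
        next_counts[current][following] += 1

def predict_next(word):
    """Guess the most likely next token given the previous one."""
    if word not in next_counts:
        return None
    return next_counts[word].most_common(1)[0][0]

print(predict_next("the"))   # -> 'cat' (the most frequent follower in this tiny corpus)
print(predict_next("sat"))   # -> 'on'
```

An LLM does the same kind of guessing, but instead of a lookup table it uses billions of learned weights and conditions on the entire preceding context, not just the last word.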
Behind the scenes, the model constantly adjusts billions of internal parameters, known as weights. These weights act like connections in a brain: after each prediction, they are nudged in whatever direction reduces the error, a process known as gradient descent. Over time, this allows the model to get better at grammar, language, reasoning, and even factual knowledge.
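The sketch below shows that update loop in miniature for a single invented weight: the model makes a prediction, measures how wrong it was, and nudges the weight to reduce the error. It is a simplified assumption-laden illustration, not how a real training framework is written, but the nudge-to-reduce-error step is the same one applied to billions of weights at once.

```python
# Minimal sketch of the weight-update loop (gradient descent) for one toy
# weight. Real models repeat this step for billions of weights in parallel.
import math

weight = 0.0          # one "connection strength"; a stand-in for billions of them
learning_rate = 0.5

def sigmoid(x):
    """Squash a raw score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

# Pretend the correct next token should be predicted with probability near 1.
target = 1.0

for step in range(5):
    prediction = sigmoid(weight)              # model's confidence in the correct token
    loss = -(target * math.log(prediction)
             + (1 - target) * math.log(1 - prediction))   # cross-entropy error
    gradient = prediction - target            # how the error changes as the weight changes
    weight -= learning_rate * gradient        # nudge the weight to reduce the error
    print(f"step {step}: prediction={prediction:.3f} loss={loss:.3f}")
```

Run it and the prediction climbs toward 1 while the loss shrinks, which is exactly what "the weights adjust when the model is wrong" means in practice.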
Still skeptical that something as simple as predicting the next word could lead to super-intelligent AI? Here’s a short explanation by Ilya Sutskever, co-founder of OpenAI, on why next-token prediction is surprisingly powerful. For a quick visual overview of training, check out this explainer.