Post-training

Post-training is the phase of LLM development that happens after pre-training. If pre-training teaches a model to predict text, post-training teaches it to be useful.

A pre-trained model has absorbed patterns from massive amounts of data. It can generate fluent text, complete sentences, and recognize language structure. But on its own, it doesn't know how to follow instructions, hold a conversation, refuse harmful requests, or reason through multi-step problems. Post-training is what transforms a raw text predictor into an assistant that can actually do things.

This is where techniques like supervised fine-tuning (SFT), reinforcement learning from human feedback (RLHF), and direct preference optimization (DPO) come in. Each one refines the model's behavior in different ways, and most production LLMs use a combination.

Pre-training vs. post-training: what's the difference?

Pre-training and post-training are the two major stages of building an LLM. They serve very different purposes.

Pre-training is the foundation. The model processes enormous datasets (books, websites, code, conversations) and learns to predict the next token in a sequence. This is computationally intensive: training a frontier model can cost tens or hundreds of millions of dollars and run for weeks or months on thousands of GPUs. The result is a base model that has absorbed a great deal about language but doesn't know how to behave in a conversation. Ask a base model a question and you might get a continuation of your text rather than an answer.
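To make "predict the next token" concrete, here is a minimal sketch of the loss a model is trained to minimize at this stage. It assumes the model has already produced a list of raw scores (logits) over the vocabulary at each position; the function names are illustrative and not taken from any particular library.

```python
import math

def next_token_loss(logits, target_id):
    """Cross-entropy at one position: -log softmax(logits)[target_id]."""
    m = max(logits)  # subtract the max before exponentiating for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target_id]

def sequence_loss(logits_per_position, token_ids):
    """Average next-token loss over a sequence.

    The logits at position t are scored against token t+1, so a sequence
    of n tokens yields n-1 training targets.
    """
    targets = token_ids[1:]
    if len(logits_per_position) != len(targets):
        raise ValueError("expected one logit vector per predicted token")
    losses = [next_token_loss(l, t) for l, t in zip(logits_per_position, targets)]
    return sum(losses) / len(losses)
```

A model that spreads its scores evenly over a vocabulary of size V pays a loss of log(V) per token; training pushes that number down by putting more probability on the token that actually comes next.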

Post-training is the refinement. Using much smaller, carefully curated datasets and targeted techniques, developers teach the model how to respond helpfully, follow instructions, reason step by step, and avoid harmful or misleading outputs. Post-training is typically far less expensive than pre-training (days rather than months, thousands of examples rather than trillions of tokens), but its impact on the model's usability is enormous.

The relationship is straightforward: pre-training gives the model knowledge and language ability. Post-training gives it behavior and judgment. Both are essential. A model that skips pre-training has nothing to draw on. A model that skips post-training has knowledge but no idea how to use it.

Why post-training matters

Post-training is where the gap closes between "impressive demo" and "reliable product."

The research community has increasingly shifted its focus toward post-training. Pre-training is comparatively well understood and largely a matter of scale. Post-training is where the harder, more nuanced work happens: teaching a model when to say "I don't know," how to reason through a complex problem, how to follow a multi-step instruction, and how to refuse a harmful request without becoming unhelpful on legitimate ones.

Post-training is also what separates different versions of the same base model. When a company releases a "base" and an "instruct" version of a model, the difference is post-training. The base model is the raw output of pre-training. The instruct model has gone through SFT and RLHF (or similar techniques) to make it conversational and useful. The underlying knowledge is the same; the behavior is completely different.

For engineering teams, post-training is increasingly a core skill. Courses from organizations like DeepLearning.AI now treat it as essential curriculum for anyone building with LLMs, covering SFT, RLHF, reward modeling, and reinforcement learning algorithms.

Post-training techniques

Several techniques fall under the post-training umbrella. Most production LLMs use a combination, applied in sequence.

Supervised fine-tuning (SFT) is the most straightforward post-training technique. You give the model a dataset of input-output pairs and train it to produce similar outputs. SFT teaches the model the format and style of a good response. It's effective, predictable, and relatively easy to implement. The limitation is that it requires someone to define what "good" looks like for every example, which doesn't scale well for subjective tasks.
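As a rough illustration of how an input-output pair becomes a training example, the sketch below concatenates prompt and response tokens and masks the prompt so that only the response contributes to the loss. Masking the prompt is a common choice rather than a universal rule, and the token IDs and function names here are made up for the example.

```python
IGNORE = -100  # sentinel for "do not score this position"; matches the default
               # ignore_index of PyTorch's CrossEntropyLoss

def build_sft_labels(prompt_ids, response_ids):
    """Return (input_ids, labels) with the prompt positions masked out."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE] * len(prompt_ids) + list(response_ids)
    return input_ids, labels

def masked_mean_loss(per_token_losses, labels):
    """Average the per-token losses over unmasked (response) positions only."""
    kept = [loss for loss, label in zip(per_token_losses, labels) if label != IGNORE]
    return sum(kept) / len(kept)
```

With this setup the model still reads the prompt as context, but it is only graded on how well it reproduces the demonstrated response.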

Reinforcement learning from human feedback (RLHF) goes a step further. Instead of showing the model the "right" answer, you show it several possible answers and have humans rank them from best to worst. Those rankings train a reward model, a separate model that learns to predict which responses humans prefer. The LLM is then fine-tuned using reinforcement learning to maximize the reward model's score. RLHF is a large part of what makes models like ChatGPT and Claude feel helpful and conversational. It handles subjective quality (tone, helpfulness, safety) better than SFT, but it's more complex and expensive to implement.
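The reward-model step can be sketched as follows. A ranking is first expanded into (preferred, less preferred) pairs, and the reward model is then penalized whenever it scores the preferred response lower. The pairwise loss shown is the Bradley-Terry style objective commonly used for this purpose; the source does not say which objective any particular system uses, and the function names are illustrative.

```python
import math
from itertools import combinations

def ranking_to_pairs(ranked_responses):
    """Expand a best-to-worst ranking into (chosen, rejected) pairs."""
    return list(combinations(ranked_responses, 2))

def pairwise_reward_loss(reward_chosen, reward_rejected):
    """-log(sigmoid(reward_chosen - reward_rejected)), computed stably."""
    x = reward_rejected - reward_chosen  # positive when the model ranks the pair wrongly
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))
```

The loss is small when the reward model scores the human-preferred response well above the other one, and grows roughly linearly when it gets the order wrong.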

Direct preference optimization (DPO) is a more recent alternative to RLHF that can achieve comparable results with less complexity. Instead of training a separate reward model and running reinforcement learning, DPO trains the LLM directly on human preference data. Given a prompt and two responses (one preferred, one not), DPO adjusts the model to favor the preferred response. It's simpler, often more stable to train, and increasingly popular.
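For a single preference pair, the DPO objective can be written in a few lines. The sketch assumes you already have the summed log-probability of each response under the model being trained (the policy) and under a frozen copy of the starting model (the reference). The parameter names and the beta value of 0.1 are illustrative choices, not recommendations.

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one (chosen, rejected) pair of responses."""
    # How much more (or less) likely each response became relative to the reference.
    chosen_shift = policy_chosen_logp - ref_chosen_logp
    rejected_shift = policy_rejected_logp - ref_rejected_logp
    margin = beta * (chosen_shift - rejected_shift)
    # -log(sigmoid(margin)), computed in a numerically stable way.
    x = -margin
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))
```

When the policy is identical to the reference the loss is log(2); it falls as the policy raises the preferred response's probability relative to the rejected one, with no reward model or RL loop involved.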

Instruction fine-tuning (IFT) is a specialized form of SFT focused specifically on teaching models to follow instructions. The training data consists of diverse instructions paired with appropriate responses, which teaches the model to generalize across different types of requests rather than memorizing specific tasks.
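A small sketch of what "diverse instructions paired with appropriate responses" can look like as data. The template string and field names below are hypothetical; real datasets and models each define their own format.

```python
# Hypothetical prompt template; actual formats vary by dataset and model.
TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n"

def format_example(instruction, response):
    """Turn one instruction-response pair into a prompt/completion record."""
    return {
        "prompt": TEMPLATE.format(instruction=instruction),
        "completion": response,
    }

# Mixing task types is what pushes the model to generalize across requests.
examples = [
    format_example("Summarize: Cats sleep most of the day.", "Cats sleep a lot."),
    format_example("Translate to French: Good morning.", "Bonjour."),
    format_example("List two primary colors.", "Red and blue."),
]
```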

Constitutional AI (CAI) is an approach where the model is trained to critique and revise its own outputs against a written set of principles (the "constitution"), with AI-generated feedback standing in for much of the human feedback that would otherwise be needed at every step. The model learns to self-correct, which helps with safety and alignment at scale.
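The evaluate-and-revise idea can be sketched as a simple loop that produces training data. Here `generate` is a hypothetical stand-in for whatever function calls the model, and the prompt wording is invented for illustration; this is not the actual procedure or prompts of any published system.

```python
def critique_and_revise(prompt, generate, principles):
    """Draft a response, then critique and rewrite it once per principle.

    `generate` is any callable that maps a text prompt to model output.
    The final record could be used as a supervised fine-tuning example.
    """
    draft = generate(prompt)
    for principle in principles:
        critique = generate(
            f"Critique this response against the principle: {principle}\n\n"
            f"Response: {draft}"
        )
        draft = generate(
            f"Rewrite the response to address the critique.\n\n"
            f"Critique: {critique}\n\nResponse: {draft}"
        )
    return {"prompt": prompt, "response": draft}
```

Each principle costs two extra model calls (one critique, one revision), and no human has to label the individual example.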

Post-training vs. fine-tuning: are they the same thing?

This is a common source of confusion. Fine-tuning is one technique within post-training, but they're not the same thing.

Post-training is the broad phase that includes everything done to a model after pre-training to make it useful. It encompasses SFT, RLHF, DPO, safety training, and other alignment techniques. Post-training is typically done by the organization that built the base model before they release it.

Fine-tuning (in the way most practitioners use the term) usually refers to adapting an already-post-trained model for a specific use case. When a company takes GPT-4 and fine-tunes it on its customer support data, that's fine-tuning on top of post-training. The model has already been through SFT and RLHF; the company is adding an additional layer of specialization.

The distinction matters because the goals are different. Post-training makes a model generally useful and safe. Fine-tuning makes a generally useful model specifically good at your task.

What post-training teaches a model

Post-training is responsible for most of the capabilities that make modern LLMs feel useful in practice.

Instruction following. A pre-trained model doesn't inherently know that "summarize this article" means it should produce a summary. Post-training teaches the model to interpret and execute instructions across a wide range of formats and intents.

Conversational ability. Base models produce text continuations, not conversations. Post-training teaches the model to take turns, stay on topic, ask clarifying questions, and produce responses that feel like natural dialogue.

Reasoning. Newer post-training techniques, particularly reinforcement learning approaches, teach models to think step by step, breaking complex problems into smaller pieces, checking intermediate results, and arriving at more reliable conclusions. This is what powers "reasoning models" that show their work.
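One way reinforcement learning can reward reasoning is to score only the final answer and leave the model free to find intermediate steps that get there. The sketch below assumes completions end with a line of the form "Answer: ..."; that convention and the function names are assumptions made for this example.

```python
import re

def final_answer(completion):
    """Return the text after the last 'Answer:' marker, or None if absent."""
    matches = re.findall(r"Answer:\s*(.+)", completion)
    return matches[-1].strip() if matches else None

def correctness_reward(completion, reference):
    """1.0 if the stated final answer matches the reference, else 0.0."""
    return 1.0 if final_answer(completion) == reference else 0.0
```

A reward like this works only for tasks with checkable answers (math, code with tests); open-ended tasks still need preference-based signals.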

Safety and refusal. Post-training teaches models to decline harmful requests, avoid generating dangerous content, and flag uncertainty. Without this, a model would happily generate instructions for anything asked, regardless of consequences.

Tool use. Post-training can teach models to call external tools (searching the web, running code, querying a database), expanding their capabilities beyond what's stored in their parameters.
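On the application side, tool use usually means the model emits a structured request that surrounding code parses and executes. The sketch below assumes the model outputs a JSON object with "name" and "arguments" fields; that format, the `TOOLS` registry, and the `add` tool are hypothetical.

```python
import json

def add(a, b):
    """Toy tool used for illustration."""
    return a + b

# Registry of tools the model is allowed to call (hypothetical).
TOOLS = {"add": add}

def run_tool_call(model_output):
    """Parse a JSON tool call emitted by the model and execute it."""
    call = json.loads(model_output)
    tool = TOOLS.get(call["name"])
    if tool is None:
        return {"error": f"unknown tool: {call['name']}"}
    return {"result": tool(**call["arguments"])}
```

The returned value would typically be fed back to the model as a new message so it can use the result in its final response.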

FAQs

What is post-training in AI?

Post-training is the phase of LLM development that comes after pre-training. It uses techniques like supervised fine-tuning, RLHF, and DPO to transform a raw language model into one that can follow instructions, hold conversations, reason through problems, and avoid harmful outputs.

What's the difference between pre-training and post-training?

Pre-training teaches a model to understand and generate language by predicting text from massive datasets. Post-training refines the model's behavior using smaller, curated datasets and targeted techniques. Pre-training gives the model knowledge; post-training gives it judgment and usefulness.

Is post-training the same as fine-tuning?

Not exactly. Fine-tuning is one technique used during post-training (specifically, supervised fine-tuning). But post-training also includes RLHF, DPO, safety training, and other alignment methods. And when people talk about "fine-tuning" in practice, they often mean adapting an already-post-trained model for a specific task, which is a separate step.

What is RLHF and how does it relate to post-training?

RLHF (reinforcement learning from human feedback) is a post-training technique that uses human preference rankings to train a reward model, which then guides the LLM to produce responses that humans prefer. It's a large part of what makes models like ChatGPT and Claude feel conversational and helpful.

Why is post-training getting so much attention?

Pre-training is well understood and mostly a matter of scale. Post-training is where the harder challenges live: teaching models to reason, follow complex instructions, use tools, and stay safe. It's also more accessible: post-training requires far less compute than pre-training, which means more teams can contribute to advancing it.

What's the difference between a "base" model and an "instruct" model?

The difference is post-training. A base model is the raw output of pre-training. It can generate text but doesn't know how to have a conversation. An instruct model has gone through post-training (SFT, RLHF, etc.) to follow instructions and interact usefully.