Model
An AI model is a program that has been trained on data to recognize patterns and then make predictions, make decisions, or generate content. When you ask ChatGPT a question and it answers, or when Netflix recommends a show, or when your phone unlocks with your face, an AI model is doing the work behind the scenes.
The word "model" can sound abstract, but the concept is simple. You give a program a large amount of data, it learns patterns in that data, and then it uses those patterns to do something useful with new data it hasn't seen before. The "learning" part is the training. The "doing something useful" part is inference. And the patterns the model learns are encoded in its parameters (also called weights), which are the numerical values that get adjusted during training.
How do AI models work?
Every AI model follows the same basic cycle: data goes in, patterns are learned, and predictions come out.
Training is the learning phase. You give the model a dataset, and it processes that data to find patterns. For a language model, the training data might be billions of pages of text. For an image recognition model, it might be millions of labeled photos. During training, the model repeatedly adjusts its internal parameters to get better at the task it's being trained for. Training a large model can take weeks or months and cost tens of millions of dollars in compute.
Inference is the application phase. Once a model is trained, it's deployed to handle real-world inputs. Every time you type a prompt into a chatbot, upload a photo for analysis, or trigger a recommendation, the model runs inference: it takes your input, processes it through the patterns it learned, and generates an output. For many applications a single inference takes milliseconds, though generating a long response from a large language model can take several seconds.
Parameters are what the model actually "knows." They're the millions or billions of numerical values that encode the patterns the model learned during training. A model with more parameters can generally represent more complex patterns, which is why parameter count (7 billion, 70 billion, 400 billion) is often used as a rough proxy for model size and capability.
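To make parameter counts concrete, here is a minimal sketch that counts the weights and biases in a small fully connected network and estimates the memory needed just to store a model's parameters. The layer sizes and the figure of 2 bytes per parameter (16-bit floating point) are illustrative assumptions, not details of any particular model.

```python
def dense_param_count(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the width of each layer, input first.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # one weight per input/output pair
        total += n_out         # one bias per output unit
    return total


def weight_memory_gb(num_params, bytes_per_param=2):
    """Memory needed to hold the parameters, in decimal gigabytes.

    bytes_per_param=2 assumes 16-bit floats (an assumption for illustration).
    """
    return num_params * bytes_per_param / 1e9


print(dense_param_count([784, 128, 10]))   # 101770
print(weight_memory_gb(70_000_000_000))    # 140.0
```

Under these assumptions, a 70-billion-parameter model needs roughly 140 GB just to hold its weights, which is part of why parameter count also works as a rough proxy for hardware cost.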
The relationship between data, training, and parameters is what makes an AI model different from traditional software. Traditional software follows explicit rules written by a programmer. An AI model learns its own rules from data.
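The cycle described above can be sketched with a deliberately tiny model. The example below is a simplified illustration, not how production systems are trained: it has only two parameters, `w` and `b`, and the function names `train` and `predict` are hypothetical. No rule for the relationship between `x` and `y` is written into the program; the parameters are adjusted until the model's predictions match the data.

```python
def train(xs, ys, steps=1000, lr=0.05):
    """Training: fit y = w * x + b by gradient descent on mean squared error."""
    w, b = 0.0, 0.0  # parameters start out knowing nothing
    n = len(xs)
    for _ in range(steps):
        # How much the error changes if w or b changes slightly.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Nudge each parameter in the direction that reduces the error.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(params, x):
    """Inference: apply the learned parameters to a new input."""
    w, b = params
    return w * x + b


# Training data generated from y = 2x + 1 (the model is never told this rule).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

params = train(xs, ys)
print(params)                 # close to (2.0, 1.0)
print(predict(params, 10.0))  # close to 21.0, for an input not in the training data
```

A large language model follows the same loop in principle, with billions of parameters instead of two and far more elaborate data and optimization.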
Different types of AI models
AI models come in many forms, each designed for different tasks and data types. Here are the major categories.
Large language models (LLMs) are trained on massive text datasets to understand and generate human language. They power chatbots, writing assistants, code generation tools, and conversational search. Examples include GPT-4, Claude, Gemini, Llama, and DeepSeek. LLMs are built on the transformer architecture and trained using self-supervised learning (predicting the next token in a sequence), then refined through post-training techniques like reinforcement learning from human feedback (RLHF). They're the models most people knowingly interact with today.
Image and vision models process visual data. Convolutional neural networks (CNNs) are the classic architecture for image classification, object detection, and medical imaging. Vision transformers (ViTs) apply the transformer architecture to images. Generative image models like DALL-E, Midjourney, and Stable Diffusion create new images from text descriptions.
Speech and audio models convert between speech and text (speech recognition and text-to-speech) or generate audio content. Whisper (by OpenAI) is a widely used speech recognition model. These models power voice assistants, transcription services, and audio generation tools.
Multimodal models process multiple types of data at once. They can take text, images, audio, or video as input and generate outputs across modalities. GPT-4o and Gemini are prominent multimodal models. This is where the field is heading: models that understand context across different data types rather than being limited to one.
Predictive and classification models are the workhorses of traditional machine learning. They include regression models (predicting a number, like a house price), classification models (predicting a category, like spam vs. not spam), and recommendation models (predicting what you'll engage with next). These run on algorithms like random forests, gradient boosting, and logistic regression, and they're still the backbone of many production systems in finance, e-commerce, and healthcare.
Reinforcement learning models learn through trial and error, receiving rewards for good actions and penalties for bad ones. They're used for game playing (AlphaGo), robotics, autonomous vehicles, and resource optimization. In AI development, reinforcement learning is also used during post-training to align language models with human preferences through RLHF.
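The trial-and-error idea behind reinforcement learning can be illustrated with a multi-armed bandit, one of the simplest settings of this kind. The sketch below is an assumption-laden toy (the reward probabilities, the epsilon-greedy strategy, and the function name `run_bandit` are chosen for illustration) and is not how systems like AlphaGo or RLHF pipelines are implemented. The agent is never told which action is best; it estimates that from the rewards it receives.

```python
import random


def run_bandit(reward_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: mostly exploit the best-known action, sometimes explore."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    counts = [0] * len(reward_probs)    # how often each action was tried
    values = [0.0] * len(reward_probs)  # estimated average reward per action
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.randrange(len(reward_probs))  # explore: random action
        else:
            action = max(range(len(values)), key=values.__getitem__)  # exploit
        # The environment pays a reward of 1 with the action's hidden probability.
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # Update the running average of rewards for this action.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts


values, counts = run_bandit([0.2, 0.5, 0.8])
best = max(range(len(values)), key=values.__getitem__)
print("estimated values:", [round(v, 2) for v in values])
print("times each action was chosen:", counts)
print("action the agent prefers:", best)
```

With a large enough number of steps, the estimates approach the hidden reward probabilities and the agent spends most of its time on the highest-paying action.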
Foundation models vs. fine-tuned models
An important distinction in modern AI is between foundation models and the fine-tuned versions built on top of them.
Foundation models (sometimes called base models) are large, general-purpose models trained on broad datasets. GPT-4, Claude, and Llama are foundation models. They're designed to be versatile rather than specialized, capable of handling a wide range of tasks out of the box. Building a foundation model from scratch requires enormous resources, which is why relatively few organizations (such as OpenAI, Anthropic, Google, and Meta) train them.
Fine-tuned models start with a foundation model and adapt it for a specific use case using a smaller, targeted dataset. A company might take an open-source foundation model and fine-tune it on their customer support data, medical records, or legal documents. Fine-tuning is far cheaper and faster than training from scratch, and it typically produces a model that's more accurate for the specific task while retaining most of the general capabilities of the base model, although aggressive fine-tuning can degrade some of them.
This layered approach is how most production AI systems work. The foundation model provides broad capability. Fine-tuning and prompt engineering narrow it to the specific task.
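Why starting from a trained model helps can be seen even in a toy setting. The sketch below is an analogy under stated assumptions, not a real fine-tuning recipe: a two-parameter linear model stands in for the foundation model, `broad_data` and `task_data` are made-up datasets, and the step counts are arbitrary. The point is only that a short training run starting from already-learned parameters ends up closer to the target than the same short run starting from zero.

```python
def mse(params, data):
    """Mean squared error of y = w * x + b on a dataset."""
    w, b = params
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def gradient_descent(params, data, steps, lr=0.05):
    """Adjust (w, b) to reduce mean squared error on data."""
    w, b = params
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
broad_data = [(x, 2.0 * x + 1.0) for x in xs]  # stand-in for broad pre-training data
task_data = [(x, 2.2 * x + 1.1) for x in xs]   # stand-in for a related, narrower task

base = gradient_descent((0.0, 0.0), broad_data, steps=1000)       # long "pre-training" run
fine_tuned = gradient_descent(base, task_data, steps=20)          # short run from the base
from_scratch = gradient_descent((0.0, 0.0), task_data, steps=20)  # same budget, no base

print("base model on task:  ", mse(base, task_data))
print("fine-tuned on task:  ", mse(fine_tuned, task_data))
print("from scratch on task:", mse(from_scratch, task_data))
```

In this toy run the fine-tuned parameters reach a much lower error on the task than the from-scratch parameters given the same 20 steps, which mirrors the cost argument for fine-tuning.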
How AI models are built
Building an AI model follows a structured pipeline, whether you're training a simple classifier or a frontier LLM.
Data collection and preparation comes first. The quality of a model depends heavily on the quality of its training data. For LLMs, this means curating trillions of tokens from books, websites, code repositories, and conversations. For specialized models, it means gathering labeled examples relevant to the task. Data cleaning (removing duplicates, errors, and low-quality content) is often the most time-consuming part of the process.
Model architecture selection determines the structure of the model. For language tasks, transformers are the dominant architecture. For image tasks, CNNs and vision transformers are common. The architecture defines how data flows through the model and how it learns patterns.
Training runs the data through the model, adjusting parameters to minimize errors. For supervised learning, the model compares its predictions to known correct answers. For self-supervised learning (like LLM pre-training), the model generates its own training signal by predicting missing or next tokens.
Evaluation tests the model on data it hasn't seen before. This is where evals come in: structured tests that measure accuracy, safety, helpfulness, and other quality metrics. Evaluation catches problems like overfitting (the model memorized the training data instead of learning generalizable patterns) before the model reaches users.
Post-training and alignment refine the model's behavior using techniques like supervised fine-tuning, RLHF, and direct preference optimization (DPO). This phase transforms a raw model into one that follows instructions, holds conversations, and avoids harmful outputs.
Deployment puts the model into production, where it serves real users through APIs, applications, or embedded systems. Once deployed, the model runs inference on every request.
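The evaluation step in this pipeline, and the overfitting problem it is meant to catch, can be sketched in a few lines. The example below is a contrived illustration: the task (is a number at least 10?), the even/odd split, and the two "models" are all assumptions chosen to make the effect obvious. One model memorizes its training examples; the other learns a single cutoff parameter. Both look perfect on the training data, and only held-out data tells them apart.

```python
def label(x):
    """Ground truth the models are trying to learn."""
    return x >= 10


train_set = [(x, label(x)) for x in range(0, 20, 2)]  # even numbers 0..18
test_set = [(x, label(x)) for x in range(1, 20, 2)]   # odd numbers, unseen in training


def fit_memorizer(examples):
    """Overfit on purpose: store every training example verbatim."""
    table = dict(examples)
    # Unseen inputs fall back to a fixed guess, since nothing general was learned.
    return lambda x: table.get(x, False)


def fit_threshold(examples):
    """Learn one parameter: a cutoff halfway between the two classes."""
    largest_negative = max(x for x, y in examples if not y)
    smallest_positive = min(x for x, y in examples if y)
    cutoff = (largest_negative + smallest_positive) / 2
    return lambda x: x > cutoff


def accuracy(model, examples):
    """Fraction of examples the model labels correctly."""
    return sum(model(x) == y for x, y in examples) / len(examples)


for name, fit in [("memorizer", fit_memorizer), ("threshold", fit_threshold)]:
    model = fit(train_set)
    print(name, accuracy(model, train_set), accuracy(model, test_set))
# memorizer 1.0 0.5
# threshold 1.0 1.0
```

Measuring only training accuracy would rate the two models as equally good. Real evals apply the same idea with much larger held-out sets and more metrics than accuracy.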
Open-source vs. closed-source models
AI models are split between open-source and closed-source approaches, and the distinction affects cost, control, and capability.
Closed-source models (sometimes called proprietary models) are owned and operated by the company that built them. You access them through APIs or subscription products, but you can't see, modify, or host the model yourself. GPT-4, Claude, and Gemini are closed-source. The advantage is that these tend to be the most capable models available. The tradeoff is that you depend on the provider for pricing, uptime, and data handling.
Open-source models (often more precisely called open-weight models, since the weights are published but the training data and code may not be) are publicly available for anyone to download, run, and modify. Meta's Llama, Mistral, and DeepSeek are prominent examples. Open-source models give you far more control: you can host them on your own infrastructure, fine-tune them on proprietary data, and customize them, subject to the terms of each model's license. The tradeoff is that open-source models are often somewhat less capable than the leading closed-source alternatives, though that gap has narrowed significantly.
Many production AI systems use a mix: closed-source models for the most demanding tasks and open-source models for high-volume, cost-sensitive workloads where control and customization matter more than peak performance.
FAQs
What is an AI model?
An AI model is a program trained on data to recognize patterns and then make predictions, make decisions, or generate content. It learns from data during training, encodes what it learns in its parameters, and then applies those patterns to new inputs during inference.
What are the different types of AI models?
The main types include large language models (for text), image and vision models, speech models, multimodal models (handling multiple data types), predictive and classification models (traditional ML), and reinforcement learning models. Each type is designed for different tasks and data.
What's the difference between a model and an algorithm?
An algorithm is the set of rules or instructions that defines how learning happens. A model is what you get after running the algorithm on training data. The algorithm is the recipe; the model is the finished dish.
What does "parameters" mean when describing a model?
Parameters (or weights) are the numerical values inside a model that encode what it learned during training. A model with 70 billion parameters has 70 billion adjustable values that together represent its knowledge and capabilities. More parameters generally mean more capacity, but also higher compute and memory costs.
What's the difference between a foundation model and a fine-tuned model?
A foundation model is a large, general-purpose model trained on broad data (like GPT-4 or Llama). A fine-tuned model takes a foundation model and adapts it for a specific task using targeted data. Most production AI systems build on a foundation model and narrow it to their task through fine-tuning, prompt engineering, or both.
Should I use an open-source or closed-source model?
It depends on your priorities. Closed-source models (GPT-4, Claude) tend to be the most capable but give you less control and come with ongoing API costs. Open-source models (Llama, Mistral) give you much more control and can lower per-inference cost at high volume, but they usually require more engineering effort to deploy and maintain. Many teams use both.