The AI conversation has a size fixation. Every announcement is about the biggest, most capable frontier model, and the implicit assumption is that bigger is always better. For pushing the limits of what's possible, sure. But for a huge fraction of real-world tasks, that assumption is wrong — and expensive. A smaller, faster, cheaper model is frequently the better engineering choice, because capability you don't actually need is just cost you're paying for nothing.
Here's the case for small models, and why right-sizing beats reflexively reaching for the biggest one.
Smaller AI models are underrated — for many real tasks, bigger isn't better, it's just more expensive and slower.
The case for small:
The biggest model is rarely the right default. Fit the model to the work.
The obsession with the biggest models makes sense for the frontier — when you're tackling the hardest problems, you want maximum capability. But it misleads as a default, because most real tasks aren't at the frontier. Classifying text, extracting fields, simple summarization, routine generation, routing — these are solved perfectly well by smaller models, and throwing a giant frontier model at them is overkill that buys you nothing while costing you plenty.
The misleading assumption is "more capable is always better." It isn't, because capability has a price — in money, in latency, in resources — and capability you don't use is price you pay for nothing. A model twice as capable as your task requires isn't twice as good for that task; it's the same result at higher cost and slower speed. The frontier-model fixation trains people to reach for maximum power reflexively, when the right question is far more practical: what's the smallest model that does this specific job well? Bigger isn't better in general; it's better only when the task actually needs it.
The advantages of smaller models aren't marginal, and they compound dramatically at scale:
| Dimension | Big model | Small model |
|---|---|---|
| Cost per call | High | Much lower |
| Latency | Slower | Faster |
| At high volume | Costs balloon | Savings multiply |
| For a fitting task | Overkill | Right-sized |
For a single call, the difference between a big and small model might seem modest. But real applications don't make single calls — they make thousands or millions, and there the gap explodes. A smaller model that's a fraction of the cost and several times faster turns a use case that's prohibitively expensive on a frontier model into one that's cheap and responsive. Speed matters too: lower latency makes for better user experiences and enables real-time use cases that a slow giant model simply can't serve. At volume, right-sizing isn't a minor optimization — it's often the difference between a viable application and an unviable one. This is the same efficiency-over-excess discipline that good engineering applies everywhere: don't pay for what the job doesn't need.
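To make the volume effect concrete, here is a rough sketch of the arithmetic. The model names and per-call prices are invented purely for illustration and are not quotes for any real model, so substitute your own numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelProfile:
    name: str
    cost_per_call_usd: float  # hypothetical price, not a real quote


def total_cost(profile: ModelProfile, calls: int) -> float:
    """Total spend for a number of calls at a flat per-call price."""
    return profile.cost_per_call_usd * calls


# Both profiles are made-up placeholders for illustration only.
BIG = ModelProfile("big-model", cost_per_call_usd=0.02)
SMALL = ModelProfile("small-model", cost_per_call_usd=0.002)

# Compare spend at one call, ten thousand calls, and a million calls.
for calls in (1, 10_000, 1_000_000):
    big = total_cost(BIG, calls)
    small = total_cost(SMALL, calls)
    print(f"{calls:>9,} calls: big ${big:,.2f}  small ${small:,.2f}  gap ${big - small:,.2f}")
```

With these placeholder prices the gap is under two cents for a single call and about $18,000 at a million calls. The per-call difference is the same in both cases; only the volume changed.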
The core principle is simple: capability you don't need is just cost. A model's extra capability only delivers value if your task actually exercises it. For a task a small model handles well, the frontier model's additional power produces the same output — you've paid more and waited longer for no better result. That's not a better choice; it's a worse one dressed up as the "safe" default.
This reframes model selection as an engineering tradeoff rather than a status decision. The instinct to use the biggest model "to be safe" is usually a mistake: you're optimizing for a capability ceiling the task never reaches while paying real costs in money and speed. The disciplined move is to right-size: pick the smallest, cheapest, fastest model that does the job well enough, and reach for more capability only when the task genuinely demands it. Sometimes it does, and then the big model is correct. But defaulting to maximum power means paying a premium for capability that, for most tasks, sits entirely unused. The best model isn't the most capable one; it's the most appropriate one. That is the same judgment that separates hype from reality in AI: match the claim, and the tool, to the actual need.
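One way to put right-sizing into practice is to score candidate models on a small labeled sample of your own task, from smallest to largest, and stop at the first one that clears your quality bar. The sketch below assumes that workflow. The model names, the `fake_run_model` stand-in, the sample data, and the 0.95 threshold are all hypothetical; in real use you would replace the stand-in with a call to whatever models you are comparing.

```python
from typing import Callable, Optional, Sequence, Tuple

Example = Tuple[str, str]  # (input text, expected label)


def accuracy(model: str, run_model: Callable[[str, str], str],
             examples: Sequence[Example]) -> float:
    """Fraction of examples where the model's output matches the expected label."""
    if not examples:
        raise ValueError("need at least one example to evaluate")
    correct = sum(1 for text, expected in examples if run_model(model, text) == expected)
    return correct / len(examples)


def pick_smallest_adequate(candidates: Sequence[str],
                           run_model: Callable[[str, str], str],
                           examples: Sequence[Example],
                           min_accuracy: float = 0.95) -> Optional[str]:
    """Return the first candidate that meets the bar, or None if none does.

    `candidates` must be ordered from smallest/cheapest to largest.
    """
    for model in candidates:
        if accuracy(model, run_model, examples) >= min_accuracy:
            return model
    return None


def fake_run_model(model: str, text: str) -> str:
    """Hypothetical stand-in for a real model call, used only for this demo."""
    if model == "tiny":
        return "other"  # pretend the tiniest model cannot do the task
    return "refund" if "refund" in text.lower() else "other"


EXAMPLES = [
    ("I want a refund for my order", "refund"),
    ("Where is my package?", "other"),
    ("Refund please", "refund"),
    ("How do I reset my password?", "other"),
]

chosen = pick_smallest_adequate(["tiny", "small", "large"], fake_run_model, EXAMPLES)
print(chosen)
```

Because the loop stops at the first adequate candidate, the largest model is never evaluated unless the smaller ones fall short. A `None` result means no candidate met the bar on this sample, which is a signal to add a larger candidate or revisit the task rather than to lower the threshold.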
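To choose models well instead of reflexively maxing out, start from what the task actually requires, default to the smallest model that does the job well, and reserve frontier models for frontier problems.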
The throughline: bigger isn't better; fitting is better. The size fixation trains people to overpay for capability they don't use, when the engineering-sound approach is to match the model to the work — small, cheap, and fast where that's enough, big only where the task demands it. Right-sizing is one of the most underrated levers in building with AI, precisely because the whole conversation points the other way.
Q: Isn't a more capable model always the safer choice? No — "safer" usually means "more expensive and slower for no benefit." A model's extra capability only delivers value if your task actually exercises it; for a task a small model handles well, the frontier model produces the same output at higher cost and latency. Defaulting to maximum power optimizes for a capability ceiling the task never reaches while paying real costs. The genuinely sound choice is the smallest model that does the job well, scaling up only when the task demands it.
Q: When should I actually use a big frontier model? When the task genuinely needs frontier capability — the hardest reasoning, the most complex generation, problems at the edge of what's possible. Big models are the right choice precisely there. The mistake is using them as the default for everything, including the many routine tasks (classification, extraction, simple summarization, routing) that smaller models handle perfectly well. Match the model to the task: maximum power where it's needed, right-sized models everywhere else.
Q: How much do smaller models actually save? For a single call the difference can look modest, but real applications make thousands or millions of calls, and there the gap explodes — a smaller model at a fraction of the cost and several times the speed can turn a prohibitively expensive use case into a cheap, responsive one. Lower latency also enables real-time experiences a slow giant model can't serve. At volume, right-sizing is often the difference between a viable application and an unviable one, not a minor optimization.
Small models are underrated because the AI conversation is fixated on size, training people to reach for the biggest model by default. But bigger isn't better — it's just more expensive and slower — for the huge fraction of real tasks that don't need frontier capability. Capability you don't use is cost you pay for nothing.
Smaller models are cheaper and faster, and those advantages compound dramatically at the scale real applications operate, often deciding whether a use case is even viable. So right-size: start from what the task actually requires, default to the smallest model that does the job well, and reserve frontier models for frontier problems. The best model isn't the most capable one — it's the most appropriate one.