The Smartest Model Nobody Needs (Yet) — And Why Google Built It Anyway
Google shipped the best reasoning engine on earth at a seventh of the price — and doesn't care if you never use it. That tells you everything.
Last week, Google shipped Gemini 3.1 Pro. It scores 77.1% on ARC-AGI-2, making it the highest-performing reasoning model on the planet. It costs $2 per million input tokens — roughly a seventh of what Anthropic charges for Opus 4.6. And in its Deep Think configuration, it has solved 18 previously unsolved problems across mathematics, physics, and computer science, including autonomously cracking open questions from the Erdős Conjectures database that human researchers had spent over a decade working on.
The natural response to all of this is to ask: should I switch to Gemini?
The honest answer, for most people reading this, is probably no. And Google knows that. They are, in fact, counting on it. Because Google is not playing the same game as Anthropic or OpenAI. They are playing a fundamentally different one — and once you understand the distinction, it changes how you should think about every AI tool you touch.
Anthropic needs you. That's not an insult — it's a business model. Anthropic generates revenue primarily from people and companies paying to use Claude. Every enterprise contract, every API call, every Pro subscription matters. Their survival depends on winning the daily-driver race: becoming the AI you open first thing in the morning, the one your team defaults to, the one whose tab stays pinned in your browser.
OpenAI has the same dependency, scaled even larger. ChatGPT's hundreds of millions of users are the foundation of a business that needs to justify a valuation north of $300 billion. Every feature release — voice mode, image generation, Codex, memory — is designed to make the product stickier, to make switching costs higher, to keep you inside their system.
Google does not have this problem. Alphabet generated $73.3 billion in free cash flow last year. Search alone prints money at a rate that makes AI model revenue look like a rounding error. YouTube, Cloud, Android, the advertising machine — these businesses fund the entire AI research programme without breaking a sweat. Google doesn't need Gemini to be your daily driver. They don't need you to choose Gemini over Claude or ChatGPT. They don't even need you to know Gemini exists.
What they need is for intelligence itself to become a solved problem. Everything else follows from there.
Demis Hassabis has been saying the same thing for fifteen years, and most people still haven't internalised what it means. The mission statement of DeepMind, from its founding in 2010, has been: "Solve intelligence, then use it to solve everything else." Not "build the best chatbot." Not "win the enterprise market." Not "ship the most features." Solve intelligence.
This is not marketing copy. It is an engineering specification. And it explains every strategic choice Google has made in AI that otherwise looks puzzling from the outside.
Why did Google build its own silicon? Because solving intelligence requires controlling the substrate it runs on. The Ironwood TPU — Google's seventh-generation tensor processing unit — delivers 10x the peak performance of its predecessor, with more than 4x better performance per chip for both training and inference. It can scale to 9,216 chips in a single superpod. This is not a chip you design to sell API calls. This is a chip you design when you believe the bottleneck to artificial general intelligence might be computational, and you want to remove that bottleneck yourself.
Why did Google invest billions in DeepMind's pure research programme? Because solving intelligence means solving it across every domain, not just the ones that generate immediate revenue. AlphaFold predicted the structure of virtually all 200 million known proteins — work that won Hassabis and John Jumper the 2024 Nobel Prize in Chemistry. AlphaGo mastered a game that was supposed to be decades away from falling to machines. Gemini Deep Think is now pulling tools from the Kirszbraun theorem, measure theory, and the Stone-Weierstrass theorem to solve problems across disciplinary boundaries that human specialists almost never cross.
Why did Google price Gemini 3.1 Pro at $2 per million input tokens — the same price as the previous generation, despite dramatically better performance? Because the point is not to extract maximum margin from each token. The point is to make intelligence abundant and cheap so that it gets used everywhere, by everyone, at every layer of the stack. Google monetises the infrastructure underneath. The intelligence is the lure. The data centres, the TPUs, the Cloud contracts — those are the business.
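To make the pricing gap concrete, here is a back-of-envelope cost comparison. The $2 per million input tokens is the figure cited above; the ~$15 competitor price is an assumption derived from the "roughly a seventh" ratio, not a quoted rate, and the monthly volume is a placeholder:

```python
# Back-of-envelope input-token cost comparison.
# $2/M is the Gemini 3.1 Pro figure cited in the text; the ~$15/M
# competitor figure is an ASSUMPTION inferred from the "roughly a
# seventh" ratio, not a published price.

GEMINI_INPUT_PER_M = 2.00
COMPETITOR_INPUT_PER_M = 15.00  # assumed: roughly 7x the Gemini price

def monthly_input_cost(tokens_per_month: int, price_per_million: float) -> float:
    """Dollar cost for a given monthly input-token volume."""
    return tokens_per_month / 1_000_000 * price_per_million

# A team pushing 500M input tokens a month:
volume = 500_000_000
print(monthly_input_cost(volume, GEMINI_INPUT_PER_M))      # 1000.0
print(monthly_input_cost(volume, COMPETITOR_INPUT_PER_M))  # 7500.0
```

At any serious volume the gap is not a rounding error — which is exactly the point: Google is pricing to drive consumption of the substrate, not margin on the tokens.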
Here is the part that should make Anthropic and OpenAI's investors nervous, even if it doesn't make their products worse.
Google controls the full vertical stack from transistor to end user. Custom silicon (Ironwood TPU) → foundational AI research (DeepMind) → cloud infrastructure (Google Cloud) → distribution (Search, YouTube, Android, Workspace, Chrome). No other AI company has anything close to this. Anthropic trains its models on Google's TPUs. The deal announced in October 2025 gives Anthropic access to up to one million Google TPUs in a multi-year arrangement worth tens of billions of dollars, bringing over a gigawatt of compute capacity online in 2026.
Read that again. Anthropic — Google's most credible competitor in frontier AI — trains its flagship model on Google's hardware, under a contract that represents one of Anthropic's largest cost centres and one of Google's largest Cloud revenue streams. When your competitor's production depends on your infrastructure, you are not in a product race. You are the house. The house doesn't need to win any individual hand.
OpenAI depends on Microsoft's Azure infrastructure in an analogous fashion, but Microsoft's in-house AI silicon remains a marginal part of its fleet: OpenAI's training runs overwhelmingly on NVIDIA hardware. Google's advantage is that the silicon, the research, the infrastructure, and the distribution are all internal. The flywheel is self-contained and self-reinforcing: better chips make better models, better models attract more Cloud customers, and more Cloud revenue funds better chips.
The coverage of Gemini 3.1 Pro has largely followed the usual script: benchmark tables, pricing comparisons, breathless declarations that Google has "won." This misses the important point entirely, because no single model has "won" anything. What has happened is that the frontier has fractured into specialisations, and understanding those specialisations is now worth more than picking the "best" model.
Gemini 3.1 Pro is the strongest naked reasoner. Strip away tools, remove scaffolding, present a raw hard problem — novel mathematics, cross-disciplinary scientific reasoning, problems that require genuine deduction from first principles — and Gemini outperforms everything else. The 77.1% ARC-AGI-2 score measures precisely this: the ability to solve problems the model has never encountered before, with no assistance. This is pure intelligence, unadorned.
Opus 4.6 is the strongest equipped reasoner. Give it tools — a file system, a codebase, a web browser, a command line — and ask it to sustain autonomous work across complex, multi-step projects for hours, and Claude outperforms everything else. This is why Anthropic's enterprise customers, from Rakuten to Amazon, use Claude for agentic coding and organisational workflows where the model needs to maintain context across thousands of files, make judgment calls about escalation, and self-correct over extended sessions. The thinking isn't always the hardest step. The endurance and contextual awareness are.
GPT 5.3 and Codex are the strongest specialist coders. For pure software engineering — SWE-Bench, code generation, debugging, repository-scale refactoring — OpenAI's models hold the edge in throughput and reliability within that specific domain. Not the broadest intelligence. Not the deepest reasoning. The most refined execution within a narrow, enormously valuable vertical.
Three models. Three different kinds of strength. The person who understands this distinction and routes work accordingly is operating at a fundamentally different level than the person who asks "which AI is best?" and uses the answer for everything.
The AI discourse has a reasoning fixation. Every benchmark measures reasoning. Every model release highlights reasoning. Every Twitter thread debates whose model reasons better. But here is the uncomfortable truth that the benchmark-obsessed conversation consistently ignores: most knowledge work is not bottlenecked by reasoning.
There are at least six fundamentally different types of difficulty in professional work, and confusing them leads to catastrophically wrong decisions about where to invest in AI.
Reasoning problems are hard because they require deep, multi-step logical deduction from first principles. Proving a mathematical theorem. Designing a novel algorithm. Working through a regulatory compliance analysis where the interactions between fifteen rules create emergent complexity. These are the problems where Gemini 3.1 Pro in Deep Think mode genuinely earns its keep. They are also, for most organisations, perhaps 5-10% of the total workload.
Effort problems are hard because they are enormous, not because any individual step is intellectually demanding. Auditing 10,000 supplier contracts for compliance changes. Migrating a catalogue of 50,000 products between systems. Reviewing three years of customer support transcripts to extract themes. Any competent person could handle any single item. The difficulty is sustained attention across a massive surface area without dropping details. This is where agentic AI — models that can work autonomously for hours — excels. The intelligence per step is modest. The volume and endurance are everything.
Coordination problems are hard because of people, not logic. Getting four departments aligned on a launch timeline. Routing information so that a pricing change negotiated by procurement actually reaches the marketing team before they run a promotion at the old margin. Managing dependencies between teams where nobody owns the gap between handoffs. AI is beginning to help here — Opus 4.6 has shown genuine ability to develop organisational awareness when embedded in engineering workflows — but the fundamental difficulty is human, not computational.
Emotional intelligence problems are hard because relationships are involved. Telling a long-standing client their strategy isn't working. Negotiating with a supplier who is raising prices but whom you cannot afford to lose. Managing a team through a reorganisation where half the people are terrified and the other half see it as a career opportunity. No model handles this meaningfully, and claiming otherwise is either ignorance or sales.
Judgment and willpower problems are hard because the right answer is clear but the execution is painful. Killing a project that has consumed six months of work and emotional investment but isn't going to succeed. Saying no to a lucrative client whose values don't align with yours. Holding a quality standard when the commercial pressure is to ship something mediocre. AI cannot make these decisions for you. It can provide analysis to support the decision, but the difficulty was never analytical.
Ambiguity problems are hard because the question itself is unclear. Revenue dropped 12% last month. Is it the new website design? Audience mix shifting because of a new paid channel? A competitor's sale cannibalising traffic? The economy? All of the above? The hard part isn't computing an answer. It's figuring out what the question actually is. This is the domain of strategic intuition and domain expertise, and no amount of reasoning power helps when the problem hasn't been properly defined.
Now look at that list and ask honestly: which of those six types benefit from a dramatic improvement in raw reasoning capability? One. Perhaps a portion of a second. The rest are bottlenecked by effort, coordination, human dynamics, and the sheer difficulty of operating in conditions where the problem statement itself is ambiguous. Shipping a model that reasons 30% better does not address the challenges that actually slow most organisations down.
This is where the practical value lives, and where the gap between sophisticated operators and everyone else is widening by the month.
The model market has differentiated to the point where using one model for everything is like using a single kitchen knife for every cooking task. Technically possible. Functionally stupid. You're overpaying for tasks that don't need frontier intelligence and underperforming on tasks that need a specific kind of frontier intelligence you're not providing.
Intelligent model routing means matching the problem type to the model type, deliberately and consistently.
Complex pricing analysis across multiple markets with interacting regulatory constraints? That's a reasoning problem. Point it at Gemini 3.1 Pro with maximum thinking depth. The cost is modest and the performance at genuine deduction is unmatched.
Migrating and rewriting 5,000 product descriptions across a catalogue? That's an effort problem. Run it through Gemini Flash or a lighter model at a fraction of the cost. The intelligence per description doesn't need to be extraordinary. The consistency and volume do.
Sustained autonomous codebase refactoring across dozens of interconnected files? That's an equipped reasoning problem — endurance and contextual awareness matter more than raw deduction. Use Opus 4.6 or Claude Code, where the model maintains coherent context across thousands of lines over multi-hour sessions.
Rapid competitive research and factual gathering? That's a retrieval problem. Use whatever model has the best web access and the fastest latency. Pay the minimum. Speed matters more than depth.
The person who routes all four of those tasks to the same model is spending 5-10x more than necessary on some tasks and getting measurably worse results on others. The person who routes deliberately is achieving better outcomes at lower cost while building an institutional understanding of which tools fit which problems. That understanding compounds. Every week of practice makes the next week's routing decisions better.
"Which AI should I use?" is the wrong question. It has always been the wrong question. The right question is: "Which AI for which problem?" And the answer changes depending on the type of difficulty you're facing, not the overall capability ranking of the model.
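The routing discipline described above can be sketched as a simple dispatch table. The task categories mirror the four examples in the text; the model assignments are illustrative labels, not any vendor's API identifiers:

```python
# A minimal sketch of deliberate model routing: classify the task by
# difficulty type, then dispatch to the model class that fits it.
# Categories and model names are illustrative, taken from the examples
# in the text — not a recommendation of specific endpoints.

ROUTES = {
    "reasoning": "gemini-3.1-pro (max thinking depth)",   # deep deduction
    "effort":    "gemini-flash (or similar light model)",  # volume + consistency
    "equipped":  "opus-4.6 / claude-code",                 # long agentic sessions
    "retrieval": "fastest model with web access",          # speed over depth
}

def route(task_type: str) -> str:
    """Return the model class for a task type. Fail loudly on an
    unclassified task rather than silently defaulting to one model
    for everything — the failure mode the text warns against."""
    if task_type not in ROUTES:
        raise ValueError(f"Unclassified task type: {task_type!r}")
    return ROUTES[task_type]

print(route("effort"))  # → gemini-flash (or similar light model)
```

The design choice worth copying is the explicit error on unknown task types: forcing classification before dispatch is what builds the institutional understanding the text describes.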
Here is what Google's strategy implies, and what the AI industry would prefer you not to think about too carefully.
Reasoning — the thing that benchmarks measure, that model releases celebrate, that AI Twitter argues about endlessly — is being commoditised. The best reasoning engine in the world now costs $2 per million input tokens. Eighteen months from now, models that reason at today's frontier level will cost a fraction of even that. The price of raw intelligence is in free fall, and Google is accelerating the decline deliberately because they profit from the infrastructure underneath, not from the intelligence itself.
If reasoning is commoditised, then the scarce resource is not intelligence. It's everything that surrounds intelligence: the judgment to know which problems need it, the domain expertise to define problems correctly, the organisational capability to coordinate humans and machines, the emotional intelligence to manage the people whose work is being transformed.
This is genuinely uncomfortable for an industry that has spent the last four years telling you that smarter models are the answer to everything. They are not. Smarter models are the answer to reasoning problems specifically — and reasoning problems are a minority of the difficulties that organisations actually face.
Google understands this at a structural level. Their business does not depend on selling you intelligence. It depends on intelligence existing abundantly, cheaply, and everywhere — so that it drives compute consumption on their infrastructure, search queries through their engine, and data through their cloud. Hassabis told Fortune that he foresees a "new golden era of discovery" and "radical abundance" — a world where intelligence is so plentiful that it transforms medicine, materials science, and energy within 10-15 years. That vision is not altruistic sentiment. It is a business plan. An abundance of intelligence means an abundance of demand for the substrate that intelligence runs on.
Three concrete implications, not for some abstract future, but for the decisions you're making this quarter.
First: stop chasing the "best" model. The concept of a single best AI is now as meaningful as the concept of a single best vehicle. A lorry is not better than a bicycle. They solve different problems. Gemini 3.1 Pro is not "better" than Opus 4.6 any more than a Formula 1 car is "better" than an ambulance. The question is always: better at what, for which problem, at what cost? Build that matching capability internally. Track which model works for which task type. Document it. Iterate on it. This is the new operational competency, and it compounds with practice.
Second: audit your AI spending by problem type. Classify every task you currently run through AI: is it reasoning, effort, coordination, ambiguity, emotional intelligence, or judgment? Be ruthlessly honest about how much of your budget is going to effort problems that don't need frontier models. Most organisations discover that 70-80% of their AI workload falls into the effort category — large volumes of straightforward work where endurance and consistency matter more than brilliance. You are almost certainly overpaying for that portion.
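A first pass at that audit can be as simple as tagging each workload with a difficulty type and summing spend per tag. The workloads and dollar figures below are placeholders; the categories come from the taxonomy above:

```python
from collections import defaultdict

# Sketch of an AI-spend audit by difficulty type. Workload names and
# dollar amounts are PLACEHOLDERS for illustration; the categories are
# the six difficulty types from the text.

workloads = [
    ("contract compliance sweep", "effort",       4200.0),
    ("pricing model analysis",    "reasoning",     600.0),
    ("catalogue migration",       "effort",       3100.0),
    ("launch-timeline alignment", "coordination",    0.0),  # mostly human work
]

spend_by_type = defaultdict(float)
for name, difficulty, dollars in workloads:
    spend_by_type[difficulty] += dollars

total = sum(spend_by_type.values())
for difficulty, dollars in sorted(spend_by_type.items(), key=lambda kv: -kv[1]):
    share = dollars / total if total else 0.0
    print(f"{difficulty:<12} ${dollars:>8.2f}  ({share:.0%})")
```

Even with toy numbers the shape of the result matches the pattern the text describes: the effort bucket dominates the bill, and it is precisely the bucket that does not need frontier-priced intelligence.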
Third: invest in the skills that AI doesn't commoditise. If reasoning is getting cheaper by the month, the premium shifts to everything reasoning can't do. Problem definition — the ability to look at a confusing situation and formulate the right question. Coordination — the ability to align humans and machines across complex workflows. Domain expertise — the accumulated knowledge that comes from years of operating in a specific field, which no training dataset fully captures. Evaluation — the ability to tell whether AI output is actually good, not just plausible. These skills were always valuable. They are about to become the entire game.
Google built the smartest model nobody needs and priced it at the floor. That's not a failure of product-market fit. That's a statement about what intelligence is worth when the company selling it doesn't depend on the sale. The smartest model in the world is now a commodity input. The scarce thing — the thing worth building, hiring for, and betting your strategy on — is the judgment to know when and how to use it.
Hassabis set out to solve intelligence and then use it to solve everything else. The first half of that mission is further along than most people appreciate. The second half — the "everything else" — is where the actual work begins. And that work is not computational. It's human.