Your Commerce Team Can't Say No to AI — And It's Costing You Everything

Teams generating 10x more content with AI aren't 10x better. They're 10x more exposed. The skill nobody's teaching is knowing when to kill the output.

39 min read

Published 10 March 2026

The 83% Problem

Here's a number that should keep every head of ecommerce awake tonight: frontier AI models now match professionals with an average of 14 years of experience roughly 70% of the time on well-specified knowledge work tasks. They do it over 11 times faster, for less than 1% of the cost. This comes from OpenAI's GDPval benchmark — the most rigorous measurement we have of AI against actual expert work, double-blind, expert-graded, covering 44 occupations across sectors worth $3 trillion annually.

Everyone reads this as either a capability story or a displacement story. Both readings are boring. The interesting question is what happens to the other 30%. And, more dangerously, what happens to the 70% that looks right but isn't quite.

Because in commerce, 'looks right' is where the damage lives.

A product description that's technically accurate but tonally wrong for your customer segment. A promotional email that hits every best-practice checkbox but misses the one cultural nuance that makes your brand distinctive. A pricing analysis that's structurally sound but based on competitive assumptions your category manager would have flagged in thirty seconds. These aren't hallucinations. They're not obvious failures. They're the output that passes a quick scan, gets shipped, and silently erodes the things that actually differentiate your business.

The commerce industry is drowning in this kind of output right now. And almost nobody is measuring it. Research suggests 73% of consumers can now spot AI-generated marketing — and they're actively penalising brands for it. The content flood isn't building trust. It's eroding it.

Generation Is Solved. Verification Is Broken.

Walk into any ecommerce operation in 2026 and you'll find AI generating product copy, campaign briefs, customer service responses, inventory forecasts, SEO content, social media posts, and merchandising recommendations. The generation problem is comprehensively solved. A mid-tier Shopify merchant can produce more content in a week than their team could have written in a quarter two years ago.

But production volume was never the bottleneck. The bottleneck was always quality — specifically, the kind of quality that requires domain knowledge to evaluate. And that bottleneck hasn't moved. If anything, it's got worse, because the volume of output requiring human judgment has increased by an order of magnitude while the number of humans with the domain expertise to evaluate it has stayed roughly constant.

Consider what happens when a DTC brand uses AI to generate 500 product descriptions in an afternoon. Someone has to check those descriptions. Not for grammar — the AI handles that flawlessly. Not for basic accuracy — it's pulling from your product data feed. But for the subtle things: Does this description position the product the way your brand strategy requires? Does it emphasise the right features for each customer segment? Does it avoid claims that would cause regulatory problems in specific markets? Does it match the voice that took your brand team three years to develop?

That verification requires someone who understands the brand, the market, the regulations, and the customer. It requires taste. And taste, unlike content generation, doesn't scale by throwing compute at it.

So what actually happens? In most operations, one of two things. Either the volume gets shipped with minimal review — and quality degrades so gradually nobody notices until a customer survey shows brand perception has shifted. Or a small team of senior people becomes the quality bottleneck, reviewing everything, burning out, and ultimately rubber-stamping work because there simply aren't enough hours.

Neither outcome is acceptable. But both are endemic.

Rejection as a Core Competency

There's a concept that's been gaining traction in AI productivity circles that I think commerce leaders need to hear: your most valuable AI skill is saying no.

Not no to using AI. No to AI output that doesn't meet your bar. And more importantly, no in a way that's specific, documented, and reusable.

Think about what happens when a senior merchandiser looks at an AI-generated product categorisation and says, 'This is wrong.' That rejection contains information. It contains domain knowledge that the AI doesn't have and that no prompt is going to fully capture. The merchandiser knows, from twenty years of experience, that customers in this category browse differently. That placing this product in that taxonomy will reduce discovery. That the cross-sell logic breaks if you categorise it this way.

That knowledge is enormously valuable. And in most organisations, it evaporates the moment the merchandiser fixes the output and moves on. It lives in their head. Maybe it lives in a Slack message. It certainly doesn't get encoded into a system that prevents the same mistake tomorrow.

This is the structural gap. Not in AI capability — in the infrastructure around AI capability. The ability to capture, encode, and compound expert rejection is the missing competency in almost every commerce operation using AI today.

Let me break down what this competency actually involves, because it's not a single skill. It's at least three.

Recognition, Articulation, Encoding

Recognition is the ability to look at AI output and sense that something is off. This is the part you can't shortcut. It comes from years of domain experience — the buyer who's reviewed 2,000 purchase orders and can feel when margins don't add up, the brand manager who's run 50 campaigns and knows when copy is technically competent but strategically empty, the operations lead who's seen three warehouse management system migrations and knows which AI-suggested workflow changes will cause downstream chaos.

This is also the dimension most amplified by AI. A domain expert with strong recognition and access to AI tools can evaluate ten times the output they could before. The amplification is genuinely multiplicative — but only within the boundary of their expertise. Outside that boundary, AI doesn't multiply expertise. It multiplies confidence. And in commerce, misplaced confidence has a direct cost: wrong product assortments, misaligned pricing, campaigns that spend budget without moving the metrics that matter.

Articulation is the ability to explain why something is wrong in a way that produces a usable constraint. 'This isn't right' is a rejection. 'This isn't right because you're treating all product returns as equivalent, but warranty returns and buyer's remorse returns have completely different operational workflows and margin implications' — that's a constraint. It's the difference between taste that stays locked in someone's head and taste that can be shared, taught, and eventually automated.

This is a learnable skill, and almost nobody in commerce is teaching it. We train people on platforms, on analytics tools, on campaign management. We don't train them on the systematic articulation of quality standards. The result is that most commerce teams have senior people who can recognise bad output but can't efficiently transfer that recognition to junior team members or to AI systems.

Encoding is the practice of making constraints persist beyond the moment of rejection. This is where nearly every organisation falls down. A category manager articulates a brilliant constraint about how seasonal products should be merchandised differently from evergreen lines. It lives in an email thread. Next quarter, a different team member makes the same mistake. The constraint gets re-articulated. The cycle repeats.

The compounding waste here is staggering. I've seen ecommerce operations where senior staff spend 15-20% of their time re-correcting the same categories of AI error because no systematic capture mechanism exists. That's not a productivity gain from AI — that's a hidden tax on expertise.

The Institutional Taste Gap

Here's where this gets strategically interesting. If you accept that encoded domain judgment — the accumulated, systematised rejections of experienced practitioners — is a genuine asset, then you have to ask: which commerce organisations are actually building this asset?

The answer, overwhelmingly, is the ones that were already good at institutional knowledge before AI arrived. And they're pulling further ahead.

Think about a company like IKEA. Their product naming conventions alone encode decades of cultural and linguistic judgment. Their catalogue photography style represents thousands of editorial decisions about what 'aspirational but accessible' looks like in different markets. When IKEA uses AI to generate product content, they have a deep institutional library of 'what good looks like' to measure against. A new DTC furniture brand using the same AI models doesn't have that library. They're not evaluating output against decades of encoded taste — they're evaluating it against vibes.

Or consider the difference between a specialist B2B distributor with 30 years of product taxonomy data and a marketplace startup using AI to categorise listings. The distributor's category managers have been rejecting bad categorisations for decades. Those rejections, even if they've never been formally documented, have shaped the taxonomy into something that actually reflects how their customers think and buy. The startup's AI-generated taxonomy might look more 'modern' and 'clean', but it won't encode any of that buying behaviour intelligence. It'll be generic. It'll be commodity output from commodity models.

This is the moat that nobody's talking about. Not the AI models themselves — those are commoditising fast. Not the data, exactly — though proprietary data helps. The moat is the encoded judgment layer: the accumulated, systematised understanding of what quality means in your specific domain, for your specific customers, in your specific market context.

Andrej Karpathy's Software 3.0 framework — that AI systems improve fastest where success can be verified — has a corollary that should concern every commerce executive: the frontier of AI value in your organisation is identical to the frontier of your organisation's ability to verify quality. Where your capacity to say 'this is good' and 'this isn't good enough' extends, AI creates value. Where it doesn't, AI generates risk.

Not theoretical risk. Compounding, silent risk. The risk of an organisation that generates more and more output while understanding less and less about whether that output is actually good.

The Agency Angle Nobody Wants to Hear

If you run an ecommerce agency, this should terrify you. Because the agency model has historically been built on generation capacity — we have the people, the tools, the processes to produce work in volume. AI has commoditised that value proposition almost overnight. Any merchant can now generate campaign copy, product descriptions, social content, and basic analytics at a fraction of agency cost.

The agencies that will survive — and frankly, the ones that deserve to — are the ones that can articulate and encode quality standards their clients can't replicate internally. Not 'we use better tools' (you don't, or you won't for long). Not 'we have more experience' (experience that stays in individual heads walks out the door). But 'we have a systematised, documented, continuously improving understanding of what excellence looks like in your category, and we can apply that understanding at the speed AI demands.'

That's a fundamentally different value proposition. It requires agencies to shift from measuring output volume to measuring rejection quality. 'How many pieces did we produce?' becomes 'How many pieces did we catch and improve before they shipped?' The best agencies have always done this informally — the senior creative director who kills three concepts before one reaches the client. But informal, personality-dependent quality gates don't scale across a 50-person team serving 30 clients.

The agencies that figure out how to encode their senior practitioners' taste into reusable, systematised quality infrastructure will own their categories. The ones that can't will compete on price against AI tools that keep getting cheaper. That's not a competition you win.

I've watched three mid-size ecommerce agencies in the past six months try to differentiate on 'AI-powered efficiency' — essentially promising to do the same work faster and cheaper using AI tools. Two of them have already lost clients who realised they could get the same 'AI-powered efficiency' from a £50/month SaaS subscription. The third pivoted to quality assurance positioning — 'we don't just generate, we verify' — and is winning new business at higher margins. The market is speaking.

Building the Rejection Flywheel

So what does this actually look like in practice? How does a commerce team move from ad hoc rejection to systematic quality encoding?

First, you need to make rejection visible. Most AI-assisted commerce work happens in chat interfaces, in platform-specific tools, in individual workflows. When someone rejects AI output, that rejection disappears into the conversation history. The first step is simply logging what gets rejected and why. Not in a separate tool that nobody will use — in the workflow itself, as a natural side effect of the review process.
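One lightweight way to make rejection visible is a structured record appended from within the review step itself. The sketch below is illustrative only: the field names and the JSON-lines file are assumptions, not a prescribed schema.

```python
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
import json


@dataclass
class Rejection:
    """One logged rejection of a piece of AI output."""
    content_type: str      # e.g. "product_copy", "email", "pricing_analysis"
    reason_category: str   # e.g. "brand_voice", "segment_fit"
    reason_detail: str     # the articulated constraint, in plain words
    reviewer: str
    logged_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def log_rejection(path: str, rejection: Rejection) -> None:
    """Append the rejection to a JSON-lines file kept alongside the workflow."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(asdict(rejection)) + "\n")
```

The point of the append-only file is friction: logging happens as a side effect of the fix, not as a separate documentation chore.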

Second, you need to categorise your rejections. Once you start logging, patterns emerge fast. You'll find that 40% of your rejections in product copy cluster around brand voice consistency. That 60% of your pricing analysis rejections involve the same three competitive assumptions the AI keeps getting wrong. That your email marketing rejections are overwhelmingly about customer segment sensitivity. These clusters are your constraint library. They're the specific dimensions where AI output consistently fails to meet your bar.
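Assuming rejections are logged as JSON-lines records with a `reason_category` field (an illustrative format, not a standard), the clusters this step describes fall out of a simple tally:

```python
from collections import Counter
import json


def rejection_clusters(path: str) -> list[tuple[str, float]]:
    """Tally rejection categories and return each with its share of the total,
    most frequent first."""
    categories = Counter()
    with open(path, encoding="utf-8") as f:
        for line in f:
            categories[json.loads(line)["reason_category"]] += 1
    total = sum(categories.values())
    return [(cat, count / total) for cat, count in categories.most_common()]
```

Even this crude tally is enough to tell you where to spend encoding effort first: the top two or three categories usually account for most of the review burden.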

Third, you need to encode constraints into your AI workflows. This is where the compound effect kicks in. Every logged rejection becomes a rule, a check, a guardrail. Your brand voice constraint library becomes a pre-prompt that shapes every piece of product copy before a human ever sees it. Your pricing assumption corrections become validation rules that flag suspicious competitive comparisons. Your segment sensitivity guidelines become part of your email generation pipeline.
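As a minimal sketch of what encoding might look like in practice: constraints kept as plain rules that are assembled into a pre-prompt, plus programmatic checks that flag drafts before a human sees them. Every name and rule below is a hypothetical example, not a recommended ruleset.

```python
# Hypothetical constraint library: each entry is an articulated rule,
# grouped by the content type it governs.
CONSTRAINTS = {
    "brand_voice": [
        "Write in second person; never open with the product name.",
        "Avoid superlatives like 'best' or 'ultimate' unless substantiated.",
    ],
    "pricing": [
        "Never compare against competitor list prices older than 30 days.",
    ],
}

# Hypothetical hard checks for claims that must never ship unreviewed.
BANNED_CLAIMS = ("clinically proven", "guaranteed results")


def build_preprompt(content_type: str) -> str:
    """Assemble the relevant constraints into instructions prepended
    to every generation request for this content type."""
    rules = CONSTRAINTS.get(content_type, [])
    return "Hard constraints:\n" + "\n".join(f"- {r}" for r in rules)


def validate(copy: str) -> list[str]:
    """Return any banned claims found in draft copy, so it can be
    flagged before human review rather than after shipping."""
    return [claim for claim in BANNED_CLAIMS if claim in copy.lower()]
```

The design choice worth noting: the rules stay human-readable. The same library that shapes the pre-prompt can be handed to a junior reviewer as a checklist, which is exactly the dual use the flywheel depends on.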

The result is a flywheel. Reject → document → encode → prevent. Every turn of the wheel means fewer rejections needed next time, which means your senior people's attention can move to higher-order quality issues, which means the overall quality bar rises, which means your encoded taste becomes a genuine competitive advantage.

This isn't theoretical. Bloomberg built exactly this kind of encoded judgment layer around financial data over decades, and it's the reason a Bloomberg Terminal commands roughly £25,000 a year despite the underlying data being available from dozens of cheaper sources. Epic Systems did it in healthcare — not through superior technology, but through the painstaking process of encoding clinical judgment about what software needs to get right, built rejection by rejection across thousands of hospital implementations. Epic now holds 42.3% of the acute care EHR market, and the switching costs are structural because the encoded quality standards are so deeply embedded in clinical workflows.

Commerce doesn't have its Bloomberg Terminal equivalent yet. But the organisations that start building their encoded judgment layer now — while competitors are still celebrating AI generation volume — will have a significant structural advantage within 18 to 24 months.

The Junior Problem Is the Senior Problem

There's a workforce dimension to this that commerce leaders are largely ignoring. If taste and domain expertise are the critical bottleneck — and they are — then how are you developing those capabilities in your junior staff?

The traditional answer was apprenticeship. Junior merchandisers sat next to senior merchandisers. Junior copywriters got their work shredded by senior editors. Junior buyers watched experienced buyers negotiate. The rejection was direct, personal, and educational. It was also inefficient and unscalable, which is why it's been dying for years.

AI has accelerated this death. Junior roles that used to involve learning-by-doing — writing product descriptions, building basic campaigns, pulling together competitive analyses — are increasingly automated. The junior person who would have developed recognition through 500 manual product categorisations now watches an AI do it in minutes. They've lost the practice ground where taste develops.

But here's the twist: a properly encoded constraint library can partially compensate. If your senior merchandiser's judgment about product categorisation is captured, documented, and available to query, a junior team member can access that institutional taste when reviewing AI output. They can learn what 'good' looks like in your specific context without needing the senior person to sit next to them and explain it for the hundredth time.

This doesn't replace experience. Nothing does. But it does compress the learning curve. And it means that the investment in encoding rejection isn't just about AI quality control — it's about talent development. You're building an asset that serves both purposes simultaneously.

The organisations that crack this will have a significant hiring advantage. They can bring in smart, less experienced people and ramp them faster because the institutional knowledge isn't locked in individual heads — it's accessible, structured, and continuously updated. The organisations that don't will keep competing for a shrinking pool of expensive senior practitioners who hold all the taste and none of the documentation.

Your Anti-Slop Strategy

Let me be direct about what this means operationally.

If you're a head of ecommerce, your competitive moat is not which AI vendor you choose. Models are commoditising. Anthropic, OpenAI, Google — they're all converging on similar capability levels at similar price points. The moat is the depth and durability of your organisation's encoded taste. The constraint library that makes AI output reliable in your domain.

Your job is to audit where domain expert judgment currently lives. Is it in people's heads? In scattered Slack channels? In email threads that nobody will ever search? If so, you're sitting on an asset that's depreciating every time someone leaves, every time institutional memory fades, every time a new team member makes the same mistake their predecessor made six months ago.

If you manage a team, create space for articulation. When someone rejects AI output, don't just let them fix it silently. Ask them to explain why. Make that explanation part of the workflow, not a burden on top of it. A team that can articulate its rejections is building shared understanding that persists across projects, across personnel changes, across platform migrations. A team that silently fixes AI output is treading water.

If you're an individual contributor, your most valuable professional development isn't learning the newest AI tool. Tools will change every quarter. Your value is in deepening your ability to recognise when output isn't working, practising your capacity to articulate what's wrong and how to fix it, and advocating for systems that capture that judgment rather than letting it evaporate.

The commerce industry spent the last two years celebrating AI generation. The next two years will be defined by who gets serious about AI rejection. The teams that learn to say no — systematically, articulately, and with the infrastructure to make every rejection compound — will build organisations that are genuinely enhanced by AI rather than merely accelerated by it.

The difference matters more than you think. Acceleration without quality control is just faster failure. And in commerce, faster failure has a price tag attached to every unit shipped, every campaign launched, and every customer interaction that falls slightly short of what your brand promises.

Learn to say no. Then learn to make your no count.
