OpenAI Doesn't Want to Be Your Chatbot. It Wants to Be Your Shop Assistant.
While everyone obsesses over GPT-6 benchmarks, OpenAI is quietly building earbuds, speakers and glasses. Commerce teams should be terrified — and excited.

Last week, OpenAI employees started publicly teasing a new Omni model — a successor to GPT-4o that would natively process text, images, video, and audio through a single unified architecture. The AI press dutifully lost its mind. GPT-6 timelines were sketched. Benchmark speculation ensued. The usual circus.
But buried beneath the model hype was something far more consequential: OpenAI now has over 200 people building physical hardware. Earbuds. Smart speakers with cameras. Glasses. A mystery device that may or may not be a lamp. They've hired Jony Ive — the man who designed the iPhone — and they're talking to Foxconn about manufacturing runs of 40 to 50 million units in year one.
If you work in commerce, that number should stop you cold. Fifty million AI-powered devices, each one capable of seeing products, hearing conversations about purchases, and — critically — acting on them. Not sometime in the distant future. Manufacturing conversations are happening now, with the same company that builds iPhones.
Everyone is asking what GPT-6 will score on reasoning benchmarks. The better question is what happens to retail when a billion people have an AI shopping assistant literally in their ear.
To understand why this matters, you need to understand what 'Omni' actually means in this context, because it's not just a marketing label.
GPT-4o was supposed to be OpenAI's multimodal moment back in 2024. One model, processing everything — text, images, voice — natively and simultaneously. The demo was extraordinary. The reality was underwhelming. The voice sounded flat. Image processing was handled by separate systems stitched together behind the scenes. It was a multimodal Potemkin village.
The new Omni model, confirmed by OpenAI researchers on social media, is the genuine article. One neural network processing everything at once — your voice, the image you're showing it, the video feed from a camera, the text on your screen — all flowing through a single architecture simultaneously.
Now layer that capability onto commerce scenarios. A customer wearing OpenAI earbuds walks through a supermarket. The AI can hear them discussing meal plans with their partner, see the products on the shelves through connected glasses, cross-reference dietary preferences from months of conversation history, and suggest alternatives — all in real time, all through one system, without the latency and errors that come from bolting separate models together.
That's not science fiction. That's an engineering roadmap with a manufacturing partner attached to it.
Current commerce AI systems — chatbots, recommendation engines, personalisation layers — are text-in, text-out systems pretending to understand context. The Omni model doesn't pretend. It processes the full sensory context of a purchasing moment. The difference isn't incremental. It's categorical.
'Conversational commerce' has been a buzzword for the better part of a decade, and it's been a lie for roughly the same duration. Here's why.
Every voice AI system currently deployed in commerce — every customer support bot, every voice assistant on a retail app, every IVR replacement — works on a turn-based model. You speak. You stop. The AI processes. The AI responds. You speak again. It's a walkie-talkie pretending to be a telephone.
Real human conversation doesn't work like that. When you're in a shop talking to a knowledgeable assistant, you interrupt. You say 'actually, no, the blue one.' You mutter 'hmm' while thinking. The assistant reads your hesitation and offers a different option before you've even articulated the objection. The entire interaction is bidirectional, overlapping, and continuous.
OpenAI is building exactly this capability. They call it BiDi — bidirectional audio that allows the AI to listen and speak simultaneously, processing your voice continuously rather than waiting for you to finish. It can detect hesitation, respond to interruptions, and adapt its communication style in real time.
The prototype reportedly works for several minutes before degrading, and they've pushed the ship date from Q1 to Q2 2026 or later. But the intent is clear: they want AI conversation to be indistinguishable from human conversation.
For commerce, this changes everything about voice-driven purchasing. Current voice commerce through Alexa or Google Assistant is painful precisely because the turn-based interaction model creates friction at every step. 'Did you mean the 500ml or the 1 litre?' Pause. 'The 1 litre.' Pause. 'Would you like to add that to your basket?' Pause. Every pause is a moment where the customer questions whether this is actually easier than pulling out their phone.
BiDi eliminates that friction. The AI can confirm your choice, suggest a complementary product, and process the order in a flowing conversation that feels like talking to a competent human. The conversion implications are enormous. Voice commerce has stayed nascent for years — not because people don't want to buy by voice, but because the experience has been atrocious. Fix the experience, and you unlock a channel that serves the 4.2 billion smartphone users who find typing a search query more effort than just saying what they want.
Here's where it gets genuinely uncomfortable for anyone building a commerce strategy around existing channels.
OpenAI is constructing what amounts to an ambient commerce infrastructure — a trinity of hardware devices, each serving a different touchpoint in the customer's day:
The Earbuds (Codename: Sweetpea) — Open-style earbuds with a custom 2nm processor for on-device AI processing. These sit in your ear all day, hearing the world around you whilst providing an AI assistant. First-year sales target: 40 to 50 million units. Manufacturing partner: Foxconn. The commerce implications of 50 million people with a purchase-capable AI whispering in their ear shouldn't need spelling out, but I'll spell it out anyway: every overheard product mention, every 'I need to buy more milk', every 'that jacket looks nice' becomes an actionable commerce moment.
The Smart Speaker — A speaker with a built-in camera, expected at $200-$300, with Face ID-style authentication for purchase authorisation. GoAirTech is reportedly supplying speaker modules. Expected February 2027. This isn't an Echo competitor. It's a visual commerce terminal in your kitchen. It can see what's in your fridge, identify when you're running low on something, and complete the purchase with a glance for authentication. No app. No screen. No friction.
The Smart Glasses — Mass production not expected until 2028, but the vision is clear: AI that sees what you see, in real time, all day. For retail, this means product recognition, price comparison, and instant purchasing from any physical environment. Walking past a shop window becomes a shopping experience.
Individually, each device is interesting. Together, they're an ambient commerce layer that wraps around the customer's entire day. Morning: the speaker notices you're out of coffee and orders it. Commute: the earbuds hear your podcast mention a book and offer to buy it. Afternoon: the glasses recognise a colleague's trainers and find the best price. Evening: the speaker spots your near-empty wine rack and suggests a case from your preferred merchant.
This is not speculative. These are real products with real manufacturing partners, real hiring numbers (200+ people), and real timelines. OpenAI has nearly a billion weekly users on ChatGPT. They're not building hardware for a niche. They're building it for the mass market.
The obvious objection writes itself: didn't we try this already? The Humane AI Pin launched with enormous hype and collapsed spectacularly. The Rabbit R1 was an expensive toy that couldn't do much of anything. AI hardware has been a graveyard of ambition.
But the failure of Humane and Rabbit actually strengthens the case for OpenAI's hardware play, for three reasons.
First, distribution. Humane had to convince people to try an entirely new platform from an unknown company. OpenAI has nearly a billion weekly ChatGPT users who already know, trust, and rely on the AI. Selling earbuds to existing users is a fundamentally different proposition from launching a new category with zero installed base. Apple didn't invent the smartphone — they launched one after 100 million people were already using iPods. OpenAI is running the same playbook.
Second, capability. Humane's AI wasn't good enough. The Pin's voice assistant was slow, inaccurate, and frustrating. OpenAI's current GPT-5.4 is already capable of genuine computer use — navigating interfaces, completing tasks, understanding complex instructions. GPT-6 will extend this further. The AI inside OpenAI's hardware won't be a gimmick bolted onto a gadget. It'll be the most capable AI system on the planet, purpose-built for the device it runs on.
Third, design. Say what you will about Jony Ive — and there's plenty to say — his track record is unmatched. The iPhone, the iPad, the MacBook Air, the iMac. He has an almost supernatural ability to make complex technology feel simple and desirable. Humane's Pin looked like a clip-on microphone. Whatever Ive designs will look like something people actually want to wear.
The lesson from Humane isn't that AI hardware can't work. It's that AI hardware requires three things simultaneously: a massive existing user base, genuinely capable AI, and exceptional industrial design. OpenAI is the first company to have all three.
If you're running ecommerce operations, managing a retail brand, or building commerce technology, here's the uncomfortable reality: the channel strategy you're building today may be obsolete within 24 months.
Current commerce infrastructure assumes screen-based interaction. Your website, your app, your email campaigns, your social commerce — all of it assumes a customer looking at a screen, making deliberate navigation choices, and completing a checkout flow. Ambient AI commerce assumes none of those things. The customer might be speaking, glancing, or simply existing in proximity to a device. The purchase funnel doesn't just shorten. It dissolves.
Here's what smart commerce teams should be doing now, before the hardware ships:
Build for voice-first product discovery. If a customer asks an AI 'what running shoes should I buy for flat feet?', will your products surface? The answer depends not on your SEO or your paid ads, but on whether your product data is structured, detailed, and conversational enough for an AI to recommend. Most product catalogues were built for keyword search, not natural language recommendation. That gap needs closing now.
Invest in conversational product data. AI assistants don't read your product pages the way humans do. They need structured attributes, comparison data, use-case context, and honest pros-and-cons information. The brands that feed AI assistants the best data will get the recommendations. This is the new SEO — except instead of optimising for Google's algorithm, you're optimising for an AI's recommendation logic.
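To make the difference concrete, here is a minimal sketch of what 'conversational product data' might look like versus keyword-oriented data. The schema and the matching function are entirely illustrative — field names like `suited_for` and `use_cases` are assumptions, not any standard or any AI vendor's real format — but they show the kind of structured, use-case-level attributes an assistant needs before it can recommend your product for a query like 'running shoes for flat feet':

```python
# A hypothetical product record structured for AI recommendation rather
# than keyword search. All field names are illustrative, not a standard.
product = {
    "name": "Stability Runner X",
    "category": "running shoes",
    "attributes": {
        "arch_support": "high",  # explicit, machine-readable attribute
        "suited_for": ["flat feet", "overpronation"],
        "drop_mm": 8,
        "weight_g": 295,
    },
    "use_cases": [
        "daily road running for runners with flat feet",
        "long training runs where stability matters",
    ],
    "tradeoffs": {  # honest pros and cons the assistant can relay
        "pros": ["strong arch support", "durable outsole"],
        "cons": ["heavier than neutral trainers"],
    },
}

def matches_need(product: dict, need: str) -> bool:
    """Naive check an assistant might run: does any structured field
    directly address the stated customer need?"""
    need = need.lower()
    suited = (s.lower() for s in product["attributes"].get("suited_for", []))
    return need in suited or any(
        need in uc.lower() for uc in product["use_cases"]
    )

print(matches_need(product, "flat feet"))  # True
```

A keyword-search catalogue entry — a name, a description blob, a price — gives an AI nothing to reason over; the structured version above lets it answer the 'for flat feet?' question directly and explain the trade-offs.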
Rethink authentication and checkout. If a smart speaker can authorise purchases via Face ID, your checkout flow is irrelevant. The winning merchants will be those whose purchasing APIs are simple enough for an AI agent to call, complete a transaction, and confirm delivery — all without the customer ever seeing a screen. Shopify's checkout APIs already support this pattern. If you're on a platform that doesn't, that's a problem you'll want to solve before 50 million earbuds ship.
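What does 'an API simple enough for an AI agent to call' actually require? Chiefly: one call does the whole job, authorisation is explicit, and retries are safe — an agent that loses a network connection mid-purchase will call again, and must not create a duplicate order. The sketch below is a hypothetical in-memory illustration of that pattern (the function name, fields, and idempotency-key approach are assumptions for this example, not Shopify's or anyone's actual API):

```python
from dataclasses import dataclass

@dataclass
class OrderResult:
    order_id: str
    status: str
    total_pence: int

_ORDERS: dict[str, OrderResult] = {}  # idempotency store (in-memory stand-in)

def place_order(idempotency_key: str, sku: str, qty: int,
                unit_price_pence: int, authorised: bool) -> OrderResult:
    """One call: check authorisation, price, confirm.
    Safe for an agent to retry with the same key."""
    if not authorised:
        raise PermissionError("purchase not authorised by the customer")
    if idempotency_key in _ORDERS:
        return _ORDERS[idempotency_key]  # a retry returns the same order
    result = OrderResult(
        order_id=f"ord_{len(_ORDERS) + 1}",
        status="confirmed",
        total_pence=qty * unit_price_pence,
    )
    _ORDERS[idempotency_key] = result
    return result

first = place_order("key-123", "coffee-1kg", 2, 1499, authorised=True)
retry = place_order("key-123", "coffee-1kg", 2, 1499, authorised=True)
print(first.order_id == retry.order_id)  # True — the retry is a no-op
```

Compare that to a typical human checkout: five pages, a CAPTCHA, a promo-code field. An agent can't navigate that reliably, and it doesn't need to — it needs exactly one well-behaved endpoint.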
Prepare for zero-UI commerce. The most radical implication of ambient AI is that the user interface disappears entirely. No website. No app. No cart. The customer says 'order more of that coffee I liked last month' and the AI handles everything — product identification, price comparison, payment, delivery scheduling. If your competitive advantage depends on your website's design or your app's UX, you're building on sand. The advantage shifts to product quality, data richness, fulfilment speed, and merchant API accessibility.
Watch the agent commerce layer. OpenAI has already confirmed that GPT-6 will have autonomous agentic capabilities — the ability to take actions, not just recommend them. Combined with ambient hardware, this creates a world where AI agents are genuine economic actors, making purchasing decisions on behalf of humans. The merchants who build relationships with these agents — through data quality, reliable fulfilment, and fair pricing — will capture disproportionate value. It's B2B sales, except the 'B' is an AI.
There's one more detail from the GPT-6 leaks worth dwelling on. OpenAI has partnered with AMD to deploy six gigawatts of computing power for model training. The first gigawatt comes online in the second half of 2026, aligning with GPT-6 developer preview timelines.
Six gigawatts. To contextualise that: it's roughly the output of six nuclear power stations. It's more power dedicated to training a single family of AI models than most countries devote to their entire digital infrastructure. And it exists for one purpose — to make the AI inside those earbuds, speakers, and glasses smart enough to genuinely replace human assistants in daily life.
The scale of investment here should tell commerce leaders something important about the conviction behind this strategy. OpenAI isn't experimenting with hardware as a side project. They're building infrastructure that only makes financial sense if ambient AI devices become as ubiquitous as smartphones. Their bet is that within three to five years, the primary way most people interact with AI won't be by typing into a chat window, but by speaking to a device that's always listening, always watching, and always ready to act.
For commerce, that means the market isn't shifting gradually. It's being rebuilt from the ground up by a company with the capital, the talent, the user base, and now the hardware to actually pull it off.
The credible timeline looks something like this: GPT-6 developer preview in late 2026. Earbuds shipping in early 2027. Smart speaker following weeks later. By the end of 2027, there could be 50 to 100 million ambient AI devices in homes and ears worldwide, each one a commerce terminal that never sleeps.
Most ecommerce strategies are built around optimising the next quarter's conversion rate on a website. The companies that will win the next decade are the ones building for a world where the website is the least important part of their commerce stack.
Here's what nobody in the commerce industry wants to hear: the smartphone era of ecommerce was a 15-year warmup act. The real disruption — the one that fundamentally changes how humans discover, evaluate, and purchase products — is ambient AI. And it's arriving not as a slow evolution of existing tools, but as a coordinated hardware-software offensive from the most well-funded AI company on earth.
OpenAI isn't building a better chatbot. They're building what they internally describe as an 'ambient AI ecosystem' — brain (Omni model), voice (BiDi), and body (hardware). When those three layers converge, and the timeline suggests 2027 for meaningful consumer availability, the commerce terrain changes as fundamentally as it did when Apple launched the App Store in 2008.
The merchants who treated mobile as 'just a smaller screen' spent a decade playing catch-up. The ones who recognised it as a new paradigm — always-on, location-aware, camera-equipped — built billion-pound businesses.
Ambient AI commerce will sort winners and losers even more brutally, because there's no screen to optimise. No app store to be listed in. No search results to rank for. There's just an AI, making recommendations and completing purchases, based on data quality, merchant reliability, and API accessibility.
If your commerce strategy doesn't have an answer for 'what happens when the customer never visits our website?', you're not behind. You're invisible.
The hardware is coming. The model is coming. The voice is coming. The only question is whether your commerce operation will be ready when 50 million earbuds start whispering product recommendations to the world.