Mira Murati's Lab Just Announced AI That Listens While It Talks. The Technical Gap It Fills Is Real

Every AI model you have ever used works the same fundamental way. You type something, or you speak. The model waits until you are finished. Then it processes your complete input. Then it generates a response. Then you listen. Then you respond again. The entire architecture of human‑AI interaction, from the first chatbot to the most sophisticated voice interface available today, is built on this turn‑based structure: one party acts while the other is completely passive.
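The turn‑based structure can be made concrete with a toy sketch (illustrative only; the function and callback names here are invented for this example, not drawn from any real system):

```python
# A caricature of turn-based interaction: each party is completely
# idle while the other acts, and the model never sees partial input.

def turn_based_session(model, get_user_input):
    """Yield one complete model response per complete user turn."""
    while True:
        user_turn = get_user_input()   # model is passive until this returns
        if user_turn is None:          # user ended the conversation
            break
        yield model(user_turn)         # user is passive until this finishes
```

Notice that `model` only ever receives finished turns; nothing in this loop lets it react to input that is still being produced.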
This is not how human conversation works. When two people talk, they are both continuously processing information simultaneously. A listener tracks the speaker's words in real time, forming hypotheses about where the sentence is going, preparing potential responses, and calibrating their understanding moment by moment. They can interrupt. They can signal comprehension with backchannels. They can course‑correct a misunderstanding mid‑sentence rather than waiting for the speaker to finish. The conversation is a continuous, bidirectional stream, not an alternating sequence of complete turns.
On May 11, 2026, Thinking Machines Lab, the AI startup founded by Mira Murati, the former Chief Technology Officer of OpenAI, announced something it is calling interaction models. The technical term for the capability is "full duplex," and the company claims its model, TML‑Interaction‑Small, responds in 0.40 seconds, roughly the speed of natural human conversation and significantly faster than comparable models from OpenAI and Google.
The announcement does not mean a product is shipping today. The company is not releasing the model to the public yet: a "limited research preview" is planned for the coming months, with a wider release set for later this year.
But the underlying technical claim is real, the architecture is novel, and understanding what Thinking Machines has actually built requires understanding why the turn‑based structure is a genuine limitation rather than an acceptable compromise.
The Turn‑Based Problem and Why It Matters for AI
The basic interaction with current AI is a stop‑and‑go affair: the user provides an input, then waits anywhere from a few milliseconds to several minutes, depending on the model, before finally receiving the output. This happens because existing models must wait for users to finish asking a question, or complete the sentence they are saying, before they can start processing a response.
For text‑based interaction, this delay is acceptable. Reading and writing naturally occur in alternating turns. The turn‑based structure of AI text chat maps reasonably well onto how written communication already works.
For voice‑based interaction, the limitation is more disruptive. A voice AI that cannot process your speech until you have completely stopped speaking cannot interrupt you to ask a clarifying question. It cannot signal that it has already understood your point. It cannot respond to a partial query that you rephrase midway through. The conversation feels mechanical precisely because it is: a structured alternation of complete monologues rather than the fluid, overlapping, bidirectional exchange that human conversation actually is.
The technical term for what Thinking Machines is building is full‑duplex interaction. A full‑duplex system processes both input and output streams continuously and simultaneously, the way a phone call works at the hardware level, as opposed to a walkie‑talkie, where one party transmits and the other is muted. To get around the turn‑based limitation, Thinking Machines has created an entirely new model architecture that enables full‑duplex interaction.
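The phone‑call analogy can be sketched in code. The following is a minimal illustration of the full‑duplex idea, assuming nothing about TML's actual architecture: in Python's asyncio, it amounts to running a listening coroutine and a speaking coroutine on the same event loop, so input keeps arriving while output is still being produced. All names are invented for this example.

```python
import asyncio

# Toy full-duplex loop (illustrative only, not TML's architecture):
# `listen` keeps consuming input chunks while `speak` is still emitting
# output, so neither side is ever completely passive -- the phone-call
# model rather than the walkie-talkie model.

async def mic(queue):
    """Simulated microphone that keeps talking while `speak` runs."""
    for chunk in ["hel", "lo ", "wor", "ld"]:
        await queue.put(chunk)
        await asyncio.sleep(0.015)
    await queue.put(None)  # end-of-stream marker

async def listen(queue, log):
    """Consume input chunks until the end-of-stream marker arrives."""
    while (chunk := await queue.get()) is not None:
        log.append(("heard", chunk))

async def speak(log):
    """Stream output pieces, yielding control between pieces."""
    for piece in ["I ", "hear ", "you"]:
        log.append(("said", piece))
        await asyncio.sleep(0.02)

async def session():
    queue, log = asyncio.Queue(), []
    await asyncio.gather(mic(queue), listen(queue, log), speak(log))
    return log

log = asyncio.run(session())
```

The resulting `log` interleaves "heard" and "said" events, which is exactly the property a turn‑based loop cannot produce: output begins before input has finished arriving.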
What TML‑Interaction‑Small Actually Does
The model Thinking Machines announced is called TML‑Interaction‑Small. Its headline specification is a 0.40‑second response latency, described as roughly the speed of natural human conversation and significantly faster than comparable voice interaction models from OpenAI and Google.
The latency specification is commercially important for a specific set of applications where AI voice interaction needs to feel natural rather than robotic. Customer service agents, meeting assistants, real‑time translation tools, and accessibility applications all depend on response speed that feels conversational rather than computational. The 0.40‑second figure positions TML‑Interaction‑Small as a voice AI that can operate within the response time window that humans perceive as natural dialogue rather than slow querying.
The broader context of the interaction models announcement reflects Thinking Machines' founding thesis. Mira Murati described the company's mission at founding as building "multimodal systems that work with people collaboratively," a framing that anticipated precisely the product direction the interaction models announcement represents. A model that can process multiple modalities simultaneously, listening and generating at the same time, is a different category of AI assistant from one that alternates between passive reception and active generation.
Where Thinking Machines Stands
Understanding this announcement in context requires the full picture of what Thinking Machines has built and navigated in its first year.
The company was founded in February 2025 and raised $2 billion in a seed round led by Andreessen Horowitz in July 2025, valuing it at $12 billion, the largest seed round in Silicon Valley history. Investors alongside a16z included Nvidia, Accel, ServiceNow, Cisco, AMD, and Jane Street. The Albanian government invested $10 million, requiring an amendment to the country's 2025 budget.
The founding team brought together Murati with several of her former OpenAI colleagues: Barret Zoph, former VP of Research; Lilian Weng, another former OpenAI VP; and John Schulman, an OpenAI co‑founder who joined after a stint at Anthropic. In January 2026, Zoph and Luke Metz departed to return to OpenAI, an episode that generated significant industry commentary about talent retention at frontier AI labs.
In October 2025, Thinking Machines launched Tinker, its first product: an API for fine‑tuning AI models that allows researchers and developers to customize model behavior for specific tasks without the cost and complexity of full‑scale training runs. In March 2026, Nvidia announced a "significant investment" in Thinking Machines as part of a multi‑year strategic partnership, including a commitment to deploy at least one gigawatt of Nvidia's Vera Rubin systems.
The interaction models announcement on May 11 is the company's most significant public research release since Tinker. It demonstrates a distinct technical direction — real‑time, multimodal, bidirectional interaction — that is different from the fine‑tuning infrastructure of Tinker and the foundation model competition of OpenAI and Anthropic. Thinking Machines pointed to what it called a problem with current AI models: they operate in a turn‑based mode, generating a response only after a user finishes input. In this mode, a model cannot take in new information while producing an answer, making it hard for a user to stay in the loop. The Interaction Model, by contrast, was designed from the start with a focus on real‑time responsiveness.
Whether TML‑Interaction‑Small's research preview, expected in the coming months, delivers on the commercial promise of full‑duplex interaction depends on factors that cannot be assessed from the announcement alone: how well the 0.40‑second latency holds under real‑world network conditions and diverse speech patterns, how naturally the model handles interruptions without losing the thread of its ongoing response, and how accurately it processes complex or ambiguous input while simultaneously generating output on an earlier segment of the same query.
The underlying technical direction is sound, grounded in academic research on full‑duplex spoken dialogue systems that has been advancing for several years. What Thinking Machines has done is engineer a production‑capable implementation of that research direction at a speed specification that is competitive with what OpenAI and Google have shipped in their voice interaction products.
Whether the interaction model announced today becomes the product that establishes Thinking Machines' commercial identity, the way Tinker established its research infrastructure identity, will be clearer when the limited research preview begins.
More at thinkingmachines.ai
