Thinking Machines previews real-time AI that listens and responds simultaneously

Quick Answers

What changed

Mira Murati's startup demos 'interaction models' that process input and output at the same time, posting faster response times than GPT and Gemini rivals.

Why it matters

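Handling input and output at the same time replaces the turn-by-turn exchange of current AI chat with something closer to a real conversation. With a wider release planned for later this year, teams tracking real-time voice and vision products can use the reported numbers to weigh near-term opportunity and execution risk.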

Key numbers

The flagship model, TML-Interaction-Small, is a 276-billion parameter Mixture-of-Experts system with 12 billion active parameters.
It achieves a turn-taking latency of 0.40 seconds, compared to 1.18 seconds for GPT-realtime-2.0 and 0.57 seconds for Gemini-3.1-flash-live.
On FD-bench V1.5, a benchmark built to measure interaction quality, it scored 77.8 against GPT's 46.8.

Thinking Machines, the AI startup founded by former OpenAI CTO Mira Murati and researcher John Schulman, has unveiled a research preview of what it calls 'interaction models': systems designed to listen, speak, and see all at once rather than waiting for a user to finish before responding. The core idea is to replace the familiar back-and-forth of current AI chat with something closer to a real conversation. The flagship model, TML-Interaction-Small, is a 276-billion parameter Mixture-of-Experts system with 12 billion active parameters, and it achieves a turn-taking latency of 0.40 seconds, compared to 1.18 seconds for GPT-realtime-2.0 and 0.57 seconds for Gemini-3.1-flash-live. On FD-bench V1.5, a benchmark built to measure interaction quality, it scored 77.8 against GPT's 46.8.
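To put the reported figures in proportion, the short Python sketch below works through the arithmetic. It uses only the numbers quoted in this article; the variable and function names are illustrative, and nothing here reflects how the company itself computes or reports these metrics.

```python
# Figures as quoted in the article; all names below are illustrative.
TOTAL_PARAMS_B = 276   # total parameters, in billions
ACTIVE_PARAMS_B = 12   # active parameters, in billions

LATENCY_S = {
    "TML-Interaction-Small": 0.40,
    "GPT-realtime-2.0": 1.18,
    "Gemini-3.1-flash-live": 0.57,
}

FD_BENCH_V1_5 = {"TML-Interaction-Small": 77.8, "GPT": 46.8}


def active_fraction(active: float, total: float) -> float:
    """Share of the model's parameters that are active."""
    return active / total


def latency_ratios(baseline: str, latencies: dict) -> dict:
    """Each rival's latency expressed as a multiple of the baseline's."""
    base = latencies[baseline]
    return {name: value / base
            for name, value in latencies.items() if name != baseline}


frac = active_fraction(ACTIVE_PARAMS_B, TOTAL_PARAMS_B)
ratios = latency_ratios("TML-Interaction-Small", LATENCY_S)
score_gap = FD_BENCH_V1_5["TML-Interaction-Small"] - FD_BENCH_V1_5["GPT"]

if __name__ == "__main__":
    print(f"Active parameter share: {frac:.1%}")
    for name, ratio in ratios.items():
        print(f"{name}: {ratio:.2f}x the turn-taking latency")
    print(f"FD-bench V1.5 gap: {score_gap:.1f} points")
```

By these figures, roughly 4% of the model's parameters are active, the GPT model's turn-taking latency is close to three times as long, and the Gemini model's is about 1.4 times as long.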

The technical architecture splits work between two components: an interaction model that manages live dialogue and an asynchronous background model that handles heavier tasks like web browsing or complex reasoning, feeding results back into the conversation naturally. Rather than relying on external audio encoders like Whisper, the system takes in raw audio as dMel and image patches through a lightweight embedding layer, training everything together inside the transformer. This lets the model issue filler responses while still processing what a user is saying, and proactively react to visual cues, such as spotting a bug being written in code or noticing someone walk into a video frame.
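The split between a live interaction component and an asynchronous background component can be pictured with a toy Python sketch. This is only an illustration of the general pattern under stated assumptions, not Thinking Machines' implementation: the function names, the sleep-based timings, and the text strings are all hypothetical, and plain strings stand in for audio and video streams.

```python
import asyncio


async def background_model(query: str) -> str:
    """Hypothetical stand-in for a slow task (web browsing, heavy reasoning)."""
    await asyncio.sleep(0.05)  # simulated work, not a real model call
    return f"result for {query!r}"


async def interaction_loop(user_chunks: list[str]) -> list[str]:
    """Respond to each input chunk right away while the slow task runs."""
    transcript: list[str] = []
    # Start the heavy task without blocking the live dialogue.
    task = asyncio.create_task(background_model("heavy question"))
    for chunk in user_chunks:
        transcript.append(f"heard: {chunk}")
        if not task.done():
            # Keep the conversation moving with a filler response.
            transcript.append("filler: one moment...")
        await asyncio.sleep(0.01)  # simulated gap between input chunks
    # Fold the background result back into the conversation.
    transcript.append(await task)
    return transcript


def run_demo() -> list[str]:
    return asyncio.run(interaction_loop(["hello", "can you", "look this up"]))


if __name__ == "__main__":
    for line in run_demo():
        print(line)
```

The design point the sketch tries to capture is that the dialogue loop never waits on the heavy task: it keeps acknowledging input and only folds the background result in once it is ready.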

The company says a limited research preview will open in the coming months, with a wider release planned for later this year; there is no public or enterprise access yet. Thinking Machines raised $2 billion at a $12 billion valuation in July 2025, led by Andreessen Horowitz with backing from Nvidia, Accel, and others. Despite losing several founding members to Meta earlier this year, the company has grown to around 130 employees and recently hired PyTorch creator Soumith Chintala as CTO.

