Thinking Machines, the AI startup founded by former OpenAI CTO Mira Murati and researcher John Schulman, has unveiled a research preview of what it calls 'interaction models': systems designed to listen, speak, and see all at once rather than waiting for a user to finish before responding. The core idea is to replace the familiar back-and-forth of current AI chat with something closer to a real conversation. The flagship model, TML-Interaction-Small, is a 276-billion-parameter Mixture-of-Experts system with 12 billion active parameters, and it achieves a turn-taking latency of 0.40 seconds, compared to 1.18 seconds for GPT-realtime-2.0 and 0.57 seconds for Gemini-3.1-flash-live. On FD-bench V1.5, a benchmark built to measure interaction quality, it scored 77.8 against GPT's 46.8.
The architecture splits work between two components: an interaction model that manages live dialogue and an asynchronous background model that handles heavier tasks like web browsing or complex reasoning, feeding its results back into the conversation as they become available. Rather than relying on external audio encoders like Whisper, the system takes in raw audio as dMel and image patches through a lightweight embedding layer, and everything is trained together inside the transformer. This lets the model issue filler responses while still processing what a user is saying, and proactively react to visual cues, like spotting a bug being written in code or noticing someone walk into a video frame.
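The split between a fast interaction model and a slower background model can be illustrated with a small asyncio sketch. This is only a toy under stated assumptions, not Thinking Machines' implementation: the function names (`background_model`, `interaction_loop`, `run_demo`), the `search:` prefix used to mark a heavy request, and the sleep durations are all hypothetical stand-ins chosen for illustration.

```python
import asyncio


async def background_model(request: str) -> str:
    """Hypothetical stand-in for the slow asynchronous model (e.g. web browsing)."""
    await asyncio.sleep(0.05)  # simulate heavy work
    return f"result for {request!r}"


async def interaction_loop(chunks: list[str]) -> list[str]:
    """Hypothetical stand-in for the live model: it reacts to every input chunk at once."""
    transcript: list[str] = []
    pending: set[asyncio.Task] = set()

    for chunk in chunks:
        if chunk.startswith("search:"):
            # Heavy request: hand it off and answer with a filler instead of blocking.
            pending.add(asyncio.create_task(background_model(chunk)))
            transcript.append("filler: let me look into that")
        else:
            transcript.append(f"reply: {chunk}")

        await asyncio.sleep(0.01)  # simulate the gap before the next input chunk

        # Fold any finished background results back into the conversation.
        for task in [t for t in pending if t.done()]:
            pending.discard(task)
            transcript.append(f"update: {task.result()}")

    # Conversation input has ended: wait for whatever is still running.
    for task in pending:
        transcript.append(f"update: {await task}")
    return transcript


def run_demo() -> list[str]:
    return asyncio.run(
        interaction_loop(["hello", "search: weather", "thanks", "bye"])
    )


if __name__ == "__main__":
    for line in run_demo():
        print(line)
```

The point of the design is that the dialogue loop never awaits the background task directly while input is still arriving; it only checks whether a result is ready between chunks, which is what allows a filler response followed by a later update.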
The company says a limited research preview will open in the coming months, with a wider release planned for later this year; there is no public or enterprise access yet. Thinking Machines raised $2 billion at a $12 billion valuation in July 2025, in a round led by Andreessen Horowitz with backing from Nvidia, Accel, and others. Despite losing several founding members to Meta earlier this year, the company has grown to around 130 employees and recently hired PyTorch creator Soumith Chintala as CTO.




