Miami Startup Subquadratic Says Its New Model Cuts AI Compute by 1,000x. Researchers Want Receipts.

Subquadratic emerged from stealth with SubQ 1M-Preview, a model it claims breaks the quadratic scaling wall that has defined AI since 2017. The research community is split between intrigued and openly skeptical.

By Nischay Nagpal

May 5, 2026 • Updated May 13, 2026

A Miami startup called Subquadratic stepped out of stealth on Tuesday with a claim big enough to stop the AI industry mid-scroll. Its first model, SubQ 1M-Preview, is what the company calls the first large language model built on a fully subquadratic architecture, one where compute grows linearly with context length rather than quadratically. At 12 million tokens, Subquadratic says its approach reduces attention compute by nearly 1,000 times compared to frontier models. The company has raised $29 million in seed funding at a reported $500 million valuation, with backers including Tinder co-founder Justin Mateen and former SoftBank Vision Fund partner Javier Villamizar. It is launching three products in private beta: an API, a coding agent called SubQ Code, and a tool called SubQ Search.
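For a rough sense of what a 1,000x figure would imply, here is a back-of-envelope sketch. It assumes a simplified cost model that is not taken from Subquadratic's materials: dense attention scores every token against every other token, while a linear scheme scores each token against a fixed budget of others. The function names and the cost model itself are illustrative assumptions.

```python
# Back-of-envelope cost model. This is an assumption for illustration,
# not Subquadratic's published accounting.

def dense_pairs(n_tokens: int) -> int:
    """Token pairs scored by dense attention: every token against every token."""
    return n_tokens * n_tokens


def sparse_pairs(n_tokens: int, budget: int) -> int:
    """Token pairs scored when each token attends to a fixed budget of others."""
    return n_tokens * budget


def implied_budget(n_tokens: int, reduction: float) -> float:
    """Per-token budget that would yield the given reduction versus dense."""
    return n_tokens / reduction


context = 12_000_000  # context length cited in the article
budget = implied_budget(context, 1_000)
ratio = dense_pairs(context) / sparse_pairs(context, int(budget))

print(f"implied per-token budget: {budget:,.0f} tokens")
print(f"dense / sparse pair count: {ratio:,.0f}x")
```

Under this simplified model, a 1,000x reduction at 12 million tokens would correspond to each token attending to about 12,000 others. Whether that matches how the company arrived at its number is not something the published material establishes.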

The technical pitch centers on Subquadratic Sparse Attention, or SSA. Instead of comparing every token to every other token, the model learns which comparisons actually matter and skips the rest. The company reports a 7.2x prefill speedup over dense attention at 128,000 tokens, climbing to 52.2x at 1 million. On SWE-Bench Verified, the model scored 81.8%. On RULER at 128K, it hit 95%. On MRCR v2, a long-context multi-round coreference resolution test, a third party verified a score of 65.9%, well above Claude Opus 4.7 but behind GPT-5.5. Only three benchmarks have been published. There is no peer-reviewed paper yet. A full model card is listed as coming soon.
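One quick way to read the two reported speedup figures is to ask how fast the speedup grows with context length. The sketch below assumes the speedup follows a power law in the number of tokens and fits an exponent through the two reported points. That assumption, and the helper name `scaling_exponent`, are illustrative and not drawn from the company's materials.

```python
import math

# Figures as reported in the article: (context length in tokens, prefill speedup).
reported = [(128_000, 7.2), (1_000_000, 52.2)]


def scaling_exponent(points):
    """Exponent p such that speedup ~ n**p passes through both points.

    Assumes a power-law relationship, which two data points cannot confirm.
    """
    (n1, s1), (n2, s2) = points
    return math.log(s2 / s1) / math.log(n2 / n1)


exponent = scaling_exponent(reported)
print(f"speedup grows roughly as n^{exponent:.2f}")
```

An exponent close to 1 is what one would expect if the dense baseline's cost grows with the square of context length while the sparse method's cost grows close to linearly. Two self-reported data points are thin evidence, so this is a consistency check on the stated numbers, not a verification of them.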

The reaction has been loud and divided. Engineer Will Depue argued the scaling math does not add up and suggested SubQ is a sparse attention finetune of an existing open model, a suggestion CTO Alexander Whedon confirmed. Others called the benchmarks cherry-picked. Some researchers defended the work as a careful execution of sparse attention. Skeptics keep pointing to Magic.dev, which made similar 1,000x claims in 2024 and has shown little public traction since. Earlier subquadratic efforts like Mamba, RWKV, and Kimi Linear all ran into the same wall: linear in theory, less impressive on real downstream tasks. Subquadratic addresses each of those approaches in its technical blog and argues SSA avoids their tradeoffs. Whether that holds will come down to independent evaluation, not launch-day numbers.