NVIDIA Nemotron 3 Ultra: Inside the Fastest Open-Weight AI Model Built in the US
NVIDIA's 550B-parameter Nemotron 3 Ultra is now the top US open-weight model. Here's what its hybrid Mamba-Transformer design means for builders.
On June 4, 2026, at Computex, NVIDIA released Nemotron 3 Ultra — a 550-billion-parameter open-weight language model the company is calling the most intelligent open-weight model built in the US. It's a meaningful marker for the open-source AI race: a US-trained, fully open model that can credibly compete with the best Chinese open-weight systems on raw capability, while running considerably faster.
For builders and agencies evaluating which models to put into production, Nemotron 3 Ultra is worth a close look — not just for its benchmark scores, but for the architecture decisions behind it.
Nemotron 3 Ultra uses a hybrid design that blends Mamba-2 layers with traditional Transformer attention layers, wrapped in a Mixture-of-Experts (MoE) structure. Of its 550 billion total parameters, only about 55 billion are active per token — which is what keeps inference fast despite the model's size.
The Mamba-2 layers do most of the heavy lifting on long sequences. Unlike standard attention, which gets quadratically more expensive as context grows, Mamba layers scale far more efficiently. That's the key technical reason Nemotron 3 Ultra can support a 1-million-token context window in practice rather than just on a spec sheet. NVIDIA also pre-trained the model in NVFP4, a lower-precision format that further improves throughput.
The result, according to early benchmarking on hosted endpoints, is over 300 output tokens per second — three to six times faster than comparable Chinese open-weight models like DeepSeek V4 Pro and Kimi K2.6, which run at roughly 50-100 tokens per second.
On the Artificial Analysis Intelligence Index, Nemotron 3 Ultra scores 47.7, putting it comfortably ahead of the next-best US open-weight models: Gemma 4 31B (39.2) and NVIDIA's own smaller Nemotron 3 Super (36.0). That makes it the strongest open-weight model trained in the US by a clear margin.
It's worth being honest about where it stands globally, though. China's Kimi K2.6 still leads the open-weight field overall with a score of 53.9. Nemotron 3 Ultra's real edge isn't raw intelligence-index supremacy — it's the combination of strong reasoning, a genuinely usable long context window, and inference speed that few models in its capability class can match.
NVIDIA didn't just release the model weights. The company published base weights, post-trained checkpoints, reward models, NVFP4-quantized variants, training recipes, and datasets — all under OpenMDW-1.1, a permissive open AI model license maintained by the Linux Foundation.
OpenMDW is designed to cover the entire bundle of model materials (architecture, parameters, documentation, and related software) under one consistent set of terms, rather than the patchwork of custom licenses many model releases ship with. For companies that need legal clarity before deploying an open model commercially, that consistency reduces a real source of friction.
Nemotron 3 Ultra is already available on Hugging Face, OpenRouter, and NVIDIA NIM, so there's no shortage of ways to start testing it.
A few practical takeaways if you're choosing models for agents, RAG pipelines, or other production AI workloads:
Our cutting-edge features simplify collaboration and creativity, making your workflow intuitive and efficient. Transform your vision into reality effortlessly with Hadidiz Flow.



