SambaNova Cloud Launches the Fastest DeepSeek-R1 671B

A major breakthrough in artificial intelligence has been announced as DeepSeek-R1 671B, an advanced open-source reasoning model, becomes available on SambaNova Cloud. The model, capable of generating 198 tokens per second per prompt, is set to redefine the efficiency of AI inference.
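That throughput figure translates directly into wall-clock latency for long reasoning traces. A minimal sketch of the arithmetic, using illustrative trace lengths that are assumptions rather than figures from the article:

```python
# Estimate generation time at the quoted per-prompt decode rate.
TOKENS_PER_SECOND = 198  # figure quoted for DeepSeek-R1 671B on SambaNova Cloud

def generation_time(num_tokens: int, rate: float = TOKENS_PER_SECOND) -> float:
    """Seconds to stream `num_tokens` at a steady decode rate."""
    return num_tokens / rate

# Illustrative reasoning-trace lengths (assumed, not from the article).
for tokens in (500, 2000, 8000):
    print(f"{tokens:>5} tokens -> {generation_time(tokens):6.1f} s")
```

Even an 8,000-token chain of thought completes in well under a minute at this rate, which is what makes long-form reasoning practical interactively.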
DeepSeek-R1 has gained significant attention for its ability to reduce training costs in developing reasoning models. However, inference using traditional GPU architectures has remained a bottleneck. SambaNova, a US-based company, has now demonstrated how its Reconfigurable Dataflow Unit (RDU) hardware can achieve superior inference performance. These speeds have been independently verified by Artificial Analysis, with SambaNova inviting developers to test the model via its cloud-based platform.
Developers can register for access through the SambaNova Cloud Developer Tier, with availability expanding over the coming weeks as infrastructure scales to meet demand.
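SambaNova Cloud exposes an OpenAI-compatible chat-completions API, so existing client code needs little change. The sketch below builds a request and only issues it if a key is configured; the base URL, model identifier, and environment-variable name are assumptions to verify against SambaNova's current documentation:

```python
import os

# Assumed endpoint and model identifier -- illustrative, check SambaNova's
# developer docs for the current values.
BASE_URL = "https://api.sambanova.ai/v1"
MODEL = "DeepSeek-R1"

def build_request(prompt: str) -> dict:
    """Assemble an OpenAI-style chat-completions payload."""
    return {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.6,
    }

payload = build_request("Prove that the square root of 2 is irrational.")

# Only call the service if a key is set; requires `pip install openai`.
if os.environ.get("SAMBANOVA_API_KEY"):
    from openai import OpenAI
    client = OpenAI(base_url=BASE_URL, api_key=os.environ["SAMBANOVA_API_KEY"])
    reply = client.chat.completions.create(**payload)
    print(reply.choices[0].message.content)
else:
    print("No API key set; payload prepared for model:", payload["model"])
```

Because the payload shape matches the OpenAI SDK, swapping an application over is largely a matter of changing the base URL and model name.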
DeepSeek-R1: An Open-Source Alternative
DeepSeek-R1 has positioned itself as a serious contender in AI reasoning, offering higher accuracy at a fraction of the cost of proprietary models. Built on a Mixture of Experts (MoE) architecture with 671 billion parameters, it has outperformed OpenAI’s o1 on key benchmarks, particularly in mathematical and logical reasoning tasks.
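The efficiency of an MoE model comes from routing each token to only a few expert sub-networks, so a small fraction of the 671 billion parameters is active per token. A toy top-k routing layer illustrates the idea; the dimensions, expert count, and routing details here are arbitrary teaching values, not DeepSeek-R1's actual configuration:

```python
import numpy as np

rng = np.random.default_rng(0)
D, N_EXPERTS, TOP_K = 16, 8, 2  # toy sizes, not DeepSeek-R1's real config

# Each "expert" is a small feed-forward weight matrix.
experts = [rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(N_EXPERTS)]
router = rng.standard_normal((D, N_EXPERTS)) / np.sqrt(D)

def moe_layer(x: np.ndarray) -> np.ndarray:
    """Route token vector `x` to its top-k experts and mix their outputs."""
    logits = x @ router
    top = np.argsort(logits)[-TOP_K:]   # indices of the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()            # softmax over the selected experts only
    return sum(w * (x @ experts[i]) for w, i in zip(weights, top))

token = rng.standard_normal(D)
out = moe_layer(token)
print(out.shape)  # (16,)
```

Only 2 of the 8 experts run for this token; at DeepSeek-R1's scale the same principle keeps per-token compute far below what a dense 671B model would require.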
Unlike the 70B distilled version of the model, also available on SambaNova Cloud, DeepSeek-R1 generates more tokens before formulating an output. This extended reasoning process results in more accurate and nuanced responses. The model has even demonstrated an ability to optimise its own computational efficiency, an achievement that highlights its advanced capabilities.
SambaNova operates its RDU-powered cloud services from data centres in the United States. For organisations prioritising data privacy, the company offers on-premise deployment options, contrasting with DeepSeek’s own cloud service, which runs on GPUs and lacks similar security controls.
Addressing the AI Compute Challenge
The demand for DeepSeek-R1 continues to grow, but running such a complex model has proven challenging due to GPU inefficiencies. DeepSeek was forced to disable its own inference API service, citing constraints in compute capacity.
SambaNova’s RDU technology has been designed to handle large-scale MoE models efficiently. The SN40L RDU features a three-tier memory architecture that enables DeepSeek-R1 to be deployed in a single rack, compared with the 40 racks (320 GPUs in total) previously required for inference.
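The consolidation claim can be put in concrete terms using only the figures quoted above:

```python
# Footprint comparison using the figures quoted in the article.
GPU_RACKS, GPU_COUNT = 40, 320  # prior GPU deployment for DeepSeek-R1 inference
RDU_RACKS = 1                   # single-rack SN40L deployment

gpus_per_rack = GPU_COUNT / GPU_RACKS
rack_reduction = GPU_RACKS / RDU_RACKS

print(f"{gpus_per_rack:.0f} GPUs per rack; {rack_reduction:.0f}x fewer racks")
# 8 GPUs per rack; 40x fewer racks
```

A 40x reduction in rack count translates into proportional savings in data-centre floor space, and typically power and cooling as well.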
With this enhanced efficiency, SambaNova aims to provide 100 times the global inference capacity for DeepSeek-R1 by the end of the year. The company’s RDU chips are emerging as a leading inference platform for complex reasoning models, offering a scalable and cost-effective alternative to traditional GPU-based systems.