Chinese artificial intelligence developer DeepSeek is making headlines in the tech industry after revealing that it spent just $294,000 to train its R1 model, a surprisingly low figure compared to the eye-watering budgets its US rivals have reported. The new detail is likely to spark fresh debate about China’s growing role in the global race to build smarter, more powerful AI.
What makes this announcement stand out is how rarely companies share the real costs behind developing their AI models. DeepSeek, headquartered in Hangzhou, had already caught the world’s attention earlier this year when it launched lower-cost AI systems, prompting global investors to worry that such new players could threaten tech giants like Nvidia. Since then, both DeepSeek and its founder, Liang Wenfeng, have kept a low profile, releasing only a few updates, until now.
The report reveals that the Chinese AI company’s reasoning-focused R1 model was trained on 512 Nvidia H800 chips for under $300,000, a fraction of the figures hinted at by US companies. OpenAI’s CEO, Sam Altman, for example, has said that training foundational models costs his company well over $100 million, although OpenAI has never shared exact numbers.
Of course, DeepSeek’s claims have raised some eyebrows. While some US officials and tech companies have praised it for developing a low-cost alternative, many have questioned the reported costs and the hardware DeepSeek used. The H800 chips in question were designed specifically for China after the US government banned Nvidia from selling its more advanced H100 and A100 chips to Chinese firms in late 2022. While some US sources suggested that DeepSeek somehow got its hands on restricted H100 chips, Nvidia has said that DeepSeek used only H800s, which are allowed under export rules.
The company’s latest report admits for the first time that it owns some A100 chips and used them in the early stages of model development. According to the company, these A100 GPUs helped prepare a smaller version of R1 before the full training run, which lasted 80 hours on the H800 cluster.
This access to high-powered chips hasn’t gone unnoticed. It has previously been reported that DeepSeek’s ability to run an A100 supercomputing cluster helped it attract some of China’s brightest AI minds.
DeepSeek’s new publication also addresses, if indirectly, claims from US officials and AI experts that the company “distilled” OpenAI’s models into its own. DeepSeek has defended the practice, arguing that distillation, where one AI learns from another, actually leads to better performance and makes training and running models far more affordable. That, DeepSeek says, is key to making AI more accessible, especially given the resources needed for such advanced models.
For context, model distillation is a common technique in AI development that lets a new model benefit from the knowledge of an established one, without repeating all the expensive groundwork.
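For readers who want to see what that looks like in practice, here is a minimal, generic sketch of distillation in PyTorch. It is not DeepSeek’s implementation; the toy “teacher” and “student” models, the temperature value, and the optimizer settings are illustrative assumptions. The only point is that the student is trained to match the teacher’s softened output distribution rather than learning everything from scratch.

```python
# Generic knowledge-distillation sketch (illustrative only, not DeepSeek's code).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions with a temperature, then push the student
    # toward the teacher's output distribution via KL divergence.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * temperature ** 2

# Stand-in "models": simple linear layers over 128 features with a 1000-way output.
teacher = torch.nn.Linear(128, 1000)   # the established model (kept frozen)
student = torch.nn.Linear(128, 1000)   # the new, cheaper model being trained
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-4)

x = torch.randn(8, 128)                # one illustrative batch of inputs
with torch.no_grad():
    teacher_logits = teacher(x)        # teacher predictions, no gradients needed

student_logits = student(x)
loss = distillation_loss(student_logits, teacher_logits)

optimizer.zero_grad()
loss.backward()
optimizer.step()
```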
DeepSeek has also been transparent about using Meta’s open-source Llama model as a base for some of its own distilled systems. The company notes that when training its V3 model, it relied on web data that included a lot of answers generated by OpenAI’s models. This means its own system may have indirectly picked up knowledge from OpenAI, but DeepSeek insists this was unintentional, just a side effect of training on publicly available web content.
OpenAI did not respond to a request for comment.
By disclosing its unusually low AI training costs, DeepSeek is challenging industry norms and stirring a broader conversation about transparency, competition, and innovation in artificial intelligence.
