DeepSeek's Rise and Industry Impact: A Concise Summary
Executive Summary:
DeepSeek has recently become the dominant topic in AI, surpassing Claude, Perplexity, and Gemini in daily traffic. While this surge in public attention is recent, industry insiders have recognized DeepSeek's capabilities for months. The current hype suggests that DeepSeek's efficiency negates the need for more compute, but the reality is more nuanced. Algorithmic improvements are driving real efficiency gains, yet the demand induced by models like DeepSeek is already showing up in GPU pricing, a Jevons paradox in action.
This is emblematic of a broader cycle in AI: early breakthroughs generate hype, followed by cost inefficiencies that ultimately lead to an atomization phase, where models become smaller, more specialized, and dramatically cheaper to train and use. Today's dominant LLMs are still in the first stage of this cycle, and while we are beginning to see improvements in cost structures and pricing models, the AI industry is only scratching the surface of what real efficiency looks like. Much of the media narrative around DeepSeek, and AI at large, overstates the immediacy of efficiency breakthroughs while ignoring key trade-offs, presenting an incomplete picture of the landscape.
In this document, we break down the realities behind DeepSeek’s infrastructure, cost efficiencies, and broader market positioning, as well as what this means for AI’s evolving competitive landscape. We also highlight how our portfolio companies are well-aligned with key trends shaping the sector.
DeepSeek's Origins and Resources:
DeepSeek originated as a spin-off from High-Flyer, a Chinese hedge fund utilizing AI in trading. High-Flyer, recognizing AI's broader potential and the importance of scaling, invested in 10,000 A100 GPUs in 2021 before export restrictions. This investment proved crucial, leading to the creation of DeepSeek in May 2023, self-funded by High-Flyer due to initial investor skepticism about AI business models. DeepSeek and High-Flyer maintain resource sharing, and DeepSeek is now a significant AI endeavor, not a mere "side project." Their GPU investment is estimated to exceed $500M, even considering export controls.
DeepSeek's GPU Infrastructure:
DeepSeek is believed to possess around 50,000 Hopper-architecture GPUs: a mix of approximately 10,000 H800s (computationally similar to H100s but with lower interconnect bandwidth), 10,000 H100s, and orders for H20s (the China-specific GPU, of which Nvidia has produced over 1 million in nine months). These GPUs, shared with High-Flyer and geographically distributed, support trading, inference, training, and research. Latest estimates place DeepSeek's total server CapEx at roughly $1.2B: about $500M in GPU investment and $715M in operational costs, covering server setup and maintenance, data center operations, and other expenses of running the fleet. Unlike X.AI, which centralizes its GPUs, DeepSeek distributes resources, a practice common among AI labs and hyperscalers managing diverse tasks.
Talent and Operational Advantages:
DeepSeek hires exclusively from within China, prioritizing capability and curiosity over credentials and recruiting heavily from top universities such as PKU and Zhejiang. It offers highly competitive salaries, reportedly over $1.3 million USD for top candidates, exceeding those of major Chinese tech companies and AI labs. With roughly 150 employees and rapid growth, DeepSeek operates as a nimble, well-funded startup, avoiding bureaucracy and leveraging self-funding for rapid innovation; much like Google running its own datacenters, it fosters experimentation across the technology stack. DeepSeek is currently the leading "open weights" lab, surpassing Meta's Llama and Mistral.
Debunking the $6M Training Cost Narrative:
The reported "$6M" training cost for DeepSeek V3 is misleading, representing only the GPU cost of pre-training, a fraction of the total expense. This figure accounts only for the final stage of pre-training, overlooking the costs of building the cluster, running multiple training iterations, extensive R&D, man-hours, and GPU time spent refining the model. Developing new architectures, like Multi-Head Latent Attention, involves significant R&D, man-hours, and GPU time. A more realistic comparison is Claude 3.5 Sonnet's training cost in the "tens of millions," highlighting the substantial investments needed for experimentation, architecture development, data acquisition, and personnel. DeepSeek's access to a large GPU cluster was facilitated by lags in export control implementation.
V3's Performance and Algorithmic Progress:
DeepSeek V3 is indeed impressive, often compared to GPT-4o and shown to outperform it on several benchmarks. However, AI advances rapidly, and GPT-4o was released in May 2024, so comparisons across timeframes are less direct. Algorithmic improvements consistently reduce the compute needed for comparable AI capabilities, as demonstrated by much smaller models now matching GPT-3's performance. This trend, now highlighted by DeepSeek's Chinese origin, is not new. Algorithmic progress is estimated at roughly 4x annually, with some suggesting up to 10x, evidenced by a ~1200x decrease in GPT-3-class inference costs. GPT-4-class models show cost reductions as well, but capability increases concurrently, yielding roughly a 10x improvement in the cost-to-capability ratio. DeepSeek's distinction is being the first to reach this cost-capability point and release open weights, although Mistral and Llama have also released open models. Continued cost reductions are anticipated, possibly another 5x by year-end.
All of this is to say that DeepSeek-V3 is not a unique breakthrough or something that fundamentally changes the economics of LLMs; it is an expected point on an ongoing cost-reduction curve. What is different this time is that the company first to demonstrate the expected cost reductions was Chinese.
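For intuition, the short calculation below compounds an assumed 4x-per-year algorithmic efficiency gain. The 4x rate and the ~1200x GPT-3 figure are the estimates cited above, not measurements of ours; the arithmetic is purely illustrative.

```python
# Illustrative arithmetic only: compounding an assumed 4x-per-year algorithmic
# efficiency gain to show how quickly inference costs can fall for a fixed
# capability level.

annual_gain = 4.0
for years in range(1, 6):
    cumulative = annual_gain ** years
    print(f"After {years} year(s): ~{cumulative:,.0f}x cheaper for the same capability")

# After 5 years the compounded factor is ~1,024x, the same order of magnitude
# as the ~1200x drop observed in GPT-3-class inference costs.
```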
R1's Reasoning Prowess and Google's Gemini Flash 2.0:
DeepSeek's R1 model matches the performance of o1 (OpenAI's reasoning model announced in September 2024), showcasing rapid progress in reasoning capabilities. This speed is attributed to a paradigm shift toward reasoning, which offers faster iteration and easier optimization than the previous pre-training-focused approach. Reasoning development leverages synthetic data generation and reinforcement learning (RL) in post-training, enabling quicker gains with less compute. While R1's paper omits compute details, generating synthetic data and running RL at scale still requires substantial compute resources. Despite R1's impressive reasoning capabilities, its benchmarks are selectively presented, and it is not universally superior to o1; OpenAI's o3, moreover, surpasses both R1 and o1 significantly. Google's Gemini Flash 2.0 Thinking, released a month before R1 and costing less, rivals R1 on reasoning benchmarks, though with limited benchmark data and far less hype, likely due to Google's go-to-market execution and the "Chinese surprise" factor around R1. DeepSeek's agility as a startup enables it to outpace giants like Meta in releasing reasoning models.
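As a rough illustration of the synthetic-data side of this post-training recipe, the sketch below implements a toy rejection-sampling loop: sample candidate reasoning traces, keep only those a verifier accepts, and use the survivors as fine-tuning data. Every function here is a hypothetical stand-in (the "model" is a random arithmetic generator and the "verifier" an exact-match check); this is the general pattern, not DeepSeek's or OpenAI's pipeline.

```python
# Minimal sketch of rejection sampling for synthetic post-training data:
# generate candidates, keep only verified traces, collect them for fine-tuning.
# All names and the toy arithmetic "verifier" are hypothetical stand-ins.
import random

def generate_candidates(problem: str, n: int) -> list[str]:
    # Stand-in for sampling n reasoning traces from a policy model.
    a, b = map(int, problem.split("+"))
    return [f"{a} + {b} = {a + b + random.choice([-1, 0, 0, 1])}" for _ in range(n)]

def verify(problem: str, trace: str) -> bool:
    # Stand-in for a verifiable reward: exact match on the arithmetic answer.
    a, b = map(int, problem.split("+"))
    return trace.endswith(f"= {a + b}")

problems = ["12+7", "33+9", "101+45"]
sft_dataset = []
for p in problems:
    sft_dataset += [(p, t) for t in generate_candidates(p, n=8) if verify(p, t)]

print(f"Kept {len(sft_dataset)} verified traces for post-training")
```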
Technical Innovations Driving DeepSeek's Efficiency:
DeepSeek's advancements stem from key technical innovations, which Western labs are expected to adopt quickly. V3, the base model for R1, uses Multi-Token Prediction (MTP) at unprecedented scale during training, improving model quality; the extra prediction heads can be discarded at inference, making this an algorithmic optimization that reduces compute needs. FP8 training, also used by leading US labs, is likely employed. V3 uses a Mixture-of-Experts (MoE) architecture, routing each token to specialized sub-models via a "gating network" so that only a fraction of the model's parameters is active for any given token, boosting both training and inference efficiency (a minimal routing sketch follows below). While MoE efficiency gains could theoretically reduce investment, Dario Amodei, Anthropic's CEO, argues that the substantial economic benefits of more capable AI drive reinvestment into even larger models, accelerating AI scaling. R1's reasoning capabilities are built on top of V3 and fine-tuned with RL on synthetic datasets, similar to o1's development. The compute used for R1 is intentionally undisclosed, potentially to mask the extent of DeepSeek's GPU resources. Concerns also exist about DeepSeek's potential use of OpenAI model outputs, raising questions about data provenance and whether KYC measures are needed to prevent model distillation. Interestingly, R1 demonstrates that smaller, non-reasoning models can be turned into reasoning models by fine-tuning on outputs from a reasoning model, using a dataset of roughly 800k samples, potentially democratizing reasoning-model development.
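The gating-network routing mentioned above can be sketched in a few lines. The following is a minimal top-k MoE layer written in PyTorch purely to show the mechanism, under simplifying assumptions (a dense loop over experts, no load-balancing loss, no shared experts); it is not DeepSeek's implementation, whose published MoE design additionally uses fine-grained and shared experts.

```python
# Minimal top-k gated Mixture-of-Experts layer: a gating network scores experts
# per token, and each token is processed only by its top-k experts.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoE(nn.Module):
    def __init__(self, d_model: int, d_ff: int, n_experts: int, k: int = 2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(d_model, n_experts, bias=False)  # the "gating network"
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:   # x: (tokens, d_model)
        scores = self.gate(x)                              # (tokens, n_experts)
        topk_scores, topk_idx = scores.topk(self.k, dim=-1)
        weights = F.softmax(topk_scores, dim=-1)           # normalize over the chosen k
        out = torch.zeros_like(x)
        for slot in range(self.k):                         # each token's k routing slots
            for e, expert in enumerate(self.experts):
                mask = topk_idx[:, slot] == e              # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(16, 64)                               # 16 tokens, d_model=64
layer = TopKMoE(d_model=64, d_ff=256, n_experts=8, k=2)
print(layer(tokens).shape)                                 # torch.Size([16, 64])
```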
Multi-head Latent Attention (MLA) for Inference Cost Reduction:
A crucial innovation is Multi-head Latent Attention (MLA), which significantly reduces inference costs by cutting KV cache requirements by 93.3% compared to standard attention. The KV cache, a memory mechanism, grows with conversation context and creates memory bottlenecks; shrinking it lowers the hardware needed per query, decreasing cost. MLA has garnered significant attention from US labs, and DeepSeek also benefits from the H20's higher memory bandwidth and capacity for inference. There is therefore some logic to the lower price DeepSeek is able to charge. Nevertheless, the cost reduction is still too dramatic to be explained by this alone: Google's Gemini Flash 2.0 has reportedly used an MLA-style approach as well, and its cost is still far from DeepSeek's. This has led some to believe DeepSeek is being subsidized through partnerships with the likes of Huawei. Partnerships with Huawei do exist, but Ascend compute utilization is minimal thus far.
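The memory math behind this is straightforward. The sketch below compares the per-token KV-cache footprint of standard multi-head attention against an MLA-style compressed latent; all dimensions are illustrative placeholders rather than DeepSeek's actual configuration, so the computed percentage will not match the reported 93.3% exactly, but it shows why the technique matters at long context lengths.

```python
# Illustrative KV-cache arithmetic: why compressing keys/values into a small
# latent vector cuts inference memory. Dimensions below are assumed placeholders.

def mha_kv_bytes_per_token(n_layers, n_heads, head_dim, bytes_per_elem=2):
    # Standard attention caches full K and V for every head in every layer.
    return 2 * n_layers * n_heads * head_dim * bytes_per_elem

def latent_kv_bytes_per_token(n_layers, latent_dim, bytes_per_elem=2):
    # MLA-style caching stores one compressed latent per layer instead of full K/V.
    return n_layers * latent_dim * bytes_per_elem

mha = mha_kv_bytes_per_token(n_layers=60, n_heads=128, head_dim=128)
mla = latent_kv_bytes_per_token(n_layers=60, latent_dim=576)
print(f"Per-token cache: {mha / 1024:.0f} KiB (MHA) vs {mla / 1024:.0f} KiB (latent)")
print(f"Reduction: {100 * (1 - mla / mha):.1f}%")
# A 128k-token context multiplies these per-token numbers by 131,072, which is
# where the memory bottleneck, and MLA's savings, become decisive.
```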
Broader Market Implications and Pricing Dynamics:
DeepSeek's R1 matches o1's capabilities at a much lower price, prompting a pricing framework analogous to semiconductor fabrication. Leading-capability providers (TSMC in chips; initially OpenAI in AI) command premium pricing, while competitors offering similar capabilities (Samsung and Intel in chips; DeepSeek now) price lower to reach price-performance parity. AI labs, like chipmakers, can shift capacity to newer models while maintaining older ones with reduced resources. This suggests a commoditization of capabilities, with a constant race for the leading edge to preserve pricing power. DeepSeek's near-zero-margin pricing on R1 challenges OpenAI's premium pricing, which was justified by frontier capabilities. The AI industry may follow a hyper-accelerated version of the chip-manufacturing model, in which leading-edge capability drives pricing power (as with ChatGPT Pro) while lagging capabilities become commoditized. Rapid product cadence and the pursuit of leading-edge features that create new value justify pricing power; otherwise commoditization prevails in the open-model market. Ironically, given the chip industry's own capital intensity and R&D burden, the analogy raises questions about how much chipmakers themselves benefit from AI model companies. Yet the parallel to Jevons paradox and transistor scaling is strong: initial uncertainty about transistor scaling gave way to focused efforts to build ever more complex functions on top of it. AI model scaling mirrors this, suggesting a period of rapid progress that benefits Nvidia and other hardware providers.
DeepSeek's Subsidized Inference and Jevons Paradox in Action:
DeepSeek's low pricing might be a strategic move with minimal or negative margins, especially as the company seeks funding. This aggressive pricing erodes OpenAI's leadership margins at the reasoning-model tier, though DeepSeek remains a "fast follower." A strong open lab like DeepSeek benefits "Neoclouds" and service providers, potentially increasing compute demand even as inference gets cheaper. If top-layer products are effectively free, compute demand and hardware spending increase. Early signs of Jevons paradox are already evident: AWS H100 pricing is rising in many regions following the V3 and R1 releases, and H200 availability is tightening. Increased intelligence at lower cost drives demand, reversing the previously sluggish H100 spot pricing.
Export Controls, Geopolitics, and DeepSeek's Future:
Geopolitics frames DeepSeek's rise as a contest with Western labs, part of the ongoing East-West AI competition. Anthropic's CEO supports export controls, and AI diffusion rules are already in place. Narratives that DeepSeek proves export controls have failed are misinterpretations. Initial controls restricted the H100 but allowed the H800 (computationally similar); the H800 was then restricted, leaving only the H20. Despite huge demand, Nvidia canceled significant H20 orders in January, likely pre-empting a US ban. Grace periods between export control rules likely allowed DeepSeek to stockpile chips. Export controls aim to limit, not eliminate, China's AI ecosystem, targeting the hundreds of thousands to millions of chips, not just tens of thousands. Future H20 bans are anticipated, further restricting DeepSeek's chip access. Despite strong inference technology, DeepSeek faces capacity challenges serving surging demand: sign-up closures and slow R1 responses indicate strain. Export controls will increasingly constrain its ability to scale both models and serving capacity.
Chinese Government Support and Potential Shifts:
The Bank of China announced a RMB 1 trillion (~US$140B) subsidy for the AI industry chain over five years, following a meeting with DeepSeek's CEO. The subsidy aims at Chinese self-reliance in science and technology, focusing on AI, robotics, biotech, and new materials, and includes computing infrastructure and risk management for new technologies. Future export controls will likely widen the gap, as US labs with access to these innovations can scale further than China. While China may continue to produce comparable models, it is projected to remain a follower. DeepSeek's open-source model strategy may also shift, particularly as CCP interest in protecting algorithmic innovations grows.
AI Cost Optimization and Efficiency: How Hetz Portfolio Companies Are Leading the Shift
The rapid evolution of AI infrastructure, model efficiency, and cost optimization is not just an abstract trend—it is actively shaping market dynamics, creating a wealth of opportunities for companies that can drive meaningful efficiency gains. Across all Hetz funds, our portfolio companies are well-positioned at the forefront of these shifts, addressing critical challenges in AI cost structures, inference optimization, and model deployment.
At a lower level, one of our early portfolio companies is tackling one of the most fundamental bottlenecks in AI development—improving memory efficiency and system-level optimization for large-scale machine learning workloads. This kind of foundational improvement is critical as LLMs evolve toward more cost-effective architectures.
Deepchecks enables AI practitioners to automate model monitoring and validation, reducing expensive failure cases and inefficiencies in production deployments. As AI adoption scales, the need for robust model observability and risk mitigation becomes essential for sustaining cost-effective operations.
With the increasing demand for optimized inference, hardware innovations that reduce energy consumption and improve processing throughput will be key to unlocking AI's next phase of cost efficiency. From an infrastructure standpoint, another early Hetz portfolio company is pioneering advancements in AI compute efficiency.
On the developer tooling side, Tabnine is leading the charge in efficient, on-device AI for code generation, reducing reliance on expensive cloud-based inference. By shifting workloads away from centralized, high-cost API models, Tabnine exemplifies the kind of localized AI optimization that will become more widespread as the industry moves beyond massive monolithic models.
Similarly, Runhouse is streamlining AI deployment and orchestration, making it easier for companies to optimize their compute and operational efficiency. As LLM architectures atomize and enterprises look to tailor AI models to their specific needs, flexible and cost-effective deployment tools like Runhouse will be instrumental.
A crucial but often overlooked factor in AI cost optimization is data quality. Upriver addresses this issue by enforcing data contracts at the source, ensuring that AI models are trained and run on high-quality, reliable data. Poor data quality leads to inefficiencies in AI workflows, requiring expensive retraining, additional compute resources, and unnecessary complexity. By preventing bad data from propagating through AI pipelines, Upriver plays a vital role in reducing wasted compute cycles and improving the overall efficiency of AI systems.
Across all these companies, a clear pattern emerges: the AI industry is still in its early stages of optimization, and there are significant efficiency gains yet to be realized. The next wave of innovation will be driven by companies that can reduce costs, improve performance, and enable scalable AI adoption without runaway infrastructure expenses—a thesis that is actively playing out in our portfolio.
Conclusion:
DeepSeek’s rise underscores both the rapid advancements and ongoing structural challenges in the AI landscape. While its efficiency gains are notable, the broader industry remains in an early stage where cost structures, pricing models, and infrastructure demands are still evolving. The current hype often oversimplifies the reality—DeepSeek may be pushing AI forward, but fundamental economic and technical constraints remain.
As the market shifts toward greater efficiency and specialization, the role of optimized inference, cost reduction, and hardware innovation will only grow in importance. The companies driving these advancements—whether through infrastructure optimization, model efficiency, or streamlined deployment—will be best positioned for the next phase of AI’s evolution. Our portfolio is aligned with this shift, and we continue to see significant investment opportunities in companies addressing these challenges head-on.