DeepSeek unveils V4 after prolonged teaser rollout and delay

  • Preview version open-sourced with 1-million-token context window
  • Chip migration and funding pressures behind delay

DeepSeek has released its long-awaited V4 model after a prolonged development cycle, unveiling a preview version and open-sourcing it on April 24.

The model supports a context window of up to 1 million tokens and is positioned as a step forward in agent capability, world knowledge and reasoning performance relative to other domestic and open-source systems.

DeepSeek-V4 is available in two configurations. The Pro version has 1.6 trillion parameters with 49 billion active parameters, while the Flash edition has 284 billion parameters with 13 billion active parameters.

Both variants support the same extended context length.
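To put those figures in perspective, the following back-of-the-envelope calculation (illustrative only, using the parameter counts reported above) shows how small the per-token active fraction is in each configuration:

```python
# Illustrative arithmetic only, based on the configuration figures reported above:
# the fraction of total parameters that is active for any single token.
configs = {
    "V4 Pro":   {"total": 1.6e12, "active": 49e9},
    "V4 Flash": {"total": 284e9,  "active": 13e9},
}

for name, cfg in configs.items():
    print(f"{name}: {cfg['active'] / cfg['total']:.1%} of parameters active per token")

# V4 Pro: 3.1% of parameters active per token
# V4 Flash: 4.6% of parameters active per token
```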

The system is built on a mixture-of-experts (MoE) architecture combined with sparse attention, aimed at improving efficiency by reducing computational load and memory usage in attention operations, particularly for agent-based tasks.
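The sketch below is a generic top-k mixture-of-experts routing example in PyTorch. It is purely illustrative and not taken from DeepSeek's code; the function and module names are hypothetical. It shows why, in an MoE layer, only a small fraction of the total parameters is exercised for each token.

```python
import torch
import torch.nn.functional as F

def moe_forward(x, experts, router, top_k=2):
    """Generic top-k mixture-of-experts routing sketch (not DeepSeek's implementation).

    x:       [tokens, hidden] activations
    experts: list of expert feed-forward modules
    router:  linear layer producing one logit per expert
    Only `top_k` experts run per token, so compute scales with the
    active-parameter count rather than the total parameter count.
    """
    logits = router(x)                                  # [tokens, n_experts]
    weights, idx = logits.topk(top_k, dim=-1)           # choose top-k experts per token
    weights = F.softmax(weights, dim=-1)                # gating weights over chosen experts

    out = torch.zeros_like(x)
    for e, expert in enumerate(experts):
        mask = (idx == e).any(dim=-1)                   # tokens routed to expert e
        if mask.any():
            gate = weights[mask][idx[mask] == e]        # gating weight of expert e for those tokens
            out[mask] += gate.unsqueeze(-1) * expert(x[mask])
    return out

# Toy usage: 8 small experts, each token routed to its top-2
hidden, n_experts = 64, 8
experts = [torch.nn.Sequential(torch.nn.Linear(hidden, hidden), torch.nn.GELU())
           for _ in range(n_experts)]
router = torch.nn.Linear(hidden, n_experts)
y = moe_forward(torch.randn(16, hidden), experts, router, top_k=2)
```

Sparse attention plays a complementary role: instead of every token attending to the full context window, attention is restricted to a subset of positions, which is what keeps compute and memory manageable at million-token lengths.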

The release follows a delay of almost three months that industry sources attribute in part to a transition in training infrastructure from Nvidia GPUs to Huawei Ascend chips, alongside internal adjustments to development priorities.

According to individuals familiar with the matter, DeepSeek also encountered a major training setback in mid-2025 that required reworking parts of its model training pipeline.

“DeepSeek had to re-adapt to a new chip architecture,” one source was quoted as saying in Chinese media reports. “There were also differences internally on training direction, and some technical requirements were difficult to reconcile during execution.”

Despite earlier expectations that V4 would include multimodal capabilities, the model remains text-only.

The decision to postpone multimodal training was driven primarily by constraints in compute capacity and funding.

Stemming the talent exodus

Those constraints lend weight to earlier reports that the company is open to external funding to keep fueling its growth and retain top talent.

According to Chinese media, DeepSeek’s financing window is understood to have opened in mid-April 2026, driven by the need to support larger-scale model training and strengthen talent retention.

One individual with knowledge of the matter said the model’s scale does not yet clearly differentiate it from frontier systems such as OpenAI’s GPT-5.5 or Anthropic’s Claude.

“At 1.6 trillion parameters, it is not necessarily ahead of leading global models,” the person said.

By comparison, GPT-5.5 is estimated at around 1.8 trillion total parameters, with approximately 400 billion active parameters.

The company has also faced talent turnover, with key researchers Guo Daya and Wang Bingxuan, core contributors to the earlier DeepSeek-R1 and DeepSeek LLM models, reportedly leaving DeepSeek to join established technology firms such as ByteDance and Tencent.

This talent outflow has turned up the pressure on the startup to secure additional funding.

Separately, industry sources said DeepSeek’s founder Liang Wenfeng had previously been in talks with the president of an unspecified leading domestic technology company over a potential exclusive investment, but no agreement was reached on terms reportedly involving a 20% equity stake.

Industry observers have pointed out that since the release of its earlier R1 model, DeepSeek has been viewed as shifting from a primarily research-oriented organization toward a more commercially focused AI developer, amid intensifying competition in China’s LLM sector.