- V4-Pro pricing cut to a quarter of original level, with lowest rate at 0.025 yuan per million tokens
- Move intensifies competition in China’s large language model market
DeepSeek said on May 22 it will permanently reduce the API pricing of its flagship model DeepSeek-V4-Pro to one-quarter of the original rate, setting what it says is a new global low for large language model costs.
The company said a previously scheduled discount program, which was set to end in June 2026, will instead be made permanent.
Under the revised pricing, input tokens with cache hits are priced at 0.025 yuan ($0.0037) per million tokens, down from 0.1 yuan; input tokens without cache hits at 3 yuan, down from 12 yuan; and output tokens at 6 yuan, down from 24 yuan.
Cache hits refer to requests where the model reuses previously computed results, significantly reducing computation costs, while cache misses require full reprocessing of inputs.
At the new rates, DeepSeek said, the cost of using V4-Pro falls to an industry low, with cached input costing as little as 2.5 fen per million tokens. The fen, worth one-hundredth of a yuan, is the smallest unit of the Chinese currency.
DeepSeek released V4-Pro on April 24. The model has 1.6 trillion parameters and supports context windows of one million tokens. Usage surged following earlier price cuts, according to earlier reporting.
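To make the arithmetic concrete, the following is a minimal Python sketch of how a bill might be estimated from the per-million-token rates quoted above. The function name, the dictionary keys and the sample workload are illustrative assumptions, not part of DeepSeek's API or its billing logic.

```python
# Rates in yuan per million tokens, taken from the figures quoted in the article.
NEW_RATES = {"cached_input": 0.025, "uncached_input": 3.0, "output": 6.0}
OLD_RATES = {"cached_input": 0.1, "uncached_input": 12.0, "output": 24.0}


def request_cost_yuan(rates, cached_input, uncached_input, output):
    """Estimate the cost in yuan for the given token counts (hypothetical helper)."""
    tokens = {
        "cached_input": cached_input,
        "uncached_input": uncached_input,
        "output": output,
    }
    # Multiply each token count by its rate, then scale from "per million" to "per token".
    return sum(rates[kind] * tokens[kind] for kind in rates) / 1_000_000


# Assumed workload: 800,000 cached input, 200,000 uncached input, 50,000 output tokens.
new_cost = request_cost_yuan(NEW_RATES, 800_000, 200_000, 50_000)
old_cost = request_cost_yuan(OLD_RATES, 800_000, 200_000, 50_000)

print(f"new: {new_cost:.2f} yuan, old: {old_cost:.2f} yuan")
```

For this assumed workload the estimate comes to about 0.92 yuan at the new rates against about 3.68 yuan at the old ones, a quarter of the earlier cost, in line with the across-the-board cut.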
The pricing move comes as rivals have also adjusted their own offerings. China already has some of the world’s lowest token usage costs.
ByteDance’s Doubao (Seed 2.0 lite) charges about 0.6 yuan per million tokens for uncached input and 3.6 yuan for output, while Alibaba’s Qwen3.6-plus charges 0.2 yuan for cached input, 2 yuan for uncached input and 12 yuan for output.
Zhipu AI’s GLM-5.1 is priced at about 3.9 yuan per million tokens for input and 31.8 yuan for output, while Baidu’s Ernie 4.5 Turbo charges 0.2 yuan for cached input, about 0.8 yuan for uncached input and 3.2 yuan for output.
