DeepSeek to introduce V4 with 2x peak-hour API rates

Planned July release to launch time-based pricing to ease computing demand
Costs to double during daytime peak hours while off-peak rates remain unchanged

DeepSeek plans to launch the production version of its V4 large language model in mid-July, while introducing peak and off-peak API pricing that will double usage costs during the busiest hours of the day.

In an email sent to API customers on June 29, the company said the new pricing will take effect alongside the V4 release.

Peak hours are defined as 9 a.m. to noon and 2 p.m. to 6 p.m. Beijing time.
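For developers who want to route traffic around the surcharge, the peak-window rule reduces to a simple time check. The Python sketch below is illustrative only. The function name `is_peak` is hypothetical. Window starts are assumed inclusive and ends exclusive. Saturdays and Sundays are assumed to be off-peak, since the notice as reported defines peak periods only by time of day. Beijing time does not observe daylight saving time, so a fixed UTC+8 offset is used.

```python
from datetime import datetime, time, timedelta, timezone

# Beijing time is a fixed UTC+8 offset (no daylight saving time).
BEIJING = timezone(timedelta(hours=8))

# Reported peak windows: 9 a.m.-noon and 2 p.m.-6 p.m. Beijing time.
# Assumption: each window includes its start time and excludes its end time.
PEAK_WINDOWS = [(time(9, 0), time(12, 0)), (time(14, 0), time(18, 0))]


def is_peak(moment: datetime) -> bool:
    """Return True if a timezone-aware datetime falls in a reported peak window."""
    local = moment.astimezone(BEIJING)
    # Assumption: weekends (Saturday=5, Sunday=6) are billed at off-peak rates.
    if local.weekday() >= 5:
        return False
    t = local.time()
    return any(start <= t < end for start, end in PEAK_WINDOWS)
```

For example, `is_peak(datetime(2025, 7, 15, 1, 0, tzinfo=timezone.utc))` checks 9 a.m. Beijing time on a Tuesday, which falls inside the morning window.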

The scheduled price increase comes roughly a month after the Hangzhou-based AI upstart announced that it would permanently cut the API pricing of its flagship model, DeepSeek-V4-Pro, to one-quarter of the original rate, an all-time low for token usage costs.

Double the rates

Under the new pricing schedule, the price of cached input tokens (cache hits) for the V4 Pro model will rise from 0.025 yuan ($0.0037) to 0.05 yuan per million tokens during peak periods.

Prices for uncached input tokens will increase from 3 yuan to 6 yuan per million tokens, while output token prices will double from 6 yuan to 12 yuan per million. The equivalent rates for the V4 Flash model will also double during peak hours.
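To see what the doubling means for a single workload, the reported V4 Pro rates can be plugged into a small cost estimator. The Python sketch below is hypothetical: the function name `v4_pro_cost_yuan` is illustrative, the rates are the per-million-token figures reported above, and V4 Flash is omitted because its exact rates are not given. `Decimal` is used to avoid floating-point rounding in yuan amounts.

```python
from decimal import Decimal

# Off-peak (standard) V4 Pro rates in yuan per million tokens, as reported.
OFF_PEAK_RATES = {
    "cached_input": Decimal("0.025"),
    "uncached_input": Decimal("3"),
    "output": Decimal("6"),
}
PEAK_MULTIPLIER = Decimal(2)  # peak-hour rates are double the standard rates
MILLION = Decimal(1_000_000)


def v4_pro_cost_yuan(cached_input: int, uncached_input: int, output: int, peak: bool) -> Decimal:
    """Estimate the yuan cost of a V4 Pro workload from token counts."""
    counts = {
        "cached_input": cached_input,
        "uncached_input": uncached_input,
        "output": output,
    }
    # Sum each token category at its per-million-token rate.
    base = sum((OFF_PEAK_RATES[kind] * n / MILLION for kind, n in counts.items()), Decimal(0))
    return base * (PEAK_MULTIPLIER if peak else Decimal(1))
```

Under these assumptions, a job using 2 million cached input tokens, 1 million uncached input tokens and 500,000 output tokens would cost 0.05 + 3 + 3 = 6.05 yuan off-peak and 12.10 yuan during peak hours.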

The changes are expected to have the greatest impact on latency-sensitive applications with frequent API calls, including real-time customer service bots, AI coding assistants and agent-based workflows, where workloads cannot easily be shifted outside business hours.

Unchanged pricing for off-peak hours

By contrast, users running batch processing, offline data summarization, data cleaning and other non-real-time workloads could cut API costs by scheduling jobs overnight or on weekends, when standard pricing remains in effect.
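A batch scheduler could apply this by deferring any job submitted during a peak window until that window closes. The Python sketch below is hypothetical. The function name `next_off_peak` is illustrative. Peak windows are assumed to include their start and exclude their end. Weekends are treated as off-peak, following the scheduling advice above.

```python
from datetime import datetime, time, timedelta, timezone

# Beijing time is a fixed UTC+8 offset (no daylight saving time).
BEIJING = timezone(timedelta(hours=8))

# Reported peak windows (Beijing time). Assumption: start inclusive, end exclusive.
PEAK_WINDOWS = [(time(9, 0), time(12, 0)), (time(14, 0), time(18, 0))]


def next_off_peak(moment: datetime) -> datetime:
    """Return the earliest off-peak moment at or after a timezone-aware datetime."""
    local = moment.astimezone(BEIJING)
    # Assumption: weekends are billed at standard rates, so run immediately.
    if local.weekday() >= 5:
        return local
    t = local.time()
    for start, end in PEAK_WINDOWS:
        if start <= t < end:
            # Defer the job until the current peak window closes.
            return local.replace(hour=end.hour, minute=end.minute, second=0, microsecond=0)
    # Already outside both peak windows.
    return local
```

A job submitted at 10:30 a.m. on a weekday would be pushed to noon, and one submitted at 3 p.m. to 6 p.m. Because the gap between noon and 2 p.m. is off-peak, a long job started at noon could still run into the afternoon peak, so a real scheduler would also need an estimate of job duration.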

The pricing update follows the full deployment of DSpark, an inference acceleration framework jointly developed by DeepSeek and Peking University.

According to the company, the system improves single-user generation speeds by 60% to 85% for V4 Flash and 57% to 78% for V4 Pro.

Allocating computing resources

Industry analysts said the move is less a straightforward price hike than a mechanism for allocating scarce computing resources more efficiently during periods of peak demand.

In a letter to API customers, DeepSeek said the production release of V4 will introduce additional feature enhancements and performance improvements, adding that the new pricing is intended to optimize resource allocation and improve service stability.

Users will receive at least 24 hours’ notice before any pricing adjustments take effect, DeepSeek said.
