Hangzhou launches high-end data labeling hub to power AI training ecosystem

  • City partners with ByteDance’s Volcano Engine to strengthen supply of high-quality datasets for advanced AI applications.
  • Facility targets autonomous driving, embodied intelligence and healthcare as demand grows for specialized training data.

Hangzhou’s Shangcheng District yesterday unveiled a high-end data labeling base developed jointly with Volcano Engine, ByteDance’s cloud and AI services arm, as Chinese cities race to secure one of AI’s most critical — and often overlooked — resources: high-quality training data.

The new facility follows Volcano Engine’s rollout of similar advanced labeling centers in Beijing, Shanghai and Jiangsu, signaling an expansion of infrastructure designed to support the next phase of AI commercialization, where data quality rather than computing power is emerging as a key competitive factor.

Data labeling refers to the process of structuring raw information — including text, images, speech and video — into annotated datasets that machine-learning systems can interpret and learn from.

The Hangzhou base will focus on higher-value labeling tasks tied to complex industrial scenarios such as autonomous driving, embodied intelligence, smart healthcare and industrial AI.

Officials said the center is intended to serve as a national hub for high-quality data supply, a testing ground for labeling technology innovation and a platform for industry collaboration. The project will prioritize multimodal data annotation while promoting standardized and intelligent workflows across the sector.

Lou Ying, deputy director of Shangcheng’s data resources bureau, said the base will operate across three main functions: producing high-quality annotated datasets through precision labeling and quality verification, advancing data technology through industry-academia collaboration, and cultivating specialized talent through training programs intended to support the growing AI workforce.

Developing industry-specific datasets lies at the heart of the partnership. Shangcheng has already released 102 “AI Plus” application scenarios, and the new center will initially concentrate on sectors including intelligent mobility, biomedicine, fintech and digital fashion, building tailored datasets aligned with real industrial needs.

“Our AI-assisted labeling tools can improve annotation efficiency by more than 60%,” said Xiao Ran, general manager of intelligent data platform solutions at Volcano Engine.

According to Xiao, the company has accumulated more than 8,000 ready-to-use, high-quality datasets spanning 45 data domains and more than 50 languages, resources he described as essential “fuel” for enterprise AI training and deployment.

An official from Hangzhou’s Shangcheng District signs a cooperation agreement yesterday with Volcano Engine, ByteDance’s cloud and AI services arm. The agreement reflects the city’s ambition to speed up the commercialization of high-end industry datasets.

Building a ‘chain leader’ model

The initiative aligns with Hangzhou’s broader ambition to become a leading national AI innovation hub.

Shangcheng was selected in part because its central innovation district requires strong data infrastructure across multiple functional zones, while Volcano Engine brings established technical capabilities and ecosystem-building experience, said Yao Honghua, vice governor of Shangcheng.

For its part, Volcano Engine said it chose Shangcheng for its dense urban industry base and supportive policy environment.

The district already hosts about 500 data-related companies across the full value chain — from data resources and technology to security and infrastructure — with the local data industry expanding at an average annual rate exceeding 15%.

At the launch ceremony, Volcano Engine also introduced eight inaugural partner companies to form the base’s industry ecosystem. This move is in keeping with the city’s strategy of building a “chain leader” model aimed at attracting broader AI industry participation as demand accelerates for specialized datasets.