
The underlying logic and response strategies for the surge in AI server cooling demand
2025.06.13 laney.zhao@walmate.com

The soaring computing power of AI servers is running into "thermal constraints": chip power density now exceeds 1000 W/cm² (e.g., NVIDIA H100), cabinet power density has jumped from 2.4 kW to 120 kW, and traditional air cooling reaches its physical limit at 8-10 kW. The underlying logic of this "heat dissipation crisis" combines the quantum-effect dilemma of chip architecture (leakage surges below 3 nm), the energy imbalance of data movement (more than 90% of system power consumption), and the exponential demand of large-model training (a single GPT-4 training run consumes 32.4 TWh). Under hard policy constraints (China's PUE ≤ 1.25) and energy-efficiency pressure, liquid cooling has moved from marginal experiments into the mainstream and has become the key to unlocking AI computing power.


1- The underlying logic of AI server cooling requirements

a. Chip architecture and power consumption revolution

The computing power density of an AI chip (computing output per unit area or per unit of power consumption) and its power consumption are the core indicators of its performance. The following analysis looks at three typical chips:

[Figure: computing power density and power consumption of three typical AI chips]


As computing power keeps scaling, heat dissipation and energy-efficiency constraints prevent chip performance from being fully released. This "power consumption wall" stems from:

· Heat dissipation lag: chip power density (>1000 W/cm²) is growing far faster than cooling technology can iterate, and traditional solutions have reached their physical limits.

· Unbalanced energy consumption structure: at the physical level, quantum tunneling below 3 nm erodes energy-efficiency gains and 3D stacking cuts heat dissipation efficiency by 30%-50%; at the architectural level, data movement accounts for more than 90% of energy consumption, while computing power (growing 750x per 2 years) and memory bandwidth (1.4x per 2 years) fall badly out of step; at the application level, the parameter explosion of large models (e.g., 32.4 TWh for GPT-4 training) and dynamic loads (instantaneous power exceeding TDP by 200%) add further cooling pressure. A rough estimate of the data-movement imbalance follows Figure 1 below.

Figure 1: The integration of storage and computing
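
To make the data-movement imbalance above concrete, here is a minimal back-of-envelope sketch in Python. The per-operation energy figures (picojoules per FLOP and per byte moved) and the one-byte-per-FLOP traffic ratio are illustrative assumptions, not measurements for any specific chip.

# Back-of-envelope estimate of how data movement can dominate chip energy.
# All per-operation energies and the traffic ratio are illustrative assumptions.

def energy_breakdown(flops, bytes_moved, pj_per_flop=1.0, pj_per_byte=20.0):
    """Return (compute_energy_J, movement_energy_J, movement_share)."""
    e_compute = flops * pj_per_flop * 1e-12         # joules spent on arithmetic
    e_movement = bytes_moved * pj_per_byte * 1e-12  # joules spent moving data
    return e_compute, e_movement, e_movement / (e_compute + e_movement)

# Assumed memory-bound workload: one byte of DRAM traffic per FLOP.
e_c, e_m, share = energy_breakdown(flops=1e15, bytes_moved=1e15)
print(f"compute: {e_c:.0f} J, data movement: {e_m:.0f} J, movement share: {share:.0%}")

Under these assumptions data movement takes roughly 95% of the energy, the same ballpark as the ">90%" figure cited above; the ratio is set by per-byte versus per-FLOP energy costs rather than by raw compute throughput.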


b. Transition of cabinet power density

Cabinet power density is undergoing a revolution from general-purpose computing to AI-driven ultra-high density, and the paradigm has shifted from "the room adapting to the equipment" to "the equipment defining the room". AI computing demand is forcing infrastructure to upgrade by leaps and bounds.

· Exponential growth: the global average rose from 2.4 kW/cabinet in 2011 to 9.5 kW/cabinet in 2024 (CAGR ≈ 12%), AI computing centers have already pushed density beyond 120 kW/cabinet (e.g., NVIDIA GB200 NVL72), and single-cabinet power may reach the MW level by 2030.

· Core drivers: surging AI chip power consumption (700 W per H100 card → 1,200 W per GB200 card) and large-model training demand (32.4 TWh for a single GPT-4 training run) form a "double helix effect" that forces cabinet density to keep pace with computing power.

· Technological breakthroughs: liquid cooling takes over where air cooling hits its 8-10 kW limit, with cold plates (20-50 kW) and immersion (50-120 kW) supporting high density; power delivery is upgraded to high-voltage direct current (HVDC) with efficiency above 98%; removing air ducts raises space utilization by 40%, and liquid cooling brings PUE down to 1.08. A rough cabinet-power estimate follows this list.
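
As a quick sanity check on the 120 kW/cabinet figure, the following Python sketch adds up per-card power for a 72-GPU cabinet of the GB200 NVL72 class. Only the 1,200 W per-card figure comes from the text; the CPU, networking, and pump/conversion overheads are assumptions for illustration.

# Rough cabinet power estimate for a 72-GPU AI rack (GB200 NVL72 class).
# Only the 1,200 W GPU card power comes from the article; the other line
# items are illustrative assumptions, not vendor specifications.

gpu_count = 72
gpu_power_w = 1200           # per-card power cited in the text
cpu_power_w = 36 * 300       # assumed: 36 host CPUs at ~300 W each
network_power_w = 5000       # assumed: NVLink switches and NICs
overhead_w = 8000            # assumed: pumps/fans and power-conversion losses

total_w = gpu_count * gpu_power_w + cpu_power_w + network_power_w + overhead_w
print(f"Estimated cabinet power: {total_w / 1000:.0f} kW")

With these assumptions the total lands near 110 kW, the same order of magnitude as the >120 kW/cabinet cited above and more than ten times the 8-10 kW practical limit of air cooling.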

c. Policy and energy-efficiency drivers

· China's "East Data West Computing" project: eastern hub nodes are required to keep PUE ≤ 1.25 and western hubs ≤ 1.2, forcing the adoption of liquid cooling. At the Inner Mongolia hub, for example, immersion liquid cooling can bring PUE down to 1.08, saving more than 20 million kWh of electricity annually.

· Global carbon emission regulations: the EU CSRD requires data centers to disclose their full life-cycle carbon footprint, and California's Climate Corporate Data Accountability Act makes Scope 3 emissions subject to mandatory disclosure. Liquid cooling has become key to compliance because it cuts indirect emissions (e.g., refrigerant leakage).

· Economic leverage: liquid cooling cuts cooling energy by 30%-50% compared with air cooling; combined with peak-valley electricity price differentials, the investment payback period can shorten to 3-5 years. A simple worked example follows below.
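
A minimal sketch of the economics, assuming a 10 MW IT load, a baseline PUE of 1.5 with air cooling versus 1.1 with liquid cooling, an electricity price of 0.6 CNY/kWh, and a retrofit cost of 30 million CNY; all of these inputs are illustrative assumptions rather than figures from this article.

# Illustrative simple-payback estimate for a liquid-cooling retrofit.
# Every input below (IT load, PUE values, price, retrofit cost) is an assumption.

it_load_kw = 10_000          # assumed IT load
pue_air = 1.5                # assumed baseline PUE with air cooling
pue_liquid = 1.1             # assumed PUE after the liquid-cooling retrofit
price_cny_per_kwh = 0.6      # assumed average electricity price
retrofit_cost_cny = 30_000_000

it_energy_kwh = it_load_kw * 8760                           # IT energy per year
annual_saving_kwh = it_energy_kwh * (pue_air - pue_liquid)  # cooling energy avoided
annual_saving_cny = annual_saving_kwh * price_cny_per_kwh
payback_years = retrofit_cost_cny / annual_saving_cny

print(f"Annual energy saved: {annual_saving_kwh / 1e6:.1f} GWh")
print(f"Annual cost saved:   {annual_saving_cny / 1e6:.1f} M CNY")
print(f"Simple payback:      {payback_years:.1f} years")

With these particular assumptions the savings come to about 35 GWh and 21 million CNY per year, a payback of roughly 1.4 years; less favorable electricity prices or retrofit costs stretch the result toward the 3-5 year range cited above.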


2- Evolution of heat dissipation technology and differentiation of technical routes

a. Liquid cooling technology: from edge to mainstream

The transition of liquid cooling from "marginal experiment" to "heat-dissipation foundation" is a rebalancing of computing power density against energy efficiency, and a reconstruction of the data center value chain.

· Evolution process: in the marginal stage (1960s-2010s), liquid cooling was used only in supercomputing and military systems (e.g., Cray-2), limited by coolant corrosiveness and high cost; in the breakthrough period (2010s-2020s), GPU thermal density exceeding 500 W/cm² (NVIDIA P100) and policy (China's PUE ≤ 1.25) drove commercial adoption, and the cost of a cold plate retrofit fell to 1.2 times that of air cooling; in the mainstream stage (the past 2-3 years), AI cabinet power density exceeds 120 kW (e.g., NVIDIA NVL72), liquid cooling TCO is 12.2% lower than air cooling, and the payback period shortens to 3-5 years.

· Evolutionary logic: technically, the shift is from "air cooling adapting to the chip" to "chip-defined heat dissipation", with liquid cooling becoming the core lever for releasing computing power; industrially, a positive cycle forms of "policy drives standards → standards cut costs → lower costs drive adoption"; ecologically, the data center value chain is reconstructed, turning the cooling system from a "cost center" into an "energy-efficiency asset".

· Differentiation of technical routes: the split among liquid cooling approaches stems from the trade-off between heat dissipation efficiency and retrofit cost. Cold plate cooling prioritizes compatibility, balancing cost and risk through local retrofits and suiting medium-density scenarios; immersion cooling pursues the physical heat dissipation limit, breaking through the heat-density wall via system-level redesign but facing material and O&M challenges; spray cooling explores chip-level precision temperature control, paving the way for sensitive scenarios such as optical computing. In essence, this differentiation is the outcome of the impossible triangle of "heat dissipation efficiency - retrofit cost - O&M complexity": cold plates win on balance, immersion pursues the physical limit, and spray aims at precise temperature control. Together, the three have pushed liquid cooling from a "technical option" to a "computing power foundation"; a density-based selection sketch follows below.
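
The density thresholds running through this section can be captured as a simple selection rule. The Python sketch below is only an illustration built from the approximate per-cabinet ranges quoted above (air cooling up to about 10 kW, cold plate roughly 20-50 kW, immersion roughly 50-120 kW); a real decision would also weigh retrofit cost and O&M complexity, and the boundary handling here is arbitrary.

# Illustrative cooling-route selector based on the per-cabinet density ranges
# quoted in the text; thresholds are approximate and the gaps between ranges
# (e.g., 10-20 kW) are mapped to the next route purely for simplicity.

def suggest_cooling(rack_kw: float) -> str:
    if rack_kw <= 10:
        return "air cooling (within the 8-10 kW practical limit)"
    if rack_kw <= 50:
        return "cold plate liquid cooling (~20-50 kW, limited room retrofit)"
    if rack_kw <= 120:
        return "immersion liquid cooling (~50-120 kW, system-level redesign)"
    return "beyond mainstream routes (spray / chip-level or MW-class designs)"

for kw in (6, 35, 90, 150):
    print(f"{kw:>4} kW/cabinet -> {suggest_cooling(kw)}")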

b. Chip-level cooling technology: multi-dimensional breakthroughs

Chip-level cooling is undergoing a trinity of innovation across "materials - structure - control". In the short term it is dominated by 3D microfluidics and cold plate liquid cooling (supporting kilowatt-level TDPs); in the long term it relies on quantum cooling and photothermal synergy to push past physical limits. Its development directly determines how efficiently AI computing power can be released and how data center energy efficiency evolves.

· Materials: diamond and graphene approach the physical limit of thermal conductivity, while phase-change materials absorb transient thermal shocks.

· Structure: microfluidic channels and cold plates shift from "external attachment" to "in-chip embedding", shortening heat dissipation paths and raising efficiency.

· Control: solid-state active cooling chips break through volume limitations, and AI-driven dynamic regulation realizes "heat-computing synergy" (a toy control loop is sketched after this list).

· Core trend: the integration of the three pushes heat dissipation from "passive heat conduction" to "chip-level active temperature control", supporting kilowatt-level single-chip TDP requirements.
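
As one way to picture the "heat-computing synergy" idea in the control bullet above, here is a toy Python loop that throttles clock frequency against a first-order thermal model so the die settles near a target temperature. The thermal constants, the frequency-to-power relation, and the gain are all invented for illustration; this is not any vendor's controller.

# Toy "heat-computing synergy" loop: a proportional controller adjusts clock
# frequency so a first-order thermal model settles near a target temperature.
# All constants below are invented for illustration.

TARGET_C = 85.0    # desired steady-state die temperature
AMBIENT_C = 30.0   # coolant/ambient temperature (assumed)
R_TH = 0.1         # thermal resistance, K per W (assumed)
TAU = 5.0          # thermal time constant, in control steps (assumed)
KP = 0.02          # proportional gain, GHz per K (assumed)

freq_ghz, temp_c = 2.0, 40.0
for _ in range(60):                          # simulate 60 control steps
    power_w = 250.0 * freq_ghz               # assumed: power scales with frequency
    steady_c = AMBIENT_C + R_TH * power_w    # temperature the die would settle at
    temp_c += (steady_c - temp_c) / TAU      # first-order approach to steady state
    freq_ghz += KP * (TARGET_C - temp_c)     # throttle toward the target temperature
    freq_ghz = min(max(freq_ghz, 0.5), 3.0)  # clamp to a plausible frequency range

print(f"settled near {temp_c:.1f} C at {freq_ghz:.2f} GHz")

Run as written, the loop converges to roughly 85 C at about 2.2 GHz; the point is only that frequency and temperature can be co-regulated in a single loop, which is what the "heat-computing synergy" bullet refers to.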


The evolution of heat dissipation technology has been upgraded from "single-point innovation" to "system reconstruction": cold plate liquid cooling leads the retrofit of existing facilities through compatibility, immersion breaks through the physical heat dissipation limit, and chip-level spray technology explores precision temperature control; together they build a layered heat dissipation system. As quantum cooling and photothermal synergy reach commercialization, they will support MW-level ultra-dense computing power in a single cabinet. This process is not only a revolution in the heat dissipation paradigm; it is also driving data centers from "energy consumers" to "energy-efficiency assets" - it is estimated that full liquid cooling could help global data centers cut carbon emissions by 450 million tons by 2030. Heat dissipation is transforming from a cost center into the core foundation of the AI computing power economy.


We will regularly update you on technologies and information related to thermal design and lightweighting, sharing them for your reference. Thank you for your attention to Walmate.