LIQUID COOLING IS NOT NEW, BUT NOW, IT'S NECESSARY
![two men and a woman sitting around a computer at a desk](/static/1f77265f6b6e2ef8d3050ee1e8583c9f/16f1d/AHEAD-io-0062.jpg)
The Demand for Liquid Cooling
Artificial Intelligence, or AI, has permeated almost every area of our lives, from the simplicity and democratization of ChatGPT capturing the world’s interest to the advancements of Google Gemini. We are seeing its adoption in everyday life, with consumers utilizing it to create cookbooks, edit photos, or manage their schedules. And then there are the more consequential use cases, such as ensuring safety on railroad tracks and oil rigs, or for patient care in hospitals.
Since 2022, the advancements in AI have been staggering, with AI model parameter counts growing by roughly 1,600 percent. As those numbers approach ludicrous mode, they bring with them immense challenges in the world of data center computing.
A three-year depreciation cycle was once considered the gold standard for compute infrastructure, with some organizations pushing the limits as far as seven years. But AI is different. The AI compute landscape is now introducing vastly more capable systems roughly every 12 months, and to keep pace, it is becoming the norm for enterprises to procure the latest technology just to stay afloat. Unfortunately, this new kit comes at a price, and not just the price of the hardware itself. These systems are HUNGRY. Very, very hungry. For what? Power.
Power-Hungry Is an Understatement
To put power-hungry into perspective, we in the data center business have been running standard racks at an average installed power of 10-20kW, with most standard racks fed by 30-50 amps of 208V three-phase power. The current power requirement for a full rack of NVIDIA’s latest Blackwell NVL72 comes in at a whopping 120kW, roughly six to twelve times the power of an average installation. To make matters even more complex, it is rumored that the next generation of AI compute could push this number to 250kW.
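If you want to sanity-check those numbers yourself, here is a minimal back-of-the-envelope sketch in Python. It uses the standard three-phase power formula with the voltage and breaker sizes quoted above; real deployments often derate circuits (commonly to 80 percent for continuous loads), so treat the outputs as rough figures, not a design spec.

```python
import math

# Back-of-the-envelope comparison: a "standard" rack circuit vs. a full
# Blackwell NVL72 rack. Voltage and breaker sizes are the ones quoted above;
# real deployments often derate circuits for continuous loads.

VOLTAGE = 208  # line-to-line voltage of the three-phase feed (V)

def three_phase_kw(amps: float) -> float:
    """Nominal power (kW) available from a 208 V three-phase circuit."""
    return math.sqrt(3) * VOLTAGE * amps / 1000

standard_low = three_phase_kw(30)    # ~10.8 kW
standard_high = three_phase_kw(50)   # ~18.0 kW
nvl72_kw = 120                       # published full-rack figure

print(f"Standard rack circuit: {standard_low:.1f}-{standard_high:.1f} kW")
print(f"NVL72 rack:            {nvl72_kw} kW "
      f"(~{nvl72_kw / standard_high:.0f}x to {nvl72_kw / standard_low:.0f}x a standard rack)")
```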
By now you are probably asking yourself, “Why the heck are we talking about power? Aren’t we supposed to be talking about liquid cooling?” Well, we can’t talk about liquid cooling without talking about power, because it is the power requirements of these AI systems that are ushering in the absolute need for liquid cooling.
Let’s make this simple. James Clerk Maxwell, a famous physicist and mathematician, helped lay the groundwork for our modern understanding of energy and, therefore, its conservation. So, all one really needs to know is ∂E/∂t + ∇·F = Q! Just kidding. Well, not about the math, because it does, in fact, come down to energy. And with this need for more energy come two primary challenges that must be addressed.
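For the curious, one way to read that equation is as a plain conservation (continuity) statement; the interpretation below is a general reading of the terms, not anything specific to one system.

```latex
% Energy continuity: the rate of change of energy density E in a volume,
% plus the net energy flux F flowing out through its boundary, equals the
% local source term Q (here, electrical power being dumped in as heat).
\[
  \frac{\partial E}{\partial t} + \nabla \cdot \mathbf{F} = Q
\]
% In steady state (dE/dt = 0), every watt injected via Q must be carried
% away by the flux term -- which is exactly the cooling system's job.
```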
Challenge One: Energy can’t be destroyed; it can only change forms.
AI infrastructure is consuming power at an ever-increasing rate, which is pushing data centers to the limit in terms of power capacity. To understand the extent to which data centers are being pushed, consider the fact that Microsoft recently signed a 20-year deal to purchase power from the Three Mile Island nuclear power plant, located in Pennsylvania, with the sole purpose of powering AI compute.
But the challenge does not just start with the increased power requirements of the systems. We have a basic physics problem: Energy can’t be destroyed; it can only change forms. And in this instance, we are converting massive amounts of electrical energy into equally massive amounts of heat. Traditional data centers have had to use monumental cooling systems to tackle the heat generated by IT infrastructure. It may be idealistic to think we should simply build systems that consume less power, but that pesky physics thing just keeps getting in the way.
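To make the "energy becomes heat" point concrete, here is a minimal sketch that converts rack power straight into cooling load. The refrigeration-ton and BTU conversions are standard; the rack wattages are the figures quoted earlier, used purely for illustration.

```python
# Essentially every watt a rack draws ends up as heat the facility must reject.
# Convert electrical load directly into cooling load to get a feel for the scale.

KW_PER_TON = 3.517     # 1 ton of refrigeration ~= 3.517 kW of heat removal
BTU_PER_KWH = 3412     # 1 kWh ~= 3,412 BTU

racks = [("Standard rack", 15), ("NVL72 rack", 120), ("Rumored next-gen", 250)]

for label, rack_kw in racks:
    tons = rack_kw / KW_PER_TON
    btu_hr = rack_kw * BTU_PER_KWH
    print(f"{label:>17}: {rack_kw:>3} kW -> {tons:5.1f} tons of cooling "
          f"({btu_hr:,.0f} BTU/hr)")
```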
Challenge Two: Air is not an efficient conductor of heat.
For decades we have used traditional air conditioning, delivered through forced air, to cool data centers. This may be a little nerdy, but let’s put some numbers to this. The scientific term for how well a material conducts heat is “thermal conductivity,” measured in W/m·K (watts per meter-kelvin). The name is not necessary for the conversation, but the relationship between air and water is crucial: air has a thermal conductivity of about 0.03 W/m·K, while water’s is about 0.6 W/m·K. In the simplest terms possible, water is roughly 20 times better at moving heat than good old-fashioned air.
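A quick way to see what that 20x means in practice is Fourier’s law for conduction, q = k·ΔT/d. The sketch below holds the temperature difference and gap constant and only swaps the medium; the ΔT and gap values are arbitrary illustrative assumptions, and real cooling also involves convection, so this isolates the conductivity ratio rather than modeling a full cooling loop.

```python
# Fourier's law for one-dimensional conduction: q = k * dT / d
# Same temperature difference and gap; only the medium changes.
# Conductivity values are rounded room-temperature figures.

K_AIR = 0.03    # W/(m*K)
K_WATER = 0.6   # W/(m*K)

def heat_flux(k: float, delta_t: float = 40.0, gap_m: float = 0.001) -> float:
    """Conductive heat flux (W/m^2) across a thin layer of the medium."""
    return k * delta_t / gap_m

q_air = heat_flux(K_AIR)
q_water = heat_flux(K_WATER)
print(f"Air:   {q_air:,.0f} W/m^2")
print(f"Water: {q_water:,.0f} W/m^2  ({q_water / q_air:.0f}x more heat moved)")
```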
Moving AHEAD to Direct-to-Chip Water Cooling
So, where does this leave us? Data centers are already struggling to secure the power needed to electrify these modern electron-guzzling AI systems, never mind the additional electricity necessary to run the traditional cooling systems that keep those same data centers operational.
There are a few routes that can be taken with liquid cooling, as the term encompasses not only water but also specialized coolants, such as dielectric fluids. The benefit of direct-to-chip water cooling is that it provides highly localized cooling that targets the hottest components, carrying water directly to the processor through a cold plate mounted on the chip. Because water moves heat roughly 20 times better than air, this method cools far more effectively and sharply reduces the electricity spent on fans and chillers. To sound like a politician, “It’s for the children and for the environment.” While it is the lack of available power that is forcing organizations to adopt liquid cooling in the data center, it also comes with a great benefit to the environment, reducing both energy consumption and greenhouse gas emissions. And as we know, the better we care for the environment, the better it is for our children!
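To give a feel for what a direct-to-chip loop has to do, here is a rough sizing sketch using the heat-balance relation Q = ṁ·cp·ΔT for the 120kW rack discussed above. The 10 K coolant temperature rise is an illustrative assumption; real loop designs, flow rates, and temperatures vary by vendor and facility.

```python
# How much water does it take to carry 120 kW away from a rack?
# Q = m_dot * c_p * dT  ->  m_dot = Q / (c_p * dT)

RACK_KW = 120
CP_WATER = 4186        # J/(kg*K), specific heat of water
DELTA_T = 10           # K rise across the cold plates (illustrative assumption)
WATER_DENSITY = 1000   # kg/m^3

m_dot = RACK_KW * 1000 / (CP_WATER * DELTA_T)        # kg/s
liters_per_min = m_dot / WATER_DENSITY * 1000 * 60   # L/min

print(f"Mass flow:   {m_dot:.2f} kg/s")
print(f"Volume flow: {liters_per_min:.0f} L/min to absorb {RACK_KW} kW "
      f"at a {DELTA_T} K rise")
```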
In closing, liquid cooling isn’t a fad: It’s a critical shift in high-octane compute architecture that is becoming a requirement to effectively deploy and cool the next generation of HPC and AI compute. So, is your enterprise ready?
About the author
Mike Menke
Field CTO
Mike is a strategic problem solver focused on advancing AHEAD's artificial intelligence portfolio and capabilities.