STANDING UP A 10MW LIQUID-COOLED RACK INTEGRATION FACILITY

Why Build a Liquid-Cooled Rack Integration Facility?
Part 1 of a 4-Part Series
At AHEAD Foundry™, we help organizations navigate the rapidly evolving world of infrastructure for high-performance computing and AI. Working at the forefront of cutting-edge technologies, we see firsthand the challenges of rising power demands and the shift toward advanced cooling solutions.
I thought it would be interesting to openly share the journey of building a liquid-cooled rack integration facility – a project that’s especially meaningful to me. Why? Because 1) my team and I have been deeply immersed in it, and 2) I’ve always valued real-world examples of innovative technology in action. What’s that? Yes, yes, I am a geek.
But enough about me – let’s get to the good stuff, starting with some set-up and context.
A Single Rack That Can Consume as Much Power as Hundreds of Homes
The next wave of AI and HPC servers isn’t just powerful – the power these machines demand can create an inferno. They don’t just compute; they generate heat at an unprecedented scale, pushing cooling technologies to their absolute limits.
Already, we’re seeing GPUs and advanced CPUs roaring past 700 watts of thermal design power (TDP). The NVIDIA® Grace Blackwell GB200 Superchip, for example, combines a Grace CPU with two Blackwell GPUs, for an estimated total power draw of roughly 2,700 watts for the superchip alone.
How did we get here? Not long ago, a 30-kilowatt (kW) rack was a real head-turner. Today, 120kW racks are routine for AI-enabled deployments. And if the roadmaps we are seeing are any indication, we’ll shortly be pushing racks that need 250kW, with the recently announced NVIDIA® Vera Rubin NVL576 projected to consume up to 600kW per rack. For some context, 600kW is enough to power hundreds of homes.
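If you want to sanity-check that comparison, here’s a rough back-of-the-envelope sketch in Python. The household figure is my own assumption (roughly 10,500 kWh per year, or about 1.2 kW of continuous draw, in line with typical US averages), not a number from any vendor roadmap:

# Rough sanity check on the "hundreds of homes" comparison.
# Assumption: an average household uses ~10,500 kWh/year (~1.2 kW continuous).
rack_power_kw = 600                                # projected Rubin-class rack
avg_home_kwh_per_year = 10_500                     # assumed household usage
avg_home_kw = avg_home_kwh_per_year / (365 * 24)   # ~1.2 kW continuous draw
print(f"One 600 kW rack ~= {rack_power_kw / avg_home_kw:.0f} average homes")
# -> roughly 500 homes

So yes, "hundreds of homes" holds up.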
The heat is here—and it’s climbing fast.
The escalation of power requirements translates to enormous cooling challenges because conventional air cooling has its limits. It can’t keep up with these densities without extraordinary airflow, specialized containment, and huge operational overhead.
The Limits of Conventional Air Cooling
Air cooling dissipates heat by moving large volumes of air across critical components. This works decently at lower densities, but as racks pack in more powerful GPUs and CPUs, you end up needing larger heatsinks, more air movement, and more cooling capacity. The approach quickly becomes inefficient: energy consumption spikes as the cooling system works overtime, and there’s a practical ceiling on how much heat air can remove before servers start throttling or failing due to overheating.
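To put a number on that ceiling, here’s a quick sketch of how much airflow a single 120kW rack would need if air did all the work. The inputs are illustrative assumptions (standard air properties and a 15°C inlet-to-exhaust rise), not measurements from our facility:

# Airflow needed to remove rack heat with air alone: Q = rho * V * cp * dT
rack_heat_w = 120_000        # a 120 kW AI rack
rho_air = 1.2                # air density, kg/m^3
cp_air = 1005                # specific heat of air, J/(kg*K)
delta_t = 15                 # assumed inlet-to-exhaust temperature rise, K

flow_m3_s = rack_heat_w / (rho_air * cp_air * delta_t)
flow_cfm = flow_m3_s * 2118.88                      # convert m^3/s to CFM
print(f"~{flow_m3_s:.1f} m^3/s of air (~{flow_cfm:,.0f} CFM) through one rack")
# -> roughly 6.6 m^3/s, or about 14,000 CFM, for a single rack

Moving that much air through one rack, continuously, is where the fans, containment, and operational overhead start to spiral.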
Evaluating the Options: Direct-to-Chip (DTC) vs. Other Cooling Approaches
Realizing air cooling isn’t enough to support the evolution of AI/HPC racks, we began to explore multiple liquid cooling options. The most common approaches are:
- Rear-Door Heat Exchangers (RDHx) – A transitional solution that replaces standard rack doors with liquid-cooled heat exchangers.
- Direct-to-Chip (DTC) – Two different methods of circulating liquid directly to cold plates mounted on CPUs and GPUs:
  - Single-Phase Liquid Cooling – The coolant stays in liquid form, absorbing heat and cycling through a heat exchanger before recirculating. It’s reliable, easy to maintain, and widely available, but it may struggle with extreme heat loads.
  - Dual-Phase Liquid Cooling – The coolant evaporates into vapor, allowing for higher heat transfer efficiency, which makes it ideal for high-power-density applications. This option is robust but comes with greater complexity, cost, and infrastructure demands.
- Immersion Cooling – Entire servers are submerged in a non-conductive fluid. It’s an even more complex option that requires a type of infrastructure most data centers can’t accommodate without major reconstruction.
Selecting the Best Option for Our Clients
Each option has its pros and cons, but we found single-phase Direct-to-Chip cooling to be the best fit for our clients’ needs, for several reasons:
- Market adoption – ODMs are integrating the technology upstream by installing cold plates and plumbing for single-phase direct-to-chip liquid cooling.
- Practicality – This approach is easier to service compared to immersion cooling, which requires specialized baths and fluid handling. In addition, it integrates well into existing data center layouts.
- Reliability – There is a lower risk of contamination or fluid handling issues compared to full-server immersion.
- Efficiency – Transferring heat directly from the component to the coolant significantly reduces fan and airflow requirements (see the quick comparison after this list).
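For a rough sense of that efficiency point, here’s the same 120kW rack from the air-cooling sketch above, this time cooled with water through cold plates. Again, the coolant properties and the 10°C temperature rise are illustrative assumptions, not our actual design parameters:

# Coolant flow needed to remove the same 120 kW with water instead of air
rack_heat_w = 120_000
rho_water = 998              # water density, kg/m^3
cp_water = 4186              # specific heat of water, J/(kg*K)
delta_t = 10                 # assumed coolant temperature rise, K

mass_flow_kg_s = rack_heat_w / (cp_water * delta_t)
flow_l_min = mass_flow_kg_s / rho_water * 1000 * 60  # liters per minute
print(f"~{flow_l_min:.0f} L/min of water vs. ~6,600 L/s of air in the earlier sketch")
# -> roughly 170 L/min, because water carries far more heat per liter than air

That gap in heat-carrying capacity is why direct-to-chip cooling lets us dial back fans and airflow so dramatically.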
Why Build a Dedicated Liquid-Cooled Rack Integration Facility?
At AHEAD, we have a history of integrating AI/HPC racks and, in fact, multiple NVIDIA DGX SuperPOD™ racks, including the first AI supercomputer to be used by a global bank. So we understand the evolving demands of high-performance computing. Early in the AI evolution we recognized that liquid cooling would be critical for the future.
Our strategy has always been to stay ahead of industry shifts, but a key inflection point came when a client approached us with a requirement for direct liquid-cooled racks capable of pushing 100kW, with scalability to 250kW or more. We saw this as validation of our vision – and an opportunity to invest in the infrastructure needed to support the next generation of workloads.
Scaling our capabilities meant we needed to rethink data center design. Instead of merely adapting existing systems, we committed to hybrid air- and liquid-cooled rack integration that would meet today’s needs while future-proofing our ability to support the next wave of high-density computing.
Preview of What’s to Come
As we progress through this four-part blog series, here are the areas I’ll dive into:
- Facility Considerations – The often-overlooked challenges of ramping up infrastructure (power availability, chillers, floor space, lead times, etc.).
- Managing the Project – The team makeup and how we got our arms around the project while also running the day-to-day business.
- Implementation Lessons – Real talk about the pitfalls we encountered as we rolled out a liquid cooling facility: training technicians, tackling unexpected issues, and more.
In these posts, I’ll break down the nitty-gritty of planning, selecting the right facility, and making hard choices about power, space, and timelines. It wasn’t always pretty, but it’s been an interesting ride. If you have any questions you’d like answered or topics you’d like to see highlighted, reach out to me directly – I’m always happy to talk.
About the author
Chris Tucker
EVP, Foundry™
Chris Tucker is EVP of Foundry™, AHEAD’s facilities for integrated rack design, configuration, and deployment. Chris is passionate about helping companies identify and solve complex business issues with cutting-edge infrastructure products and services. Hailing from Wales, he is equally passionate about Welsh rugby.