Ten years ago in this very publication, I did something I never do. I went out on a limb and made a prediction. Actually, I went farther than that. I claimed to be “ahead of my time” and prognosticated with confidence that data centers would be adopting liquid cooling in the next few years and air cooling would be relegated to the junk heap of HVAC history.

Well … it’s a decade later. We’re still waiting.

So, what happened? Or more precisely, why hasn't anything happened? The thermodynamic advantages of fluids over air are an indisputable matter of physics. On a volume basis, water carries about 3,500 times as much heat as air. And mineral oils, which are non-corrosive and non-conductive and into which servers can be directly immersed, can carry about 1,200 times more heat. So again, why aren't we there yet?
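
For the curious, those ratios fall out of the volumetric heat capacities of the fluids involved. Here is a quick back-of-the-envelope check; the property values are rough, room-temperature figures assumed for illustration (oils in particular vary by formulation):

    # Approximate volumetric heat capacities in J/(cm^3*K) at room conditions.
    # These are assumed, textbook-ish values, not measured data.
    water = 1.00 * 4.18            # density (g/cm^3) * specific heat (J/(g*K))
    mineral_oil = 0.85 * 1.67
    air = 0.0012 * 1.005

    print(f"water vs. air:       ~{water / air:,.0f}x")         # on the order of 3,500x
    print(f"mineral oil vs. air: ~{mineral_oil / air:,.0f}x")    # on the order of 1,200x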

For starters, engineers aren't merely physicists. Yes, the laws apply, but we must go a step further and cobble together components in order to create systems that are actually useful, economical, and maintainable. Often we are limited to the products that already exist in the marketplace. And the end users, with their unique set of experiences, risk tolerance, and definitions of success, must always be considered.

So we may know that fluids make sense, but applying them in a practical manner requires getting quite a few wet ducks in a row.

 

Obvious Obstacles

Perhaps the greatest hindrance to a fluid revolution is form factor. The hardware and infrastructure required to support direct fluid cooling is very different from an indirect air cooled configuration. Except for perhaps the mode of final heat rejection to the atmosphere (cooling towers, fluid coolers, etc.), the equipment inside the whitespace is wholly different.

And unlike adding aisle separation or containment, or converting a space from CRACs on the perimeter to an in-row configuration, there are limited incremental options. In most cases, you have to be pretty much all in. And frankly, converting an existing space from air to fluid is in most cases simply not cost-effective.

Another drawback has been risk aversion. Just as we don't put toasters near bathtubs, commingling servers and water just seems downright dangerous. So most often, the very idea is a non-starter. Never mind that the supercomputers of the past relied on water cooling; it's just not in the DNA of this generation of data center professionals to consider water as a viable option.

Also, if the server manufacturers aren’t fully supporting direct fluid cooling at scale, then the average data center isn’t going to, either. While there are systems where commercial servers can be directly immersed in fluid (like the CarnoJet System in Figure 1), it takes some real nerve to make that leap.

A less obvious reason that fluid cooling hasn’t been widely deployed is that the allowable and recommended server operating temperature ranges were expanded in the last 10 years. By allowing temperatures to rise, the window for air cooling was opened a little further.

Combined with the evolution of aisle separation and containment, higher air temperatures and wider delta-Ts across the servers have allowed us to handle more heat via indirect air cooling without dramatically increasing energy use. So there has been no need to take the plunge … or rock the boat … or whatever fluid-related metaphor you want to insert.

 

Load Density

However, there are two drivers out there that are still pushing us to fluid-based solutions. One is increased load density in the rack.

Now, this is an old chestnut we have been hearing for years, that load density will be pushing 20 to 30 kW per cabinet any time now. But that prediction hasn’t come to fruition in most data centers. In fact, the average load density in most legacy data centers is still in the 3 to 5 kW range.

Part of the answer is that we have done more computing with fewer servers. So even though an existing center may be doing 10 times the calculations it was a few years ago, it is doing so with fewer, more efficient servers. Fewer servers take up less space, so even though the individual server heat load may be higher, you can spread those servers across more cabinets because your room's footprint never changed.

But refreshing a data center over the long haul is not a zero-sum game. We just keep demanding more computing. Think the Internet of Things.

In part, this is due to competing laws in the IT universe. Page’s Law (named for Google co-founder Larry Page) states that software gets twice as slow about every 18 months. Moore’s Law says computers double in speed every year or two. Since Page outruns Moore, computers have to keep evolving. Which partially explains why your computer seems to get slower as it ages, even though the hardware inside remains unchanged.

So server speeds increase, but we continue to require more of them. More, warmer servers will eventually fill the space you have. Or you are moved to a smaller space because the CIO noticed your cabinets weren’t fully loaded. Either way, your cabinet density will be increasing.

Slowly, perhaps. But surely.

 

Toasters and Hot Chips

So you follow so far? Thinking of servers like toasters, if you put more in a cabinet you can see the cabinet will get hotter. Put enough toasters in the cabinet and air cooling just starts to look ridiculous. And as facility engineers we can see this problem coming. We live in a macro world and address problems at a human scale. Even if we are blinded by an air cooled paradigm, the benefits of fluid cooling begin to dawn on us once we start to understand the problem.

But what about inside the toaster?

Turn the toaster back into a server and consider the heat source. Most of that heat is coming off the processor.

The thermal design power (TDP), sometimes called thermal design point, is the maximum amount of heat generated by a computer chip or component that its cooling system is designed to dissipate in typical operation. The latest high-end CPUs are running at around 100 watts.

Stop for a second and consider that a CMOS chip that can fit in the palm of your hand has a smokin' heat flux of about 10 W/cm². If we convert to the common power density metric used in data center design, that works out to about 9,000 W/ft².
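
To sanity-check that conversion (only the unit conversion below is exact; the chip-level figures are the round numbers quoted above):

    # Convert a chip-level heat flux to the W/ft^2 metric used in whitespace design.
    # 1 ft = 30.48 cm, so 1 ft^2 = 30.48^2 = 929.03 cm^2.
    heat_flux_w_per_cm2 = 10.0
    cm2_per_ft2 = 30.48 ** 2

    print(f"{heat_flux_w_per_cm2 * cm2_per_ft2:,.0f} W/ft^2")  # ~9,290 W/ft^2, i.e., about 9,000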

Now, W/ft² is a lousy metric here because it depends on scale: the area over which you average the load. But you can start to fathom why liquid cooling at the chip level starts to make sense.

While the data in Figure 2 is a bit stale, it shows the historical and exponential increase in processor heat flux over the decades.¹ What the chart doesn't show is that the rate of rise since 2005 has decreased due to changes in chip architecture and a focus on efficiency. But the slope is still upward.

From this you can conclude that cooling of processors will have to become more efficient — i.e., fluid cooling and more advanced heatsink technologies — until we either hit a physical or thermal limit, or we experience the next breakthrough in processor technology.

Either way, in the near term the potential for fluid at the server level is on the rise.

 

Coupling Closer

Up until now we have discussed fluid cooling in broad terms: from water to oil, and from immersion to heat exchangers. But we need to narrow the focus if we want this discussion to be more than an intellectual exercise.

The reality at the chip level is relevant and a harbinger of things that may come, but most of us don’t build servers for a living. And the fascinating world of immersion and direct cooling of servers (like the clamshell in Figure 3) is literally cool, but right now, the design of an installation using these technologies would be a one-off for most of us.

So let’s focus on an existing technology that is a step away from wholesale indirect air cooling (CRAC-to-room-to-rack-to-room-to-CRAC) and a step closer to the load itself.

Let’s talk about rack-based cooling.

Rack-based cooling is still indirect air cooling, but at least the process is more closely coupled to the load. Rack or door mounted cooling systems do not fit the true definition of fluid cooling in that they still rely on air across the servers. But relatively speaking, fluid is now closer to the rack, so we can reference it as a sign of the rising tide.

Conventional wisdom states that a well-planned hot-aisle/cold-aisle configuration can comfortably handle up to 5 kW per cabinet. Add separation strategies or containment, and you can get to the mid-teens. But a cabinet-based solution like ChilledDoor (Figure 4) gets you to 30 kW and beyond.
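
To get a feel for why air runs out of headroom at those densities, consider the airflow each cabinet would need. A minimal sketch using the standard sensible-heat relation, with a 20°F server delta-T assumed purely for illustration:

    # Airflow needed to carry away a cabinet's heat load with air:
    #   Q[BTU/h] = 1.08 * CFM * dT[F], and 1 kW = 3,412 BTU/h.
    def cabinet_cfm(load_kw, delta_t_f=20.0):
        return load_kw * 3412.0 / (1.08 * delta_t_f)

    for kw in (5, 15, 30):
        print(f"{kw:>2} kW cabinet -> ~{cabinet_cfm(kw):,.0f} CFM")
    # ~790 CFM at 5 kW, ~2,370 CFM at 15 kW, ~4,740 CFM at 30 kW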

Going with rack-based cooling can offer other advantages beyond high density. The reduction in the airflow path length reduces the fan power required, increasing efficiency. This can be advantageous when you consider that in many lightly loaded data centers, the CRAC fan power losses can often exceed the total IT power consumption. 
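
The fan power point follows from the basic fan relation: power is proportional to airflow times static pressure, so a shorter air path with less resistance costs less fan energy. The static pressures and fan efficiency below are assumptions chosen only to show the shape of the savings, not measured values:

    # Fan brake horsepower: BHP = CFM * SP[in. w.g.] / (6,356 * fan efficiency)
    def fan_power_kw(cfm, static_in_wg, fan_eff=0.6):
        return cfm * static_in_wg / (6356.0 * fan_eff) * 0.7457  # BHP -> kW

    cfm = 10_000  # arbitrary airflow, held constant for the comparison
    print(f"Long CRAC-to-rack air path (2.0 in. w.g.): {fan_power_kw(cfm, 2.0):.1f} kW")
    print(f"Close-coupled rack door    (0.6 in. w.g.): {fan_power_kw(cfm, 0.6):.1f} kW")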


A rack-based design also allows cooling capacity and redundancy to be targeted to the actual needs of a specific rack. Compare this to traditional room-based cooling, which only allows these characteristics to be specified at the room level, which typically means higher first costs.

 

A Future-Proof(ish) Hybrid Design

A good data center design should be:

  • Scalable. The solution must be appropriate for day one when the loads may be small, but able to grow with minimal disruption.

  • Reliable. The system must be as simple as possible (but no simpler), dependable, and preferably built on existing technologies and practices.

  • Non-proprietary. The design should not lock one into a particular manufacturer or technology.

  • Energy efficient. Any solution in this modern era must be environmentally responsible and fiscally sustainable.

 

The overall solution need not be solely fluid- or rack-based. Handling a base load with CRACs may make sense. You have to determine with your client what the safe base load is in your particular application. It will be a function of space geometry, load concentration and location, risk tolerance, and anticipated growth (both how much and how fast). Whatever you determine your base load to be, give yourself some mathematical elbow room, and then lay out your CRACs based on well-established design practice.

If we return to those four tenets of an ideal infrastructure (scalable, reliable, non-proprietary, and energy efficient) and let them inform our design, a relatively straightforward data center piping rubric emerges.

Regarding scalability, the chilled water plant mains and distribution headers should be right-sized for the ultimate load, that is, sized to accommodate future load increases without being oversized beyond them. Even though this may represent a higher first cost, it will save energy in the near term when loads are smaller (and arguably over the course of most of the data center's life): larger pipe means less pressure drop, so pumps may be smaller.
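
One way to see the energy argument, under the simplifying assumption of fully turbulent flow with a roughly constant friction factor: at a fixed flow rate, friction loss scales with about the inverse fifth power of pipe diameter. The pipe sizes below are illustrative only:

    # At a fixed flow rate, Darcy-Weisbach friction loss scales roughly with 1/D^5
    # (velocity ~ 1/D^2, and dP ~ f * (L/D) * v^2 / 2 with f held roughly constant).
    def relative_friction_loss(d_small_in, d_large_in):
        return (d_small_in / d_large_in) ** 5

    print(f"8 in. main vs. 6 in.: {relative_friction_loss(6, 8):.0%} of the friction loss")
    # -> roughly 24%; lower head means a smaller pump at the same flow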

While the plant and primary mains have to support the average load on the raised floor, the distribution network within the data center must be sized to adequately support concentrated loads. Since these hot spot locations may change over time, the network must be flexible.

A simple example (but not a rule of thumb): If the average load is 100 W/ft², the local distribution should be able to handle as much as 175 to 200 W/ft². The actual "local load factor" you apply must be determined in the course of your design based on your particular conditions.
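
As a sketch of how that local load factor might translate into branch pipe flow, here is the arithmetic using the standard chilled water relation Q[BTU/h] = 500 × GPM × ΔT[°F]; the zone size and water delta-T are assumptions for illustration only:

    # Flow required by a local distribution branch serving one concentrated zone.
    avg_load_w_per_ft2 = 100.0
    local_load_factor = 2.0        # design the branch for ~200 W/ft^2
    zone_area_ft2 = 1_000.0        # assumed zone served by one branch
    delta_t_f = 12.0               # assumed chilled water temperature rise

    zone_kw = avg_load_w_per_ft2 * local_load_factor * zone_area_ft2 / 1_000.0
    gpm = zone_kw * 3412.0 / (500.0 * delta_t_f)
    print(f"Zone load ~{zone_kw:.0f} kW -> branch flow ~{gpm:.0f} GPM")  # ~200 kW, ~114 GPM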

And the last component regarding scalability and flexibility is the provision of isolation valves, taps for future connections, and the locations of both. Specifically, one should be able to connect to future equipment, or accommodate the relocation of existing equipment without disrupting the operation of the data center.

Based on your client’s risk tolerance, establishing the proximity of piping taps to the possible hot zones may be as aggressive as running a main down every aisle. Or you may establish an acceptable distance (50 ft, for example), which can be used to create radial zones emanating from each tap, in turn providing full coverage with the understanding that some interruption may have to be accommodated.

Valves figure into the issue of reliability as well, in that sufficient valving needs to be provided to allow partial loop isolation while allowing continued operation. And on the subject of loops, in a data center we are talking about a true looped piping system where there should be no single point of failure. In the latest ASHRAE Liquid Cooling Guidelines (see the bibliography) multiple concepts are presented, from simplest to most complex, with their associated advantages and disadvantages listed.

On the subject of being non-proprietary, a chilled water system is just that. Any "thing" can plug into your chilled water system, and those technologies may be proprietary if the client prefers, but your infrastructure is oblivious to that specialty. A valve is a valve and a tap is a tap.

 

In Conclusion

Futurecasting is a dicey business. My generation was told that technology would lead us to colonies on the moon in our lifetimes. But instead it has given us cat videos and the Kardashians on Instagram.

While fluid cooling has been talked about an awful lot, and the breakthroughs predicted by many in the industry still haven't arrived, systems continue to be developed and chips are getting hotter. So it seems logical that a tipping point will eventually arrive.

On the other hand, there could be a breakthrough at the micro-level and just as CMOS replaced bipolar, a new computing technology may emerge which will reset the cooling clock.

Regardless of what the future may hold, as a designer you should be aware of the possibilities and design accordingly today. ES

 

Bibliography

ASHRAE. 2015. Thermal Guidelines for Data Processing Environments. Atlanta: American Society of Heating, Refrigerating and Air-Conditioning Engineers.

ASHRAE. 2012. Datacom Equipment Power Trends and Cooling Applications. Atlanta: American Society of Heating, Refrigerating and Air-Conditioning Engineers.

ASHRAE. 2013. Liquid Cooling Guidelines for Datacom Equipment Centers. Atlanta: American Society of Heating, Refrigerating and Air-Conditioning Engineers.

 

References

1. Chu, R.C., R.E. Simons, M.J. Ellsworth, R.R. Schmidt, and V. Cozzolino. "Review of cooling technologies for computer products," IEEE Transactions on Device and Materials Reliability, vol. 4, no. 4, pp. 568-585, Dec. 2004.