After 20 years in data centers, you might think I hang out on the bleeding edge with liquid-cooled chips, servers immersed in mineral oil, and computers singing “Daisy Bell” in three-part harmony. But the reality is I still spend most of my time in spaces that are struggling with the basics of air management and proper temperature control.
It’s not because my clients are backwards. On the contrary, I am fortunate enough to work with some of the largest and smartest government agencies, colo providers, and internet service providers (ISPs) in the States and around the globe. Instead, these otherwise forward-thinking operators have existing brick-and-mortar points of presence (PoPs) that simply have to be accommodated due to financial, geographic, and sundry other constraints.
For example, a certain tax collection agency has two major data centers with the same challenges all older enterprise data centers experience. And while a new facility might be ideal, a group of 535 pontificators have to be wrangled in order to approve such an expenditure. But in a swamp-like environment, funding doesn’t come easy.
But before we dive in deeper, let’s set a baseline for our design mindset.
- Recognize the recommended
- Advance the allowed
- Separate the streams
The “recommended” are the ASHRAE Recommended Thermal Guidelines[i]. The recommended ranges should not be a matter of debate any longer. The “allowed” are the limits set by the ASHRAE Allowed Thermal Guidelines. We should strive to approach these boundaries sooner instead of later.
And the “streams” we must separate are those in and out of the IT rack. And the only way to absolutely guarantee that air streams don’t mix is to provide physical demarcation, be it curtains, walls, chimneys, or other … pick your poison and employ it.
Now back to our regularly scheduled programming.
When ASHRAE issued the revised environmental standards in 2008 — and then bumped them again in 2011 — those of us in the mission critical HVAC business expected IT spaces to start warming up. But while the Web 2.0 crowd has moved closer to the metaphorical equator, in general the enterprise data center of 2017 still feels pretty much like it did in 2007.
In part, this disparity can be attributed to the difference in their business priorities. The business of the ISPs and the “Keepers of the Cloud” is data center-centric. So their bottom line gets boosted when they aggressively standardize, virtualize, and optimize the IT environment and what is within. Simply put, if job one is to run the most efficient data center, then you exercise every option in your arsenal, and that includes pushing the environmental envelope.
But if your enterprise is something else — like manufacturing widgets, trading securities, or monitoring ISIS — then your data center plays a support role within the enterprise, and it therefore receives only the requisite attention it deserves. It’s just the price you pay when you are not sitting at the head of the table.
However, there are things you can do with OpEx that don’t require CapEx, which we will discuss later. And regardless of your organization’s top priorities, if you are a data center operator (or an engineer supporting one) then your priority is to optimize the operation. A penny saved is a penny earned, and the same goes for kilowatts.
So why are too many data centers still too cool? Perhaps the primary obstacle to raising temperatures is more corporal than corporate. Specifically, maybe data centers aren’t uncomfortable because people are stationed within them and expect to be comfortable. Put another way, too many people are taking up residence within spaces that should be set aside exclusively for equipment.
Within the last decade, I was fortunate enough to visit the data center located deep within the bowels of the Vehicle Assembly Building at the Kennedy Space Center (that’s the big one with the U.S. flag and NASA logo painted on its side). The machines and people in that room had guided the space shuttle program successfully for decades. But there was no common cabinetry, no air management of any kind, and the number of rolling chairs in the space was rivaled only by the number of black rotary dial phones.
While this may have been understandable for an aging program weeks away from its last mission, it shouldn’t be the case in today’s data center. But too often, it is. Whether it’s habit, laziness, or poor space management, too many people are still working in the wrong place at the wrong time … all the time.
There is no good reason to still be residing within the confines of an IT space. Most of the time, a data center should be dark and warm — feeling more like a warehouse than a surgical suite. Air management should be driven by the needs of the equipment, not maintaining a tech’s oasis somewhere out on the raised floor.
But what makes this problem an opportunity is that it can be solved almost unilaterally. Most supervisors can manage where their folks are stationed without having to go to corporate. And of all industries, you would think technology practitioners would be the most adept at implementing work-from-home strategies and reimagined office paradigms utilizing hoteling, hot desks, and the like.
Note that once a permanent human presence is eliminated, the energy saving possibilities cascade.
Lighting policies can be implemented immediately. Since egress and life safety requirements are accounted for in the original lighting design, lights that have to stay on will stay on. This means you can simply flip switches and still maintain a safe environment. Motion sensors can be installed … or simply post a sign that says TURN OFF THE LIGHTS. Heck, you can even laminate it.
A bit trickier, but no less doable, is to start a temperature and humidity reset program where ambient conditions are adjusted and monitored. Adjustments continue until hot spots or issues arise, at which time you back down a step or two, corrections are made, and the process restarts until it plateaus. It’s your call on how you implement and what is acceptable, but the key is all stakeholders need to be advised and invested in the effort. Nothing poops a party faster than an application engineer with a blinking red light who wasn’t invited to the thermostat reset fiesta.
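The reset program above is essentially a step-and-check loop. Here is a minimal sketch in Python, assuming a hypothetical `hot_spot_found` callback that stands in for whatever monitoring and stakeholder feedback you actually rely on. The step size, back-off count, and ceiling are illustrative placeholders, not recommendations.

```python
from typing import Callable


def reset_program(
    start_f: float,
    ceiling_f: float,
    hot_spot_found: Callable[[float], bool],
    step_f: float = 1.0,
    back_off_steps: int = 2,
) -> float:
    """Raise the setpoint one step at a time; back down when a hot spot appears.

    Returns the setpoint to hold while corrections are made (or the ceiling
    if no hot spot ever shows up). All temperatures are in deg F.
    """
    setpoint = start_f
    while setpoint + step_f <= ceiling_f:
        candidate = setpoint + step_f
        if hot_spot_found(candidate):
            # Back down a step or two, but never below where we started.
            return max(start_f, candidate - back_off_steps * step_f)
        setpoint = candidate
    return setpoint
```

After corrections are made, calling `reset_program` again from the returned setpoint restarts the climb. In practice each step would be held long enough for the room to settle and for everyone affected to weigh in.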
One of the best resources for concrete steps an operator can take is the Lawrence Berkeley National Laboratory’s (LBNL’s) Center of Expertise for Energy Efficiency in Data Centers (CoE) (https://datacenters.lbl.gov/). What is so great about the LBNL CoE is that so much of what they advocate and provide guidance for is doable … arguably unilaterally. For example, whose permission do you need to move perforated tiles around in order to address cooling loads more efficiently? It’s doubtful anyone would notice, let alone care.
One of the best tools that can be found at the website is the Master List of Energy Efficiency Actions[ii]. The Master List is a living document of best practice actions aimed at data center owners, operators, engineers, and energy assessors. This straightforward document provides actionable guidance to both prioritize and implement energy saving measures in data centers.
The Master List is divided into eight sections that represent data center subsystems and other areas that deserve attention:
- Global
- Energy monitoring and controls
- IT equipment
- Environmental conditions
- Cooling air and air management
- Cooling plant
- IT power distribution chain
- Lighting
Note that as mechanical engineers we own fully half of the areas addressed (energy monitoring and controls, environmental conditions, cooling air and air management, and cooling plant), and because lighting and the global components are intuitive (e.g., efficient motors, lamping, and controls), we can drive the energy optimization locomotive.
Because ES is not a transcription service, we won’t list all of the steps that can be taken, but we can look at an incremental approach using many of the recommendations.
First, establish airflow discipline to the maximum extent possible. Plug holes in the floor, use blank-offs in the racks, provide only as many perforated tiles as you need and place them where the load is. If you can introduce a return path via a plenum, you will see improvements. And if you can add real separation, you are more than halfway to your target.
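To put a number on "only as many perforated tiles as you need," here is a rough sizing sketch in Python. It uses the standard-air sensible heat relation (BTU/h = 1.08 × CFM × ΔT in °F), which holds near sea level. The per-tile airflow is an assumption you would replace with measured values for your own floor, and the function names are hypothetical.

```python
import math


def required_airflow_cfm(load_watts: float, delta_t_f: float) -> float:
    """Airflow needed to carry a sensible load at a given air temperature rise.

    Uses BTU/h = 1.08 x CFM x delta-T (deg F) with 1 W = 3.412 BTU/h.
    Valid for standard air near sea level; adjust for altitude.
    """
    return load_watts * 3.412 / (1.08 * delta_t_f)


def perforated_tiles_needed(
    load_watts: float, delta_t_f: float, cfm_per_tile: float
) -> int:
    """Round up to whole tiles; cfm_per_tile should come from measurement."""
    airflow = required_airflow_cfm(load_watts, delta_t_f)
    return math.ceil(airflow / cfm_per_tile)
```

For example, a 10 kW row with a 20°F rise across the racks needs roughly 1,580 CFM. At an assumed 500 CFM per tile, that works out to four tiles placed at that row.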
HVAC energy consumed in the average data center is split about 50/50 between the fan and your heat rejection method (DX, chiller, etc.). DX applications have issues with reduced airflow, so varying airflow and in turn fan energy isn’t usually an option. However, supply air reset strategies can be applied pretty much across the board. So if you only have one strategy you can employ due to lack of VSDs, cash, or nerve, go the variable temperature route.
Move your temperature sensors to the inlet of the racks. Utilizing feedback from the sensors at the racks, reset the CRAC supply temperatures upward to keep the most demanding rack satisfied. In smaller data centers, you would control all CRACs in unison, but in larger installations, you may employ local temperature zones controlling multiple units.
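The "most demanding rack" logic can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a vendor sequence: the step size, the deadband, and the zone names are hypothetical, and a real controller would act slowly and filter sensor noise.

```python
from typing import Dict, List


def supply_adjustment_f(
    inlet_temps_f: List[float],
    inlet_setpoint_f: float,
    step_f: float = 0.5,
    deadband_f: float = 1.0,
) -> float:
    """Return the change to apply to the CRAC supply setpoint for one zone.

    The hottest rack inlet governs: warm the supply air while even the worst
    rack has margin, cool it when the worst rack is over its setpoint.
    """
    worst = max(inlet_temps_f)
    if worst > inlet_setpoint_f:
        return -step_f
    if worst < inlet_setpoint_f - deadband_f:
        return step_f
    return 0.0  # inside the deadband: hold


def zone_adjustments(
    zones: Dict[str, List[float]], inlet_setpoint_f: float
) -> Dict[str, float]:
    """Apply the same rule per zone; a small room is simply one zone."""
    return {
        name: supply_adjustment_f(temps, inlet_setpoint_f)
        for name, temps in zones.items()
    }
```

In a small room, all CRACs would receive the single adjustment. In a larger one, each zone's adjustment would go only to the units serving that zone.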
And what temperature should you control to? Well, even though ASHRAE and the vendors say you can go as high as 95°F, there is a significant body of research indicating that energy savings diminish at around 80°F as server fans begin to ramp up[iii]. This makes a great case for real-time power monitoring of the IT gear to find that sweet spot, but I digress.
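If you do have metered power for both the IT gear and the cooling equipment, finding the sweet spot amounts to picking the inlet temperature with the lowest combined draw. Here is a minimal Python sketch. The readings are invented for illustration only and are not measurements from any site or from the cited paper.

```python
from typing import Dict, Tuple


def sweet_spot(readings: Dict[float, Tuple[float, float]]) -> float:
    """Return the inlet temperature with the lowest total power.

    readings maps inlet temperature (deg F) to (it_kw, cooling_kw), each
    averaged over a period of comparable IT load.
    """
    return min(readings, key=lambda temp: sum(readings[temp]))


# Invented numbers: IT power creeps up as server fans ramp,
# while cooling power falls as the room warms.
example_readings = {
    72.0: (100.0, 45.0),
    76.0: (100.5, 40.0),
    80.0: (102.0, 36.5),
    84.0: (106.0, 35.0),
}
best = sweet_spot(example_readings)  # 80.0 for these invented numbers
```

The point of the sketch is that cooling savings alone are misleading. The comparison has to include the IT side of the meter.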
With just the temperature reset strategy, your compressorized equipment is running less, and when it is operating, it is running more efficiently. Plus, you have opened the economizer windows, be they air or water. But if your coils can handle the variation, you can double down with a VAV strategy using the same simple rack control point.
Specifically, using that slow-acting control, reset the CRAC fan speed and supply temperature based on the highest rack inlet temperature. Obviously the two can’t be adjusted simultaneously, so a stepped strategy is recommended.
On start-up, the fan shall run at the pre-set maximum speed and the minimum leaving air temperature (LAT).
If temperature at all racks is below setpoint, then increase the LAT incrementally until it reaches the predetermined maximum.
If the temperature at all racks remains below setpoint and the LAT is at its maximum, then reduce the fan speed incrementally until it reaches the predetermined minimum.
If the temperature at the racks exceeds setpoint, the sequence shall reverse.
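The stepped sequence above can be sketched as a small state update in Python. The limits and step sizes are placeholders to be set during commissioning. Reading "the sequence shall reverse" as "restore fan speed first, then lower the LAT" is an interpretation, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CracState:
    lat_f: float    # leaving air temperature setpoint, deg F
    fan_pct: float  # fan speed, percent of full speed


@dataclass
class Limits:
    # Placeholder values; estimate in design, determine in commissioning.
    lat_min_f: float = 60.0
    lat_max_f: float = 75.0
    fan_min_pct: float = 50.0
    fan_max_pct: float = 100.0
    lat_step_f: float = 1.0
    fan_step_pct: float = 5.0


def start_up(limits: Limits) -> CracState:
    """On start-up: maximum fan speed, minimum LAT."""
    return CracState(lat_f=limits.lat_min_f, fan_pct=limits.fan_max_pct)


def step(
    state: CracState,
    rack_inlets_f: List[float],
    rack_setpoint_f: float,
    limits: Limits,
) -> CracState:
    """Advance the sequence by one slow control interval."""
    worst = max(rack_inlets_f)
    if worst < rack_setpoint_f:
        # All racks satisfied: raise LAT first, then slow the fan.
        if state.lat_f < limits.lat_max_f:
            new_lat = min(state.lat_f + limits.lat_step_f, limits.lat_max_f)
            return CracState(new_lat, state.fan_pct)
        new_fan = max(state.fan_pct - limits.fan_step_pct, limits.fan_min_pct)
        return CracState(state.lat_f, new_fan)
    if worst > rack_setpoint_f:
        # Reverse order: restore fan speed first, then lower LAT.
        if state.fan_pct < limits.fan_max_pct:
            new_fan = min(state.fan_pct + limits.fan_step_pct, limits.fan_max_pct)
            return CracState(state.lat_f, new_fan)
        new_lat = max(state.lat_f - limits.lat_step_f, limits.lat_min_f)
        return CracState(new_lat, state.fan_pct)
    return state  # exactly at setpoint: hold
```

A production sequence would add a deadband and a time delay around the setpoint so the unit does not hunt. Both are left out here to keep the order of operations visible.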
The setpoints for rack temperature, LAT, and fan speed shall all be estimated during design but determined during commissioning. Once the system is up and running, the operators will have the opportunity to optimize the setpoints to realize the most energy savings possible. Also, there may actually be two maximum fan speeds: one for normal operation when multiple units are operating, and one for during an event when a CRAC may be off-line.
The basics will always matter. Get folks out of the data center, establish airflow management, and optimize airflows and delta Ts. Sense and control based on where it matters most: at the rack, not at the return.
Use the resources out there. ASHRAE’s datacom series and the LBNL tools and guides exist for a reason. There are literally hundreds of steps that can be taken one at a time, but you have to take the first one.
We all want to be ahead of the curve, but sometimes we have an assignment well behind that vantage point. Many data centers are boring outposts housed in buildings built before the millennial generation was born, with systems to match. But regardless of where we find ourselves, we can always work to improve the situation.
Seize the opportunity behind the curve. You may just be surprised how much closer to that curve you actually get.
[i] ASHRAE, 2011. 2011 Thermal Guidelines for Data Processing Environments – Expanded Data Center Classes and Usage Guidance. Developed by ASHRAE Technical Committee 9.9.
[ii] LBNL, 2016. Data Center Master List of Energy Efficiency Actions. Developed by the LBNL Center of Expertise for Energy Efficiency in Data Centers.
[iii] Moss, D., 2011. “A Dell Technical White Paper - Data Center Operating Temperature: The Sweet Spot.” Dell Incorporated.