“Five Nines” and Infrared (IR) Testing at Data Centers
By Gregory R. Stockton
99.999% uptime…five nines. That is what IT (information technology) customers are looking for. Uptime or “availability” at data centers is an absolute necessity. A loss in power to a data center can cost the owner millions, literally. The power, cooling and support systems are vital to the continuous flow of information in these “mission critical” facilities. IR/PM (infrared predictive maintenance) is a must. The electrical switchgear, UPS (uninterruptible power supply), ATS (automatic transfer switches), server systems and cooling systems must be checked with infrared thermography and other testing means on a regular basis to ensure super-high reliability.
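To put the five nines figure in perspective, the short Python sketch below converts an availability percentage into the downtime it allows per year. It is only an illustration: the function name and the 365-day year are assumptions made here, not part of any standard. At 99.999% the allowance works out to a little over five minutes a year.

```python
# Illustrative sketch: convert an availability target into a yearly downtime budget.
# Assumption: a 365-day year (525,600 minutes); the function name is hypothetical.

MINUTES_PER_YEAR = 365 * 24 * 60


def allowed_downtime_minutes(availability_percent: float) -> float:
    """Return the minutes of downtime per year permitted by an availability target."""
    unavailable_fraction = 1 - availability_percent / 100
    return MINUTES_PER_YEAR * unavailable_fraction


if __name__ == "__main__":
    # Compare three, four and five nines side by side.
    for target in (99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% availability allows about {minutes:.2f} minutes of downtime per year")
```

A single unplanned outage of a few minutes can therefore use up the entire yearly budget, which is why faults need to be found before they cause a failure.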
Mission critical facilities are like other facilities in that they have electro-mechanical equipment that must be maintained. The difference is that the operators of mission critical facilities, owing to the extremely high availability requirements from management, have to pay much more attention to the equipment so that it will not fail. This requires dual-path power supply systems (for redundancy) and regular testing of the systems.
* Dual-power technology requires two completely independent electrical systems tied together with switchgear. When the normal source of power fails, these dual-path power supply systems quickly switch to a back-up source. A UPS system keeps the power flowing until the normal source is restored or another source is brought on-line and synchronized. Usually, the UPS, through a PDU or power distribution unit (see figures 1, 2 and 3), takes AC power, converts it to DC where a bank of batteries is tied in and then inverts it back to AC to feed the computer hardware. Since the systems often cannot be tested on-line, they must be tested during “maintenance windows”, planned outages or times when the impact of testing is low, so that simulations can be run. By pulling power from a load bank, resistive load testing is used to fully simulate and test all equipment on the floor. Any problems that are encountered during an infrared survey are repaired immediately and the system is rechecked before the equipment is put back on-line.
Figure 1 – Typical PDU in a data center with load bank test being run.
Figure 2 – SCR connection on an inverter assembly at over 550 °F.
Figure 3 – Bolted/crimped connector on an output filter.
* Battery back-up systems (see figure 4) must be checked in a real-time battery discharge situation to fully simulate an actual loss of the normal source of power. The batteries, connections, cables, switches and charging systems are checked for unwanted heating conditions.
* Uniform cooling of all data center server, storage, and computer equipment is essential for proper operation. The design objective of the cooling system is to provide a clear path from the source of the cooled air to the equipment and back to the cooling unit. This issue has received much attention lately as miniaturization of the equipment and economic pressures have increased the amount of heat that is generated per square foot of floor space and per cubic foot of rack space in the server rack panels. This hardware is sensitive to heat and humidity and some new designs are being tested so that failures do not occur solely due to environmental conditions (see figure 5). How perfect an application for IR!
* Utility main power supplies are typically owned by the local power company but are sometimes owned by the user. A looped system feeds power from two different power company substations and can be “back fed” if the power is out on the primary. No matter who the technical owner of the utility equipment is, it must be checked with IR like all other components (see figure 6).
* Mechanical systems have the same stringent requirements as the electrical system. Again, the required reliability is achieved through redundancy and failure prevention engineering.
There must be total accountability for all infrared survey results, especially for all of the equipment associated with the UPS, computer and server systems. This can be accomplished by recording the entire survey on digital videotape and/or capturing fully-radiometric images of all equipment, whether problems exist or not. In either case, a data log of all equipment surveyed must be created, including a time/date stamp reference for all equipment. Documentation is very important.
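As a rough illustration of such a data log, the Python sketch below records one time-stamped entry per piece of equipment, whether or not a problem was found, and reports any equipment that was missed. The record fields, function names and equipment identifiers are hypothetical choices made for this example, not a prescribed format.

```python
# Illustrative sketch of a survey data log; all names and fields are hypothetical.
import csv
import io
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


def _now_stamp() -> str:
    """Time/date stamp for a record, in UTC, to the nearest second."""
    return datetime.now(timezone.utc).isoformat(timespec="seconds")


@dataclass
class SurveyRecord:
    equipment_id: str      # e.g. a tag or panel number from the facility's equipment list
    image_file: str        # radiometric image (or video reference) captured for this item
    problem_found: bool    # logged either way, so equipment without problems is also accounted for
    notes: str = ""
    surveyed_at: str = field(default_factory=_now_stamp)


def unsurveyed(equipment_list, records):
    """Return the equipment IDs that have no record in the log yet."""
    seen = {record.equipment_id for record in records}
    return [item for item in equipment_list if item not in seen]


def to_csv(records) -> str:
    """Serialize the log to CSV text with a header row."""
    columns = ["equipment_id", "image_file", "problem_found", "notes", "surveyed_at"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()


if __name__ == "__main__":
    log = [
        SurveyRecord("UPS-1", "ups1_image.dat", False),
        SurveyRecord("BATT-1", "batt1_image.dat", True, "loose lug on main breaker"),
    ]
    print(to_csv(log))
    print("Not yet surveyed:", unsurveyed(["UPS-1", "BATT-1", "PDU-1"], log))
```

Comparing the log against the full equipment list is what provides the accountability: an item with no entry has not been surveyed, regardless of how many problems were found elsewhere.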
Figure 4 – Small battery bank with a loose lug connection on the main breaker.
Figure 5 – Server rack designs being tested for heat dissipation.
Figure 6 – Pad-mounted transformer with loose connection on line side.
To achieve five nines availability, it is essential that competent IR testing be performed on all electrical and mechanical systems in conjunction with other testing and in cooperation with management and maintenance personnel.
If you maintain an office building, manufacturing facility or any other type of facility where uptime is important, you should take time to follow what is happening with data centers, as they are among the most mission critical of all operations.
Gregory R. Stockton is president of Stockton Infrared Thermographic Services, Inc.™ Based in Randleman, NC, the corporation operates six applications-specific divisions. Greg has been a practicing infrared thermographer since 1989. He is a Certified Infrared Thermographer with twenty-six years' experience in the construction industry, specializing in maintenance and energy-related technologies. Mr. Stockton has published eleven technical papers on the subject of infrared thermography and written numerous articles about applications for infrared thermography in trade publications. He is a member of the Program Committee of SPIE (Society of Photo-Optical Instrumentation Engineers) Thermosense and Chairman of the Buildings & Infrastructures Session at the Defense and Security Symposium.
Copyright © November 2005