FPEeXTRA Issue 91

Challenges to Incident Response at Secured Data Centers

By Adam Barowy and Jasen Dodson


The advancement and increasing affordability of lithium-ion (li-ion) battery technology are revolutionizing product sectors from personal electronic devices to vehicles and, more recently, data centers.

Data centers must operate their computing equipment with maximal uptime. Since power from the electrical grid cannot offer 100 percent uptime, data centers have installed batteries to provide uninterruptible power to their equipment. An analysis by Frost and Sullivan found that li-ion batteries accounted for 15 percent of the data center battery market in 2020 and projected growth to 39 percent of the market by 2025.

Li-ion batteries are replacing legacy lead-acid batteries because they have a much higher energy density (i.e., a smaller footprint), last for more charge cycles, require less maintenance, and offer improved modularity, among other benefits. All these advantages translate to the most important practical consideration for data centers: increased uptime.

However, the advantages of li-ion batteries come with new challenges for data centers. Specifically, li-ion cells can experience a phenomenon called thermal runaway, in which a cell generates heat faster than it can dissipate it. Thermal runaway happens quickly and typically results in the cell catching fire. Electrical, mechanical, and thermal stresses (e.g., overcharging, puncture, or heat) can cause thermal runaway. The heat of a thermal runaway and fire can then cascade quickly, inducing thermal runaway in other cells in a battery pack, which can rapidly increase the scale of the hazard.

Traditional data center fire protection systems do not address propagation of thermal runaway inside a battery pack. Extinguishing agents cannot penetrate the enclosures that contain li-ion cells to remove the heat and control flaming. Complicating matters further, modular battery placement throughout the data center’s operations means that li-ion incidents can occur across the expanse of a data center, instead of in a centralized battery room. An incident therefore typically necessitates a fire department response to mitigate further damage and ensure final extinguishment. While the failure rate of individual li-ion cells is typically quoted as between “1 in 1 million” and “1 in 10 million,” data centers have become so large that one data center may contain millions of li-ion cells. Fire safety standards, building codes, and the safety of li-ion batteries are improving, but fire department responses to data center battery incidents are likely to increase in frequency until the hazards can be engineered out.
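To make the scale argument concrete, the short Python sketch below works through the arithmetic. It is only an illustration under stated assumptions: the quoted rates are treated as an independent per-cell failure probability over some unspecified period (the source does not give a time basis), and the cell counts are hypothetical examples of "millions of cells" rather than figures for any real facility.

```python
import math


def expected_failures(cells: int, p_fail: float) -> float:
    """Expected number of failing cells, assuming independent failures."""
    return cells * p_fail


def prob_at_least_one(cells: int, p_fail: float) -> float:
    """Probability that at least one cell fails: 1 - (1 - p)^N.

    Uses log1p/expm1 so that very small probabilities do not lose precision.
    """
    return -math.expm1(cells * math.log1p(-p_fail))


# Hypothetical cell counts (assumption, not from the article).
for cells in (1_000_000, 5_000_000):
    # Quoted per-cell rates: "1 in 1 million" and "1 in 10 million".
    for p_fail in (1e-6, 1e-7):
        print(
            f"cells={cells:>9,}  p={p_fail:.0e}  "
            f"expected={expected_failures(cells, p_fail):.2f}  "
            f"P(at least one)={prob_at_least_one(cells, p_fail):.3f}"
        )
```

Under these assumptions, a facility with one million cells and a "1 in 1 million" rate would expect about one cell failure, which is why a rate that sounds negligible for a single device is not negligible for a whole site.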

Minimizing incident response time is crucial to a successful outcome. While batteries are affecting the frequency, and potentially the severity, of data center fire incidents, increased data center security during the last decade has introduced new impediments to rapid and effective incident response. Recently added security and surveillance measures can hinder emergency or hazmat response teams from gaining access to the areas on fire.

Modern access control systems are critical for site security, but they have also created challenges for the fire service. Impediments include physical access controls; vehicular access controls; identification and search policies; facility size, layout, and markings; and, consequently, reliance on individual data center staff members who can help the fire service navigate all of these impediments.

Physical access control systems now use smart hardware, which replaced legacy proximity controls that provided simple and rapid access for incident response. Single-vehicle entry access gates (Figure 1) bottleneck responding fire apparatus and may not accommodate longer apparatus (e.g., ladder trucks). Security policy can further slow response time by requiring checking and documentation of apparatus license plates and crew member IDs. In extreme cases, security has inspected the underside of the vehicles before allowing entry.

Figure 1. Typical Security Entrance Gate of a Secure Facility

Once first responders have passed the exterior physical security controls, they must navigate immense, maze-like facilities with further physical security controls. Typically, security also precludes maps, and markings often indicate neither hazards (e.g., battery locations) nor positions within a building. Consequently, responders rely on building staff to aid with navigation. This is helpful but can become an impediment if fewer staff members are available on a given shift or if staff members are responsible for several buildings. The problem is further complicated if smoke exposure incapacitates staff members and emergency responders investigating an incident.

The most effective strategy starts before the incident, through a collaborative planning process. Here are five actions the security industry can pursue with its local fire service to improve incident outcomes (e.g., reduced downtime):

1.      Pre-planning. Ongoing pre-planning helps to identify (1) hydrant locations, (2) fire department connection (FDC) locations, (3) fire protection systems, and (4) building systems. The fire service understands that proprietary information is involved, but there are opportunities to balance sharing pertinent safety information with maintaining facility security.

2.      Markings. Critical locations such as the main entrance gate should be marked. There should also be signage above every door (Figure 2)—both inside and outside—and signage to identify specific hazard areas to aid first responders with navigating the site.

3.      Accountability. High-visibility ID vests can help identify the point of contact for the site. Green lights on security vehicles help to identify the location of data center staff.

4.      Unified command. Integration of data center staff members into the unified command aids the Incident Commander (IC) with tactical considerations and finalizing the incident action plan (IAP). The IC seeks to account for all employees and determine if any are injured. The IC also seeks to verify where the fire alarm is sounding, which fire suppression, gas detection, and building systems are on site, and whether they have activated.

5.      Building camera access. Interior video feeds are critical for incident size-up. Being able to see whether battery modules are in thermal runaway can help firefighters determine where and when to enter the building. It also allows the building engineer to provide directions for the interior crews without having to enter an immediately dangerous to life or health (IDLH) atmosphere.

Figure 2. Entrance to the Loudoun County Combined Fire and Rescue System

Due to the challenges outlined in this article, the Loudoun County Combined Fire and Rescue System (LCCFRS) has developed a data center emergency response manual. Loudoun County is considered the “Data Center Capital of the World” and is home to more than 200 data centers operated by the world’s largest data center companies.

LCCFRS is leading an initiative to develop data center emergency response considerations and guides for first responders. LCCFRS is currently working directly with the data center coalition and large data center companies to better understand these complex facilities, including proprietary building systems and security measures. LCCFRS is also working with the lead safety professionals at these facilities to provide their employees with an interactive fire safety plan. The goal is to ensure that everyone goes home.

First printed in the June 2023 edition of Security Technology.

Adam Barowy is a research engineer at the UL Fire Safety Research Institute (FSRI), where he researches fire and explosion hazards from lithium-ion batteries. Barowy works with the fire service to improve situational awareness and develop effective strategies for mitigating hazards at lithium-ion incidents. He has developed UL Standards and Test Methods and has evaluated emerging fire hazards and new and innovative fire protection solutions. Barowy advanced UL’s capabilities for evaluating the fire safety of energy storage technologies and led UL's large-scale battery fire testing program. His career started at NIST, where he conducted burns of acquired structures to enable improved firefighting tactics, reconstructed LODD/injury fire incidents, and evaluated firefighting equipment. Barowy holds an M.S. in Fire Protection Engineering from WPI and a B.S. in Mechanical Engineering from UMass Amherst.

Jasen Dodson is the emerging technologies battalion chief for Loudoun County Fire and Rescue and is currently assigned to the Data Center Alley of Loudoun County. Dodson has served the department for 20 years. He has been a hazmat technician for 19 years and has served as the safety officer for LCFR and for the National Medical Response Team DC-1 (State of the Union and Presidential Inauguration deployments). Dodson was the author of the Loudoun County Fire and Rescue Fire Station Design Manual, is currently the chairman of the Northern Virginia Emergency Response System (NVERS) Emerging Technologies Work Group, and sits on the Council of Governments (COG) ESS/EV Committee. The NVERS Emerging Technologies Work Group is currently reviewing the Data Center Response Manual and researching and developing training programs for data centers, energy storage systems, electric vehicles, photovoltaic, wind, alternative energy sources (hydrogen), and hazard mitigation.