At approximately 7:40 a.m. on July 19, two fires broke out at MG&E power substations in downtown Madison. The forecast was calling for near record-breaking heat and humidity. To allow firefighters to safely put out the fires, MG&E cut the power to the entire isthmus at about 8:10 a.m.
The scale of the power outage was unprecedented. For many businesses located in the affected area, it meant thousands of dollars in lost productivity and/or an inability to serve customers.
But for businesses that rely on 5NINES core data services, including co-location, cloud infrastructure, web hosting, managed services, and fiber internet, it was business as usual — thanks to 5NINES’ well-rehearsed crisis response plan and major investments in business continuity.
Uninterruptible power systems: an insurance policy against the worst-case scenario
In the almost two decades that 5NINES has housed its data center in the Network222 building, the company has never witnessed a complete power outage in the building. That’s not just good luck; the building is connected to a redundant power grid that also serves the state and county offices, the Capitol, and the Square. This makes it a strategic location for a data center because the chances of a commercial power outage are slim.
5NINES is committed to keeping our customers online no matter what, so we have made a series of major investments in our backup power systems, including an uninterruptible power supply (UPS) that provides instantaneous backup power from batteries and a diesel generator that powers the data center during longer outages. Our data center customers range from small e-commerce companies that rely on our web hosting services to process sales to large multinational companies and state entities. If the data center went down, our customers could lose sales, and Wisconsin lawmakers couldn’t function.
Crisis response plan in action
The July power outage was the first true test of 5NINES’ backup power system. While we run quarterly tests on our generator and practice our crisis response plan frequently, those tests never replicate the full electrical load of a true outage.
When MG&E cut the power, the 5NINES data center switched effortlessly from commercial power to its diesel generator, which powers both the data center and HVAC systems in the colocation facility. 5NINES activated its crisis response plan — which notifies all employees via multiple communication channels — and began monitoring the data center to make sure everything was running correctly on backup power.
While the core data center was completely unaffected by the outage, we realized within eight minutes of the power cut that customers receiving internet service from our radios on the rooftop of Network222 were offline. This included customers located outside the outage area. Our team immediately diagnosed the problem: the penthouse of the Network222 building, where the radios are located, was not wired to the generator that was powering the data center, and the UPS system that had been installed to serve the penthouse was no longer sufficient to handle the electrical load.
We made an emergency purchase of a 2200-watt Generac generator and got those services back online by 10 a.m.
Once our systems were back online, we focused on helping our clients and other internet service providers in the building deal with the aftermath of the sudden power cut. We brought in a fan to help cool a room that was overheating. We fielded many calls from customers who needed help getting their servers back online after power was restored in their building, and we walked them through the reboot process.
The power outage kept our team busy all day. We answered more than 100 phone calls and responded to and closed more than 1,500 service tickets in one day — a deluge compared to our daily average of 150 tickets.
Hours of preparation and millions in investment pay off
We believe our dedication to continuity and disaster planning is the reason we made it through this crisis with almost no impact to our customers. In addition to our investments in power systems, we also credit our first-class response to the following:
- Great communication during the crisis. Following the guidelines in our crisis plan, we opened a communication bridge through our Voice-over-IP system to allow all company leaders to stay on a conference call all day to make decisions together, no matter where they were.
- Remote office technology. When the outage happened, some team members were in the office and others were working from home or en route to the office. At 8:30 a.m., our office building was evacuated and locked down for safety, so we were unable to get into our office. We seamlessly transitioned to working remotely using the same virtual office services we provide to our customers. While other businesses downtown sent their employees home to take the day off, our employees who were not directly involved with the data center and penthouse at Network222 were able to continue their work uninterrupted.
- All-hands-on-deck attitude. Our team was ready to do whatever it took to keep our systems running and get the radios running again as soon as possible. That included hauling a generator and fuel up six flights of stairs because the power outage impacted the elevator system.
Perfecting the plan
We’re extremely proud of our response, but we know we can’t simply pat ourselves on the back. After every crisis, we debrief our team on what went well and what can be improved. We are now even more prepared for a power outage of similar or even greater scale, with a generator big enough to handle the electrical load from the radios in the penthouse.
We will also continue to refine our crisis plan with the following actions:
- Connect the penthouse to the diesel generator. Running a power line from the main generator to the penthouse has been part of our strategic plan, but complications with the building forced us to put the project on hold. It’s now back on our priority list.
- Eliminate redundant help tickets. While our alert system did a great job of grabbing our attention, it produced a lot of redundant tickets — 1,500 tickets for about 30 real issues. We’re going to refine the alert system to cut down the noise.
- Guarantee fuel supply based on the consumption rate during an outage. Since we had never run the generator with the full electrical load of the data center, we didn’t know the exact fuel consumption rate. We monitored fuel consumption throughout the outage and were ready to do whatever it took to keep the generator supplied. Now that we have data on fuel consumption, we can finalize our contract with our fuel supplier to secure a guaranteed delivery schedule, so that we won’t run out of fuel in a future emergency.
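For readers curious what cutting down redundant tickets can look like, here is a minimal Python sketch of one common approach: collapsing repeated alerts that share the same source and check into a single ticket with a counter. This is an illustration only, not a description of the 5NINES alert system. The field names (`source`, `check`, `time`) and the sample alerts are hypothetical.

```python
def dedupe_alerts(alerts):
    """Collapse alerts that share (source, check) into one ticket each.

    `alerts` is a list of dicts with hypothetical keys:
    "source" (device name), "check" (what failed), "time" (any sortable value).
    Returns one ticket per distinct (source, check), in first-seen order.
    """
    tickets = {}  # plain dicts preserve insertion order in Python 3.7+
    for alert in alerts:
        key = (alert["source"], alert["check"])
        if key not in tickets:
            tickets[key] = {
                "source": alert["source"],
                "check": alert["check"],
                "first_seen": alert["time"],
                "count": 0,
            }
        tickets[key]["count"] += 1
    return list(tickets.values())


# Made-up sample data: five alerts that describe only two real issues.
sample_alerts = [
    {"source": "radio-1", "check": "ping", "time": 1},
    {"source": "radio-1", "check": "ping", "time": 2},
    {"source": "radio-2", "check": "ping", "time": 2},
    {"source": "radio-1", "check": "ping", "time": 3},
    {"source": "radio-2", "check": "ping", "time": 4},
]

for ticket in dedupe_alerts(sample_alerts):
    print(ticket["source"], ticket["check"], "seen", ticket["count"], "times")
```

Keeping the repeat count on each ticket preserves the urgency signal that the raw alerts carried, without creating one ticket per alert.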
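The fuel planning described in the list above comes down to simple arithmetic: measure how fast the generator burns fuel under load, then work out how long the tank lasts and how often deliveries must arrive. The Python sketch below shows that calculation under stated assumptions. The tank size, reserve, and readings are invented for illustration and are not 5NINES figures, and the names `FuelReading`, `burn_rate`, `hours_remaining`, and `max_delivery_interval` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FuelReading:
    hours_elapsed: float  # hours since the generator took the load
    gallons_left: float   # fuel remaining in the tank at that time


def burn_rate(readings):
    """Average gallons per hour between the first and last reading."""
    first, last = readings[0], readings[-1]
    elapsed = last.hours_elapsed - first.hours_elapsed
    if elapsed <= 0:
        raise ValueError("need at least two readings spread over time")
    return (first.gallons_left - last.gallons_left) / elapsed


def hours_remaining(readings, reserve_gallons=0.0):
    """Hours until the tank drops to the reserve level at the measured rate."""
    rate = burn_rate(readings)
    if rate <= 0:
        return float("inf")  # no measurable consumption
    usable = max(readings[-1].gallons_left - reserve_gallons, 0.0)
    return usable / rate


def max_delivery_interval(tank_gallons, rate, reserve_gallons):
    """Longest gap in hours between refills that never dips below the reserve."""
    if rate <= 0:
        raise ValueError("rate must be positive")
    return (tank_gallons - reserve_gallons) / rate


# Illustrative numbers only: a 500-gallon tank with a 100-gallon reserve.
readings = [
    FuelReading(0.0, 500.0),
    FuelReading(2.0, 480.0),
    FuelReading(4.0, 460.0),
]
rate = burn_rate(readings)
print("burn rate (gal/h):", rate)
print("hours until reserve:", hours_remaining(readings, reserve_gallons=100.0))
print("max hours between deliveries:", max_delivery_interval(500.0, rate, 100.0))
```

A delivery contract built on numbers like these would normally leave some margin below the computed interval, since load and consumption can change over the course of an outage.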
It can be difficult to prioritize investing in business continuity. Generators and UPS systems are expensive to install and maintain, and crisis response plans take time and effort to rehearse and refine. But our customers rely on us for 24/7 access to their data, no matter what, and we take that seriously. We’re proud of every investment we’ve made, and plan to continue to do what it takes to guarantee that our customers never experience an outage. If business continuity is important to you, then let’s schedule a chat about our colocation and web hosting services.