Maintaining Mission Critical Systems in a 24/7 Environment - Peter M. Curtis
3.1 Introduction
Businesses that want to plug into the Information Age require reliability and flexibility, whether they are large Fortune 500 corporations or small companies serving global customers. This is the reality of conducting business today. Whatever the type of business, many organizations have realized that 24/7 operation is imperative. An hour of downtime can wreak havoc on project schedules or cause the loss of critical information, resulting in hours lost re-keying electronic data, not to mention the potential loss of millions of dollars.
Twenty-five years ago, the facilities manager (FM) was responsible for the integrity of the building. As long as the electrical equipment worked 95% of the time, the FM was doing a good job. When downtime did occur, the cause was usually a computer fault. As technology improved on both the hardware and software fronts, information technology (IT) departments began to design their systems with redundancy, including dual-corded equipment (either the A or the B power source can carry the full IT equipment load). As a result of IT's efforts, computer systems have become so reliable that they are down only during scheduled upgrades.
Today the major causes of downtime are human error and utility failures: poor power quality, power distribution failures, incorrect switching of equipment, accidental emergency power off (EPO) initiation, and environmental system failures (although that percentage remains small). When a problem does occur, the facilities manager is usually the one in the hot seat. Problems are not limited to power quality; staff may also not have been properly trained for certain situations. Further complicating matters, recruiting qualified in-house staff and outside consultants can be difficult, because facilities management organizations, protection equipment manufacturers, and consulting firms all compete for the same talent pool to support the mission critical industry. The sharp increase in data center construction around the world has only exacerbated the situation.
Minimizing unplanned downtime reduces risk, but unfortunately the most common approach is reactive: spending time and resources to repair a faulty piece of equipment after it has failed. Strategic planning can identify internal risks and provide a prioritized plan for reliability improvements. Moreover, only when both ends fully understand the potential risk of outages, including recovery time, can they fund and implement an effective plan. Because the costs associated with reliability enhancement are significant, sound decisions can be made only by quantifying the performance benefits and weighing the options against their respective risks.
Planning and careful implementation will minimize disruptions while making the business case to fund capital improvements and maintenance strategies. When the business case for additional redundancies, consultants, and ongoing training reaches the boardroom, the entire organization can be galvanized to prevent catastrophic data losses, damage to capital equipment, and even threats to life safety.
Figure 3.1 “Seven steps” is a continuous cycle of evaluation, implementation, preparation, and maintenance
(Source: Courtesy of PMC Group One, LLC)
Table 3.1 Law of Nines
% Uptime/Reliability Level | Downtime Per Year |
---|---|
99% | 87.6 hours |
99.9% | 8.76 hours |
99.99% | 52 minutes |
99.999% | 5.25 minutes |
99.9999% | 32 seconds |
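
The arithmetic behind Table 3.1 can be reproduced with a short calculation. The following is a minimal Python sketch, not something taken from the book: the function names `downtime_seconds_per_year` and `format_downtime` are hypothetical, and the year is assumed to be 365 days (8,760 hours), which is the figure the table's values imply.

```python
# Assumption: a non-leap year of 365 days, i.e. 8,760 hours.
HOURS_PER_YEAR = 365 * 24


def downtime_seconds_per_year(uptime_percent: float) -> float:
    """Return the yearly downtime, in seconds, implied by an uptime percentage."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    unavailable_fraction = (100.0 - uptime_percent) / 100.0
    return unavailable_fraction * HOURS_PER_YEAR * 3600.0


def format_downtime(seconds: float) -> str:
    """Express a duration in the largest convenient unit (hours, minutes, seconds)."""
    if seconds >= 3600.0:
        return f"{seconds / 3600.0:.2f} hours"
    if seconds >= 60.0:
        return f"{seconds / 60.0:.2f} minutes"
    return f"{seconds:.1f} seconds"


# Print one line per reliability level listed in Table 3.1.
for level in (99.0, 99.9, 99.99, 99.999, 99.9999):
    print(f"{level}% uptime -> {format_downtime(downtime_seconds_per_year(level))}")
```

Under these assumptions, 99.99% uptime works out to about 52.6 minutes of downtime per year and 99.9999% to about 31.5 seconds, so the table entries appear to be rounded values.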