Maintaining Mission Critical Systems in a 24/7 Environment - Peter M. Curtis

2.4 How Risks Are Addressed

The need to provide continuous operation under all foreseeable risks of failure, such as power outages, equipment breakdown, natural phenomena, and terrorist attacks, requires the use of many techniques to enhance reliability and resiliency. These techniques include redundant systems and components such as standby power generation, UPS systems, automatic transfer switches, static transfer switches, and the use of probability risk analysis modeling software. This software identifies potential weaknesses of critical infrastructure, develops maintenance programs, and upgrades action plans for all major systems.

Electric utility transmission and distribution system planners attempt to predict future load growth, design capital projects to construct the necessary additional capacity, and attempt to design adequate redundancy if the main supply line ever fails. This design concept identifies “preferred” and “alternate” or “contingency” supplies. It is used by most utilities and is commonly referred to as an “n‐1” or “n‐2” design. However, the alternate supplies are limited in number.
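The "n‐1" and "n‐2" idea can be illustrated with a minimal sketch. It assumes a deliberately simplified model in which each supply has a fixed capacity and the only question is whether the remaining supplies can still carry the peak load; real planning studies also consider power flow, voltage, and thermal limits. The function name and the feeder capacities below are hypothetical.

```python
from itertools import combinations

def survives_contingency(supply_capacities_mw, peak_load_mw, k=1):
    """Return True if the peak load is still covered after losing ANY k supplies.

    Simplified capacity-only check (an assumption of this sketch):
    k=1 corresponds to an "n-1" design, k=2 to an "n-2" design.
    """
    n = len(supply_capacities_mw)
    if k >= n:
        # Losing every supply leaves nothing to serve the load.
        return False
    total = sum(supply_capacities_mw)
    for lost in combinations(range(n), k):
        remaining = total - sum(supply_capacities_mw[i] for i in lost)
        if remaining < peak_load_mw:
            return False
    return True

# Hypothetical example: one preferred and two alternate supplies.
feeders_mw = [40.0, 40.0, 30.0]
peak_mw = 65.0
print("n-1 satisfied:", survives_contingency(feeders_mw, peak_mw, k=1))
print("n-2 satisfied:", survives_contingency(feeders_mw, peak_mw, k=2))
```

In this made-up case the load survives the loss of any single supply but not the loss of any two, which is the practical meaning of the alternate supplies being limited in number.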

Electric system supply redundancy can be constructed in a number of ways. One method is to construct a power plant in an area (a “load pocket”) that needs the electricity. Power plants are huge capital investments, however, and must be taken off‐line periodically for extended maintenance and upgrades. Another way of bringing in power, as well as redundancy, is to extend transmission facilities from an adjacent area. This, too, can be a costly and time‐consuming process, especially taking into account the permitting processes in many states. The upshot is that either strategy may ultimately be used, depending on the circumstances. However, state governmental utility regulators, permit grantors, and even the Federal Energy Regulatory Commission (FERC) all have a voice in how generation and transmission systems are constructed and reinforced in the future.

Overall electric reliability also depends on how the utility transmission and distribution facilities are constructed. While lightning strikes affect overhead and underground facilities alike, storms affect overhead facilities to a much greater extent than underground facilities. In spite of this disadvantage, the vast majority of utility facilities are overhead, constructed on poles. Underground construction, while better insulated from mechanical storm damage, is more expensive to construct, needs more redundancy built into the system, and is more labor‐intensive to troubleshoot and repair. Damage to overhead poles and wires is immediately visible and quickly repaired, and end‐of‐life replacement is also much less costly. Pure economics clearly favors overhead construction. As a result, owners of critical facilities will be predominantly supplied by overhead utilities, which are subject to more frequent but shorter interruptions and are easier to repair. Even as increased utility transmission and distribution system monitoring and data communications build toward the “Smart Grid,” making outages less frequent and faster to correct, outages will still occur, especially when telecommunications facilities are mounted on the same poles as electric utilities.

Factors such as overhead vs. underground construction, alternate feeder supplies, susceptibility to weather, and other mechanical damages all play into a utility’s overall reliability record. Electric utility reliability metrics have several measurements, but the most commonly used are:

 The System Average Interruption Frequency Index (SAIFI): This measures how often the utility’s average customer experiences a sustained interruption, expressed as interruptions per customer per year. For example, a SAIFI of 0.9 indicates that the utility’s average customer experiences 0.9 sustained interruptions per year, or one sustained interruption roughly every 12 months ÷ 0.9 ≈ 13.3 months.

 The Customer Average Interruption Duration Index (CAIDI): This is the average number of outage minutes experienced by each customer who experiences a sustained interruption. For example, a CAIDI of 120 minutes means that, on average, power is restored within 120 minutes of a sustained interruption. However, since utilities generally follow a prioritized restoration practice in which outages affecting many customers are addressed before smaller and single‐customer outages, the larger outages may be restored in 10 minutes (via automated switching), while smaller outages may take as long as 300 minutes to be restored. These long and short outages together produce the overall CAIDI average of 120 minutes.

 The Momentary Average Interruption Frequency Index (MAIFI): This measures the average number of momentary interruptions experienced by utility customers. Depending upon state regulations, the threshold below which an interruption is classed as momentary ranges from 2 to 5 minutes. If the criterion is less than 5 minutes, an interruption of 4 minutes and 59 seconds is considered momentary and does not count toward the utility’s SAIFI metric. It should be noted that in the mission critical industry, an outage of eight milliseconds can be catastrophic if the facility is not properly protected by a UPS or the critical systems do not operate according to their design specifications.
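The three indices can be computed from a utility’s interruption records, as in the minimal sketch below. It assumes the customer‐weighted definitions commonly associated with IEEE Std 1366 (customer interruptions divided by customers served, and customer‐minutes divided by customer interruptions). The record layout, function name, and all figures are hypothetical, and the 5‐minute momentary threshold is only one of the values a regulator might use.

```python
from dataclasses import dataclass

@dataclass
class Interruption:
    customers_affected: int
    duration_min: float

def reliability_indices(events, customers_served, momentary_threshold_min=5.0):
    """Compute SAIFI, CAIDI, and MAIFI for one reporting year.

    Events shorter than the threshold count as momentary (MAIFI only);
    all others count as sustained (SAIFI and CAIDI).
    """
    sustained = [e for e in events if e.duration_min >= momentary_threshold_min]
    momentary = [e for e in events if e.duration_min < momentary_threshold_min]

    sustained_interruptions = sum(e.customers_affected for e in sustained)
    customer_minutes = sum(e.customers_affected * e.duration_min for e in sustained)

    saifi = sustained_interruptions / customers_served
    maifi = sum(e.customers_affected for e in momentary) / customers_served
    # CAIDI is undefined when nobody had a sustained interruption; report 0.0 here.
    caidi = customer_minutes / sustained_interruptions if sustained_interruptions else 0.0
    return {"SAIFI": saifi, "CAIDI": caidi, "MAIFI": maifi}

# Hypothetical year for a utility serving 29,000 customers.
events = [
    Interruption(16200, 10.0),   # large outage restored by automated switching
    Interruption(3300, 300.0),   # three smaller outages needing crew repairs
    Interruption(3300, 300.0),
    Interruption(3300, 300.0),
    Interruption(14500, 2.0),    # momentary: below the 5-minute threshold
]
indices = reliability_indices(events, customers_served=29000)
print({name: round(value, 2) for name, value in indices.items()})
```

With these made‐up records, one short, widespread outage and a few long, small ones combine to a CAIDI of 120 minutes, and the 2‐minute event affects only MAIFI.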

Each metric measures a different statistic, and any organization can consult its local utility for the actual historical reliability metrics that pertain to its facility’s specific feeder supply. For a forward‐thinking organization, these utility reliability indices will drive the level of redundancy and business resiliency built into the critical infrastructure. The organization should, with the local utility’s input, assess the utility’s SAIFI, CAIDI, and MAIFI both for the utility’s service territory and for the local distribution circuit supplying power to the business. Once these historical reliability metrics are known, the organization can plan for the likeliest and most feasible outage scenarios (many sustained interruptions but few momentary outages, long utility repair times, etc.). As mentioned previously, to address human risk factors, SOPs, EAPs, and ARPs need to be available at a moment’s notice so trained personnel can respond with situational awareness and confidence.
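One way to turn the historical indices into a planning figure is to multiply SAIFI by CAIDI, which gives the average sustained‐outage minutes per customer per year (the quantity usually reported as SAIDI). The sketch below compares a service‐territory average with a facility’s own feeder. All numbers and names are hypothetical, and momentary interruptions are not included even though they can matter just as much to a critical load.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # ignores leap years

def expected_outage_minutes_per_year(saifi, caidi_min):
    """Average sustained-outage minutes per customer per year (SAIFI x CAIDI)."""
    return saifi * caidi_min

def utility_availability(saifi, caidi_min):
    """Fraction of the year that utility power is expected to be present."""
    return 1.0 - expected_outage_minutes_per_year(saifi, caidi_min) / MINUTES_PER_YEAR

# Hypothetical figures: territory-wide averages vs. the facility's own feeder.
territory = {"saifi": 0.9, "caidi_min": 120.0}
feeder = {"saifi": 1.5, "caidi_min": 160.0}

for name, metrics in (("territory", territory), ("feeder", feeder)):
    minutes = expected_outage_minutes_per_year(**metrics)
    availability = utility_availability(**metrics)
    print(f"{name}: {minutes:.0f} min/yr, availability {availability:.5f}")
```

A feeder that performs worse than the territory average, as in this made‐up case, would argue for sizing standby generation and UPS runtime against the feeder’s figures and not the utility‐wide ones.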

Many companies use web‐based information management systems to address human risk factors. A living web‐based document system can produce a “database” of perpetually refreshed knowledge, providing the level of granularity necessary to operate, maintain, and repair mission critical infrastructure. The ever‐changing documents can then be kept current and secure by updating them each time a capital project is completed or an infrastructure change is made. One such program is SmartWALK®, a web‐based document portal shown in Figure 2.10. It is important to secure this critical infrastructure knowledge and also to leverage it for employee training and succession planning.


Figure 2.6 Solar Flare.

Source: ESA/NASA/SOHO.


Figure 2.7 EMP Waveform – MIL‐STD‐461G Test Method RS105

(Source: Courtesy of Retlif Testing Laboratories).


Figure 2.8 RS105 Transient Generator and Transmission Line

(Courtesy of Retlif Testing Laboratories).


Figure 2.9 Damped Sinusoidal Transient – MIL‐STD‐461G Test Method CS116

(Source: Courtesy of Retlif Testing Laboratories).


Figure 2.10 SmartWALK™ mobile device

(Courtesy of PMC Group One, LLC)


Figure 2.11 The Smart Grid Network and its features.

Events such as the terrorist attacks of September 11th, the Northeast Blackout of 2003, the 2006 hurricane season, and the outages in Italy and Greece in 2003 and 2004, respectively, which left many millions without power, have emphasized our interdependencies with other critical infrastructures, most notably telecommunications. There are numerous strategies and sector‐specific plans, such as Basel II, the USA PATRIOT Act, SOX, and NFPA 1600, all of which highlight the responsibility of the private sector for increasing resiliency and redundancy in business processes and systems. These events have also prompted the revision of laws, regulations, and policies governing the reliability and resiliency of the power industry. Some of these measures also delineate controls required of some critical infrastructure sectors to maintain business‐critical operations during a critical event (please see Appendix A for further information).

An unintended consequence of identifying vulnerabilities is that such diligence can actually invite attacks tailored to take advantage of them. To avoid this, one must anticipate the new vulnerabilities created by responses to the existing ones. New and better technologies for energy supply and efficient end‐use will clearly be required if the daunting challenges of the decades ahead are to be adequately addressed.

In 2000, the Electric Power Research Institute (EPRI) launched a consortium dedicated to improving electric power reliability for the new digital economy. Participants in this endeavor, known as the Consortium for Electric Infrastructure to Support a Digital Society or CEIDS, include power providers and a broad spectrum of electric reliability stakeholders. Participation in CEIDS is also open to digital equipment manufacturers, companies whose productivity depends on a highly reliable electricity supply, and industry trade associations.

According to EPRI, CEIDS (now known as IntelliGrid) represents the second phase of a bold, two‐phase national effort to improve overall power system reliability. The first phase of the plan, called the Power Delivery Reliability Initiative, launched in early 2000, brought together more than twenty North American electric utilities as well as several trade associations to make immediate and clearly necessary improvements to utility transmission and distribution systems. In the second phase, CEIDS addresses, more specifically, the growing demand for “digital quality” electricity.

“Unless the needs of diverse market segments are met through a combination of power delivery and end‐use technologies, U.S. productivity growth and prosperity will increasingly be constrained,” explains Karl Stahlkopf, a former Vice President of Power Delivery at EPRI. “It’s important that CEIDS study the impact of reliability on a wide spectrum of industries and determine the level of reliability each requires.”

Specifically, CEIDS focuses on three reliability goals:

1. Preparing high‐voltage transmission networks for the increased capacity and enhanced reliability needed to support a stable wholesale power market.

2. Determining how distribution systems can best integrate low‐cost power from the transmission system with an increasing number of distributed generation and storage options.

3. Analyzing ways to provide digital equipment, such as computers and network interfaces, with an appropriate level of built‐in protection.

It is only through these wide‐reaching efforts to involve all industry constituencies that the industry can raise the bar with respect to protective measures and knowledge sharing.
