Maintaining Mission Critical Systems in a 24/7 Environment - Peter M. Curtis

1.2 Risk Assessment

Critical industries require an extraordinary degree of planning and assessment. It is important to identify the best strategies for reaching the targeted level of reliability. To design a critical building with the appropriate level of reliability, the cost of downtime and the associated risks must be assessed. It is important to understand that downtime can result from more than one type of failure: design failures, catastrophic failures, equipment failures, or failures due to human error. Each type of failure requires a different approach to prevention. A solid and realistic approach to business resiliency must be a priority, especially because present-day critical infrastructure is inevitably designed with all its eggs in one basket.
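The downtime-cost assessment described above can be sketched as an expected-loss calculation. This is a minimal illustration, assuming each failure type can be characterized by an expected number of outage events per year and a mean outage duration, and that downtime costs a flat amount per hour. The `FailureMode` class, the `expected_annual_downtime_cost` function, and every number below are hypothetical and do not come from the book.

```python
"""Expected annual downtime cost by failure type (illustrative sketch)."""
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    events_per_year: float  # expected outage events per year (hypothetical input)
    hours_per_event: float  # mean outage duration in hours (hypothetical input)


def expected_annual_downtime_cost(modes, cost_per_hour):
    """Return (total hours/yr, total cost/yr, per-mode rows sorted by cost, highest first)."""
    rows = []
    for m in modes:
        hours = m.events_per_year * m.hours_per_event
        rows.append((m.name, hours, hours * cost_per_hour))
    rows.sort(key=lambda row: row[2], reverse=True)
    total_hours = sum(row[1] for row in rows)
    total_cost = sum(row[2] for row in rows)
    return total_hours, total_cost, rows


if __name__ == "__main__":
    # Hypothetical figures for the four failure types named in the text; not real data.
    modes = [
        FailureMode("design", 0.05, 8.0),
        FailureMode("catastrophic", 0.01, 72.0),
        FailureMode("equipment", 0.5, 2.0),
        FailureMode("human error", 1.0, 1.5),
    ]
    hours, cost, rows = expected_annual_downtime_cost(modes, cost_per_hour=100_000)
    for name, h, c in rows:
        print(f"{name:>12}: {h:5.2f} h/yr  ${c:,.0f}/yr")
    print(f"{'total':>12}: {hours:5.2f} h/yr  ${cost:,.0f}/yr")
```

Ranking failure types by expected cost is one way to decide where prevention effort pays off first; a real assessment would replace the flat hourly figure with business-specific loss estimates.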

Within banking and financial services, planning for critical areas places considerable pressure on designing an infrastructure that can evolve to support continuous business growth. Routine maintenance and equipment upgrades alone do not ensure continuous availability. The 24/7 operation of such services means that there can be no scheduled interruptions for any reason, including routine maintenance, modifications, and upgrades. The main question is how and why infrastructure failures occur. Employing new methods of distributing critical power, understanding capital constraints, and developing processes that minimize human error are key factors in improving recovery time when critical systems are affected by base-building failures.

Infrastructure reliability can be enhanced by conducting a formal Risk Management Assessment (RMA) and a gap analysis, and by following the guidelines of the Critical Area Program (CAP). The RMA and CAP are used in other industries and have been customized specifically for the needs of data center environments. The RMA is an exercise that produces a system of detailed, documented processes, procedures, checks, and balances designed to minimize operator and service provider errors. The CAP ensures that only trained and qualified people are associated with, and authorized to access, critical sites. These programs, coupled with Probabilistic Risk Assessment (PRA), address the hazards to data center uptime. A PRA looks at the probability of failure of each type of electrical power equipment and can be used to predict availability, the number of failures per year, and annual downtime. The PRA, RMA, and CAP facilitate the assessment of each step listed below.

 Engineering and design

 Project management

 Testing and commissioning

 Documentation

 Education and training

 Operation and maintenance

 Employee certification

 Risk indicators related to ignoring facility process management

 Standards and benchmarking
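The PRA quantities mentioned above (availability, failures per year, and annual downtime) can be illustrated with a simple steady-state reliability model. This is a sketch, not the book's method: it assumes independent component failures, steady-state availability A = MTBF / (MTBF + MTTR), availabilities that multiply along a series path, and a redundant pair that is down only when both units are down. The component names and MTBF/MTTR figures in `HYPOTHETICAL_CHAIN`, and all function names, are invented for illustration and are not published equipment data.

```python
"""Steady-state availability sketch for a PRA-style estimate (illustrative only)."""
from math import prod

HOURS_PER_YEAR = 8760  # 365 days x 24 hours

# Hypothetical (MTBF hours, MTTR hours) for a single-path power chain; NOT real equipment data.
HYPOTHETICAL_CHAIN = {
    "switchgear": (500_000.0, 8.0),
    "transformer": (300_000.0, 48.0),
    "UPS module": (100_000.0, 6.0),
    "PDU": (200_000.0, 4.0),
}


def availability(mtbf_h: float, mttr_h: float) -> float:
    """Steady-state availability of one repairable item: MTBF / (MTBF + MTTR)."""
    return mtbf_h / (mtbf_h + mttr_h)


def series_availability(avails) -> float:
    """Every item in a series path must work, so availabilities multiply."""
    return prod(avails)


def parallel_availability(avails) -> float:
    """A redundant group is down only when all of its branches are down."""
    return 1.0 - prod(1.0 - a for a in avails)


def failures_per_year(mtbf_h: float) -> float:
    """Expected failures per year for one item, approximated as 8760 / MTBF."""
    return HOURS_PER_YEAR / mtbf_h


def annual_downtime_hours(a: float) -> float:
    """Expected hours of downtime per year implied by availability a."""
    return (1.0 - a) * HOURS_PER_YEAR


def chain_availability(chain: dict, duplicated: frozenset = frozenset()) -> float:
    """Series availability of the chain; items named in `duplicated` get a second unit in parallel."""
    parts = []
    for name, (mtbf, mttr) in chain.items():
        a = availability(mtbf, mttr)
        parts.append(parallel_availability([a, a]) if name in duplicated else a)
    return series_availability(parts)


if __name__ == "__main__":
    a_single = chain_availability(HYPOTHETICAL_CHAIN)
    n_fail = sum(failures_per_year(mtbf) for mtbf, _ in HYPOTHETICAL_CHAIN.values())
    print(f"single path : A = {a_single:.6f}, ~{n_fail:.3f} failures/yr, "
          f"{annual_downtime_hours(a_single):.2f} h/yr downtime")

    a_dup = chain_availability(HYPOTHETICAL_CHAIN, duplicated=frozenset({"UPS module"}))
    print(f"2 x UPS     : A = {a_dup:.6f}, "
          f"{annual_downtime_hours(a_dup):.2f} h/yr downtime")
```

Comparing the single-path and duplicated-UPS results shows, under these simplifying assumptions, how adding redundancy to one link of a series path raises overall availability; a full PRA would also account for common-cause failures, switching reliability, and maintenance windows, which this sketch ignores.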

Industry regulations and policies are more stringent than ever. They are heavily influenced by Basel II, the Sarbanes-Oxley Act (SOX), NFPA 1600, and the U.S. Securities and Exchange Commission (SEC). Basel II recommends “three pillars” (risk appraisal and control, supervision of the assets, and monitoring of the financial market) to bring stability to the financial system and other critical industries. Basel II implementation involves identifying operational risk and then allocating adequate capital to cover potential losses.

In response to the corporate scandals of the early 2000s, SOX came into force in 2002 with the following requirements: financial statements published by issuers must be accurate (Sec 401); issuers must include an assessment of their internal controls over financial reporting in their annual reports (Sec 404); issuers must disclose to the public, on an urgent basis, information on material changes in their financial condition or operations (Sec 409); and noncompliance is punishable by fines and/or imprisonment (Sec 802). The purpose of the NFPA 1600 standard is to help the disaster management, emergency management, and business continuity communities cope with critical events.

Keeping up with rapid changes in technology has long been a priority. The constant dilemma of implementing required changes within an already constrained budget can become a limiting factor in achieving optimum reliability.
