The RCM Solution - Nancy Regan
Introduction to Reliability Centered Maintenance
I am always excited to discuss Reliability Centered Maintenance (RCM) because I have seen firsthand the overwhelmingly positive results that can be reaped when the process is applied correctly with the right people. RCM isn’t a new process. The application of its principles spans four decades; it has been (and is being) applied in nearly every industry throughout the world.
RCM is one of the most powerful asset management processes that can be employed. Contrary to criticism about the process, RCM can be carried out swiftly and efficiently when executed properly. Additionally, RCM’s principles are so diverse that they can be applied to any asset—an airplane, nuclear power plant, truck, tank, ship, manufacturing plant, offshore oil platform, mobile air conditioning unit, tow tractor, jet engine, a single pump, or an engine control unit. RCM principles can be widely applied to an entire asset or more narrowly applied to select pieces of equipment.
The name Reliability Centered Maintenance lends itself to a process that is used to develop proactive maintenance for an asset, but RCM can also be used to formulate scores of solutions that reach far beyond maintenance. These solutions can offer tremendous benefit to an organization. Nevertheless, when applying RCM, many organizations focus only on the development of a proactive maintenance program, which doesn’t take full advantage of RCM’s powerful principles. This book sets forth the principles of RCM in a straightforward manner so that those interested in applying RCM can be aware of not only how uncomplicated the application of RCM can be, but also how powerful it is.
1.2 Elements that Influence a System
It is especially important to look beyond proactive maintenance because there are so many elements that influence a system, as depicted in Figure 1.1.
Figure 1.1 Examples of elements that influence a system
It doesn’t matter what the equipment is. Many factors have a direct effect on equipment performance: the scheduled maintenance that is applied, the operating procedures that are performed, the technical publications that are referenced, the training programs that are attended, the design features that are in service, the spare parts (or lack thereof) that are relied upon, how often an asset is operated, where equipment is required to function, and the emergency procedures that are in place. If these strategies are well developed, the equipment (and thus the organization) benefits. If any of these strategies are ill-conceived or inappropriate, the process in which the equipment plays a part suffers.
1.3 The Essence of RCM: Managing the Consequences of Failure
It is often wrongly believed that equipment custodians are in the business of preventing failure. Although it is possible to develop strategies that do prevent some failures (see Chapter 9), it is nearly impossible to prevent all failures. For example, is it possible to prevent all failures associated with an electric motor? How about an automobile starter, avionics equipment, or a turbine engine? Certainly not. Thus, other strategies are often put in place in order to manage otherwise unpreventable failures when they occur.
For example, organizations rely heavily on operating procedures, emergency procedures, training programs, and redundancy in the design of equipment, as depicted in Figure 1.1. There are three fully redundant hydraulic systems on most commercial aircraft because it is understood that all causes of failure for a hydraulic system cannot be prevented. If one of the three systems fails, two fully redundant systems are available to provide the required hydraulic power for safe flight. Because all failures cannot be prevented, responsible custodians must put other solutions in place to properly deal with failure when it occurs. In other words, responsible custodians are in the business of managing the consequences of failure—not necessarily preventing them.
Myriad issues, such as incomplete operating procedures or poor equipment design, can negatively affect equipment performance. For that reason, it is incredibly important that these issues are identified and included in an RCM analysis. Including them allows the matter to be analyzed using RCM principles so that a technically appropriate and effective solution can be formulated.
One of the major products of an RCM analysis is the development of a scheduled maintenance program. However, as depicted in Figure 1.2, RCM can help formulate other solutions such as the development of a proactive maintenance plan, new operating procedures, updates to technical publications, modifications to training programs, equipment redesigns, supply changes, enhanced troubleshooting procedures, and revised emergency procedures.
In the context of RCM, these other solutions are referred to as default strategies, as depicted in Figure 1.3.
Figure 1.2 Examples of solutions that RCM can yield
Figure 1.3 RCM can yield a scheduled maintenance program and default strategies
In the context of RCM, together, scheduled maintenance tasks and default strategies are referred to as failure management strategies, as depicted in Figure 1.4. These solutions are designed to manage failure.
Figure 1.4 Failure management strategies
1.5 The Evolution of RCM Principles
It is important to understand the evolution of RCM in order to appreciate the majesty of its principles. RCM’s evolution is best told as a story, as it was told to me.
The story starts in the mid-1950s in the commercial airline industry where, at the time, it was believed that nearly all failures were directly related to operating age. In other words, failure was more likely to occur as operating age increased. Figure 1.5 illustrates this point.
The x-axis represents age, which can be measured in any units such as calendar time, operating hours, miles, and cycles. The y-axis represents the conditional probability of failure. The philosophy associated with the failure pattern is that, assuming an item stays in service and reaches the end of the useful life, the probability of failure greatly increases if it remains in service. In other words, as stated by United Airlines’ Stanley Nowlan and Howard Heap, it was believed that “every item on a complex piece of equipment has a ‘right age’ at which complete overhaul is necessary to ensure safety and operating reliability.” Therefore, it was believed that the sensible thing to do was to overhaul or replace components before reaching the end of the useful life with the belief that this would prevent failure.
The mindset that failure was more likely to occur as operating time increased was deeply embedded in the maintenance programs. At the time, approximately 85% of aircraft components were subject to fixed interval overhaul or replacement. The maintenance programs were very high in scheduled overhauls and scheduled replacements.
Figure 1.5 Traditional view of failure
Time marched on. By the late 1950s, new aircraft emerged that included brand new and more technologically advanced equipment such as electronics, hydraulics, pneumatics, pressurized cabins, and turboprop engines. Because the equipment was new, there was no operational experience or any historical failure data available. Therefore, the useful life of the new equipment components was unknown. However, a maintenance plan still had to be developed. As a result, the new plans were mirrored from the old plans. For the new equipment where there was no current maintenance to mirror, they took their best educated guess. The aircraft were sent into service and maintained using maintenance plans formulated in this manner.
By the early 1960s, failure data had been accumulated. Worldwide, the crash rate was greater than 60 crashes per million takeoffs, and two-thirds of these crashes were due to equipment failure. To put this crash rate into perspective, that same crash rate in 1985 would be the equivalent of two Boeing 737s crashing somewhere in the world every day.
The increased crash rate became an issue for operations, management, government, and regulators, so action was taken in an attempt to increase equipment reliability. Consistent with the philosophy at the time, that failure was directly related to operating age (as depicted in Figure 1.5), the overhaul and replacement intervals were shortened, thereby increasing the amount of maintenance that was performed and increasing maintenance downtime. An example of a shortened overhaul interval is depicted in Figure 1.6.
Figure 1.6 Example of a shortened overhaul interval
The new maintenance plans were put into service. After a period of time, they noticed that three things happened.
1. In very few cases, things got better.
2. In very few cases, things stayed the same.
3. But for the most part, things got worse.
The Federal Aviation Administration (FAA) and industry were frustrated by their inability to control the failure rate by changing the scheduled overhaul and replacement intervals. As a result, a task force was formed in the early 1960s. This team of pioneers was charged with the responsibility of obtaining a better understanding of the relationship between operating reliability and policy for overhaul and replacement.
They identified that two assumptions were embedded in the current maintenance philosophy.
Assumption 1: The likelihood of failure increases as operating age increases.
Assumption 2: It is assumed we know when those failures will occur.
The team identified that the second assumption had already been challenged. In an attempt to decrease the failure rate, the overhaul and replacement intervals were shortened, as depicted in Figure 1.6. But when the intervals were shortened, the failure rate increased. It was then identified that the first assumption—the likelihood of failure increases as operating age increases—needed to be challenged.
As a result, an enormous amount of research was performed. Electronics, hydraulics, pneumatics, engines, and structures were analyzed. What was discovered rocked the world of maintenance at the time. The research showed that there wasn’t one failure pattern that described how Failure Modes behave. In fact there are six failure patterns, as seen in Figure 1.7.
Failure patterns A, B, and C all have something in common. They exhibit an age-related failure phenomenon. Likewise, failure patterns D, E, and F have something in common. They exhibit randomness.
Figure 1.7 Six patterns of failure
What was especially shocking was the percentage of Failure Modes that conformed to each failure pattern. Figure 1.8 summarizes the percentage of Failure Modes conforming to each failure pattern.
Figure 1.8 Percentages of Failure Modes that conformed to each failure pattern
Collectively, only 11 percent of aircraft system Failure Modes behaved according to failure patterns A, B, and C, where the likelihood of failure rises with increased operating age. Failure patterns A and B have a well-defined wearout zone; it makes sense that Failure Modes conforming to these failure patterns could effectively be managed with a fixed interval overhaul or replacement. Failure patterns A, B, and C are typically associated with simple items that are subject to, for example, fatigue or wear such as tires, brake pads, and aircraft structure.
However, the remaining 89 percent of aircraft system Failure Modes occur randomly. They correspond to failure patterns D, E, and F. After the short increase in the conditional probability of failure in pattern D, as well as the infant mortality period present in failure pattern F, the Failure Mode has the same likelihood of occurring at any interval in the equipment’s expected service life. Therefore, for 89% of Failure Modes, it makes no sense to perform a fixed interval overhaul or replacement because the probability of failure is constant. These failure patterns are typically associated with complex equipment such as electronics, hydraulics, and pneumatics.
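The contrast between an age-related pattern and a random pattern can be illustrated numerically. The following is a minimal sketch only: it assumes a Weibull life model, which is not part of the original research described here, and every number in it (ages, look-ahead window, characteristic life, shape values) is hypothetical. A shape of 1 stands in for a random pattern and a shape of 3 stands in for an age-related one.

```python
import math


def survival(age, shape, scale):
    """Weibull probability that an item survives to `age` (assumed model)."""
    return math.exp(-((age / scale) ** shape))


def conditional_failure_probability(age, window, shape, scale):
    """Probability of failing in the next `window`, given survival to `age`."""
    return 1.0 - survival(age + window, shape, scale) / survival(age, shape, scale)


AGES = [0, 1000, 2000, 4000]  # operating hours (hypothetical)
WINDOW = 100                  # look-ahead interval in hours (hypothetical)
SCALE = 5000                  # Weibull characteristic life in hours (hypothetical)

# shape = 1 gives a constant hazard: a stand-in for a random failure pattern.
random_pattern = [
    conditional_failure_probability(a, WINDOW, 1.0, SCALE) for a in AGES
]
# shape = 3 gives a rising hazard: a stand-in for an age-related pattern.
age_related_pattern = [
    conditional_failure_probability(a, WINDOW, 3.0, SCALE) for a in AGES
]

for age, p_random, p_age in zip(AGES, random_pattern, age_related_pattern):
    print(f"age {age:>5} h: random {p_random:.4f}, age-related {p_age:.4f}")
```

Under these assumptions the random column stays the same at every age, while the age-related column climbs. That is why a fixed interval overhaul or replacement can help in the second case but offers nothing in the first.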
Two most notable issues
1. Only two percent of the Failure Modes conformed to failure pattern B, as shown in Figure 1.9, yet this was the failure pattern that defined the way they believed equipment failure behaved!
Figure 1.9 Percentage of Failure Modes that conformed to Failure Pattern B
2. After the short increase in the conditional probability of failure in pattern D, as well as the infant mortality period present in failure pattern F, 89 percent of Failure Modes occur randomly, as depicted in Figure 1.10.
What was astonishing was that the maintenance plans in use were derived assuming nearly all Failure Modes behaved according to failure pattern B. Yet only two percent of the Failure Modes actually behaved that way. Furthermore, it was shown that most Failure Modes occur randomly. Therefore, fixed interval overhaul or replacement technically made no sense. That is, if an item is replaced today, it has the same chance of failing tomorrow as it does one year later.
Figure 1.10 Percentage of Failure Modes that conformed to Failure Patterns D, E, and F
Figure 1.11 Percentage of Failure Modes that conformed to Failure Pattern F
Figure 1.12 Reintroducing infant mortality
More important, not only were the vast majority of scheduled overhauls and replacements senseless, their efforts to control the failure rate with fixed interval overhaul and replacement were counterproductive. Their study showed that 68 percent of Failure Modes behaved according to failure pattern F, as depicted in Figure 1.11.
Infant mortality (e.g., component installed backwards, tool left behind, poor operating procedures) played a significant role in the high unreliability rates. Scheduled overhauls and replacements were therefore making things worse by introducing these weaknesses. As depicted in Figure 1.12, each time a scheduled overhaul or replacement was performed, infant mortality was reintroduced into an otherwise stable system.
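The effect of reintroducing infant mortality can be sketched with a simple calculation. This is an illustration under stated assumptions, not the method used in the original research: the hazard is assumed to be a constant rate plus an infant mortality term that decays exponentially after each installation, and all rates, the decay time, and the intervals are hypothetical.

```python
import math


def cumulative_hazard(duration, base_rate, infant_rate, decay):
    """Integrated hazard over `duration` hours since installation.

    Assumed hazard: base_rate + infant_rate * exp(-t / decay).
    """
    return base_rate * duration + infant_rate * decay * (
        1.0 - math.exp(-duration / decay)
    )


def hazard_over_horizon(horizon, overhaul_interval, base_rate, infant_rate, decay):
    """Total integrated hazard when the item is overhauled every interval.

    Each overhaul resets the clock, so the infant mortality term starts over.
    """
    full_intervals, remainder = divmod(horizon, overhaul_interval)
    per_interval = cumulative_hazard(overhaul_interval, base_rate, infant_rate, decay)
    tail = cumulative_hazard(remainder, base_rate, infant_rate, decay)
    return full_intervals * per_interval + tail


HORIZON = 10_000     # hours of service considered (hypothetical)
BASE_RATE = 1e-4     # constant failures per hour (hypothetical)
INFANT_RATE = 1e-3   # extra failures per hour right after installation (hypothetical)
DECAY = 200          # hours for the infant mortality term to fade (hypothetical)

no_overhaul = hazard_over_horizon(HORIZON, HORIZON, BASE_RATE, INFANT_RATE, DECAY)
every_2000 = hazard_over_horizon(HORIZON, 2_000, BASE_RATE, INFANT_RATE, DECAY)
every_1000 = hazard_over_horizon(HORIZON, 1_000, BASE_RATE, INFANT_RATE, DECAY)

print(no_overhaul, every_2000, every_1000)
```

With these assumed values, the integrated hazard grows as the overhaul interval shrinks, because every overhaul pays the infant mortality penalty again while the constant part is unchanged.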
This research conclusively proved that fixed interval overhaul or replacement is technically not the right action to take when failure is not a function of operating age. In fact, in most cases, scheduled overhaul and replacement hurt reliability. Because most Failure Modes occur randomly, the failure rate could not be controlled by performing more scheduled overhauls and replacements. Armed with these facts, a new way of deriving scheduled maintenance tasks needed to be developed, setting the stage for the birth of RCM principles.
1.6 The Development of RCM Principles
From this research, RCM principles were first conceived within the commercial airline industry. MSG-1, Handbook: Maintenance Evaluation and Program Development was prepared by the 747 Maintenance Steering Group and published in 1968. This document contained the first use of decision diagram techniques to develop a prior-to-service scheduled maintenance program.
Improvements to MSG-1 led to the development of MSG-2: Airline/Manufacturer Maintenance Program Planning Document, which was published in 1970. MSG-2 was used to develop the scheduled maintenance programs for the Lockheed L-1011 and the Douglas DC-10. It was also used on tactical military aircraft: the McDonnell F-4J and the Lockheed P-3.
In the mid-1970s, the Department of Defense was interested in learning more about how maintenance plans were developed within the commercial airline industry. In 1976 the Department of Defense commissioned United Airlines to write a report that detailed their process. Stanley Nowlan and Howard Heap, engineers at United Airlines, wrote a book on the process and called it Reliability-Centered Maintenance. Their book was published in 1978. To many, Stanley Nowlan and Howard Heap are considered two of the most significant pioneers of the RCM process. Their book remains one of the most important documents ever written on equipment maintenance.
Using Nowlan and Heap’s book as a basis for update, MSG-3, Operator / Manufacturer Scheduled Maintenance Development was published in 1980. Since then, MSG-3 has gone through many updates. MSG-3 continues to be used within the commercial airline industry today, but is still intended to develop a scheduled maintenance program for prior-to-service aircraft.
Since Nowlan and Heap’s book was published, there have been various updates to the RCM process, namely the identification of environmental issues. The late John Moubray was another great pioneer of the RCM process; he did a great deal to advance RCM throughout commercial industry. His book RCM II was first published in the United Kingdom in 1991 and in the United States in 1992.
Streamlined RCM and SAE JA1011
Although RCM is a resource intensive process, analyses can be completed efficiently if the process is used correctly with the right people. However, in the mid-1990s, streamlined versions of RCM started to appear. These versions often omit key steps in the process and differ significantly from what Nowlan and Heap originally intended. As a result, the Society of Automotive Engineers (SAE) published SAE JA1011, Evaluation Criteria for Reliability-Centered Maintenance (RCM) Processes, in 1999. This internationally recognized standard outlines the criteria that any RCM process must embody in order to be called RCM. SAE JA1011 was updated in 2009.
The RCM process defined in this book complies with SAE JA1011. More important, it remains true to what the original pioneers of the process, Stanley Nowlan and Howard Heap, originally intended. Therefore, this book details True RCM.
RCM is a remarkable process and can be defined as follows.
Reliability Centered Maintenance is a zero-based, structured process used to identify the failure management strategies required to ensure an asset meets its mission requirements in its operational environment in the safest and most cost-effective manner.
The terms zero-based, failure management strategies, and operational environment bear further explanation.
Zero-based
Each RCM analysis is carried out assuming that no proactive maintenance is being performed. In other words, Failure Modes and Failure Effects are written assuming that nothing is being done to predict or prevent the Failure Mode. In this way, consequences of each Failure Mode can be assessed and solutions can be formulated with no bias towards what is currently being done.
Failure Management Strategies
Notice that the definition states that RCM is used to identify failure management strategies, not maintenance tasks. As explained earlier, managing assets requires more than just scheduled maintenance. Therefore, RCM provides powerful tools for developing other solutions, as detailed in Figure 1.2.
Operational Environment
How an asset is maintained depends on far more than just what an asset is. When solutions for assets are formulated, the following issues regarding the operational environment must be considered.
•Physical environment in which the asset will be used (e.g., cold weather, desert climate, controlled environment)
•Operational tempo (e.g., 24 hour operation, system runs 6 hours each day)
•Circumstances under which the system will be operated (e.g., stand-alone, one of four systems runs at one time but is rotated every month)
•Redundancy (e.g., the system or any of its components operate in the presence of a backup)
These issues can greatly influence not only what maintenance tasks are identified and how often they are performed, but also other solutions such as equipment design and training programs. Therefore, the operational environment must be clearly defined.
1.8 Defining Performance in the Context of RCM
In the context of RCM, there are two features regarding equipment performance that responsible custodians must carefully examine: design capability and required performance.
Asset owners perform RCM to determine what actions must be taken to ensure that equipment meets mission requirements. A mission could be towing a piece of equipment to the construction site, launching an aircraft from an aircraft carrier, or ensuring that there is adequate plant air for the downstream manufacturing process. But when it comes to defining performance, equipment custodians must be specific about what their assets can do (design capability) and what they need them to do (required performance). The following discussion illustrates this point.
Take, for example, a water tube steam boiler. As illustrated in Figure 1.13, the design capability is a Maximum Allowable Working Pressure (MAWP) of 500 psi. However, the required performance is 650 psi. Is this scenario acceptable? Absolutely not, because what the organization requires (650 psi) exceeds the design capability of the boiler (500 psi).
Figure 1.14 illustrates another example. Here, the design capability is an MAWP of 650 psi and the required performance is 500 psi. Is this scenario acceptable? Yes, because what the organization requires (500 psi) fits within the design capability of the asset.
Figure 1.13 Organizational requirements exceed design capability
Figure 1.14 Organizational requirements fit within the design capability of the asset
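The two boiler scenarios reduce to a single comparison. The sketch below is only an illustration of that comparison; the function name is hypothetical, and treating a requirement exactly equal to the design capability as acceptable is an assumption that the text does not address.

```python
def fits_design_capability(design_capability_psi, required_performance_psi):
    """Return True when the required performance is within the design capability.

    Assumption: a requirement exactly equal to the capability counts as fitting.
    """
    return required_performance_psi <= design_capability_psi


# Scenario of Figure 1.13: the organization requires more than the boiler can do.
print(fits_design_capability(design_capability_psi=500, required_performance_psi=650))  # False

# Scenario of Figure 1.14: the requirement fits within the design capability.
print(fits_design_capability(design_capability_psi=650, required_performance_psi=500))  # True
```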
This may seem to be an incredibly simple concept—so basic and fundamental that it doesn’t even warrant being mentioned. It appears that way. However, this concept is a very serious issue. If an organization gets it wrong, it can turn deadly. In fact, it has turned deadly.
Three Air Tanker Crashes
The National Transportation Safety Board (NTSB) investigated three air tanker crashes. The following information was reported in the NTSB Safety Recommendation dated April 23, 2004.
On August 13, 1994, a Lockheed C-130A Hercules experienced an in-flight separation of the right wing near Pearblossom, California, while responding to a forest fire near the Tehachapi Mountains. All three crewmembers were killed and the airplane was completely destroyed. (An aircraft similar to the C-130A can be seen in Figure 1.15.)
Figure 1.15 C-130 Aircraft, similar to the C-130A tankers that crashed on August 13, 1994 and June 17, 2002 (Photo from Photo NSA online; http://www.nsa.gov/about/photo_gallery/index.shtml.)
Figure 1.16 C130A June 17, 2002, crash site from the NTSB report (Photo from NTSB, September 24, 2002, NTSB Advisory, Update on Investigations of Firefighting Airplane Crashes in Walker, California and Estes Park, Colorado; http://www.ntsb.gov/pressrel/2002/020924.htm )
On June 17, 2002, another Lockheed C-130A Hercules experienced an in-flight breakup that was initiated by separation of the right wing, followed by separation of the left wing, while executing a fire retardant drop over a forest fire near Walker, California. Both wings detached from the fuselage at their respective center wing box-to-fuselage attachment locations. All three flight crewmembers were killed, and the airplane was completely destroyed. Figure 1.16 depicts the June 2002 crash site.
On July 18, 2002, a Consolidated Vultee P4Y Privateer experienced an in-flight separation of the left wing while maneuvering to deliver fire retardant over a forest fire near Estes Park, Colorado. Both crewmembers were killed and the airplane was destroyed. (A similar aircraft is shown in Figure 1.17.)
Figure 1.17 P4Y Privateer similar to the one that crashed on July 18, 2002 (Library of Congress, Prints & Photographs Division, FSA/OWI Collection, [LC-USE6- D-009930])
All three aircraft were leased by the U.S. Department of Agriculture’s Forest Service for public firefighting flights. However, the aircraft detailed above were originally designed to transport cargo for the U.S. military—not to fight forest fires.
Air Tanker Crashes: Design Capability versus Required Performance
The operational environment and the loads experienced by an aircraft transporting cargo are vastly different from those experienced by an aircraft fighting forest fires. The NTSB report explains that during a fire-fighting mission, an aircraft experiences “frequent and aggressive low-level maneuvers with high acceleration loads and high levels of atmospheric turbulence.” The NTSB report further details that the maintenance programs used for the aircraft were the same that were derived for the aircraft when their mission was transporting cargo for the military. The report states that the aircraft were likely “operating outside the manufacturers’ original design intent.”
In the context of RCM, the required performance of the organization using the air tankers far exceeded the design capability of the aircraft. The structural lives of the aircraft were shortened because of the harsh operating environment and the far more aggressive loads applied to the aircraft during fire-fighting versus transporting cargo. The increased loading accelerated fatigue crack initiation and shortened the crack propagation time. Therefore, the structural inspections that were in place were not accomplished often enough to identify the crack before it caused catastrophic failure. The simple concept of ensuring that an asset is capable of doing what the organization requires was completely overlooked.
Aloha Airlines, Flight 243
On April 28, 1988, Aloha Airlines, Flight 243, took off from Hilo, Hawaii, at 1:25 p.m. Shortly after the aircraft leveled off at 24,000 feet, the aircraft experienced explosive decompression and structural failure that ripped away a large section of the fuselage, as shown in Figure 1.18. One of the flight attendants, Clarabelle Lansing, was immediately wrenched from the airplane. The aircraft made an emergency landing at Kahului Airport. The 89 passengers onboard and the remaining 4 crewmembers survived.
This tragedy is another example of required performance being allowed to exceed design capability.
Aloha Airlines: Design Capability versus Required Performance
Aloha Airlines was using its 737s for inter-island Hawaiian flights. According to the NTSB Aircraft Accident Report, those aircraft were accumulating three flight cycles (take-off and landing) for every hour in service. However, Boeing designed the structural inspections for the 737 assuming that the aircraft would accumulate about one and a half cycles per flight hour. Therefore, the aircraft were accumulating flight cycles at twice the rate for which the Boeing Maintenance Planning Data (MPD) was designed. Similar to the air tanker crashes described previously, this use accelerated fatigue crack initiation and shortened the crack propagation time. The structural inspections and associated intervals in place were inadequate; they were not accomplished frequently enough to detect the crack before catastrophic failure occurred.
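The arithmetic behind the doubled cycle rate can be made explicit. The sketch below is illustrative only: it assumes an inspection scheduled by flight hours, and the 1,000-hour interval is a hypothetical number, not one taken from the Boeing MPD or the NTSB report. Only the two cycle rates come from the text above.

```python
ASSUMED_CYCLES_PER_HOUR = 1.5  # rate the inspections were designed around
ACTUAL_CYCLES_PER_HOUR = 3.0   # rate the inter-island aircraft were accumulating


def cycles_between_inspections(interval_flight_hours, cycles_per_flight_hour):
    """Flight cycles accumulated between two inspections scheduled by flight hours."""
    return interval_flight_hours * cycles_per_flight_hour


interval_hours = 1_000  # hypothetical inspection interval in flight hours

planned_cycles = cycles_between_inspections(interval_hours, ASSUMED_CYCLES_PER_HOUR)
actual_cycles = cycles_between_inspections(interval_hours, ACTUAL_CYCLES_PER_HOUR)

# Interval, in flight hours, that would hold the cycle exposure at the planned level.
equivalent_interval_hours = interval_hours * ASSUMED_CYCLES_PER_HOUR / ACTUAL_CYCLES_PER_HOUR

print(planned_cycles)             # 1500.0
print(actual_cycles)              # 3000.0
print(actual_cycles / planned_cycles)  # 2.0
print(equivalent_interval_hours)  # 500.0
```

Whatever interval is chosen, the ratio stays at two: under these assumptions the structure sees twice the planned number of pressurization cycles between looks.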
Figure 1.18 Aloha Airlines, Flight 243, April 28, 1988 after landing (Associated Press /Robert Nichols)
The fatal air tanker crashes and the Aloha Airlines accident are only two examples that underscore the critical importance of ensuring that an asset’s design capability can meet organizational requirements. It is a simple concept that is too often overlooked. During an RCM analysis, asset design capability and required performance are carefully analyzed.
1.9 Introduction to the RCM Process
The application of True RCM consists of preparing an Operating Context and carrying out the 7 steps of RCM. Figure 1.19 outlines the RCM process.
Chapters 2 through 8 detail the Operating Context and the seven steps of the RCM process. The following discussion briefly introduces each concept.
Figure 1.19 The RCM Process
Operating Context
An Operating Context is a document that includes relevant technical information such as the scope of analysis, theory of operation, equipment description, and RCM analysis notes. In essence, it is a storybook identification of the system to be analyzed. The Operating Context also documents notes and assumptions regarding analysis decisions. It is an important source of reference for working group and validation team members.
In the interest of time, the Operating Context is typically drafted by the facilitator before the analysis begins and is then reviewed with the working group before the first step in the RCM process (identifying Functions) is accomplished. During this time, the working group reviews and revises the Operating Context, as required. The Operating Context is considered a living document; it is edited as more is learned about the equipment and additional issues come to light during the analysis.
Step 1: Functions
The intention of RCM is to determine what solutions must be put in place to ensure an asset meets the requirements of the organization. The air tanker crashes and the Aloha Airlines disaster detailed previously illustrate how critical it is to understand what is required of an asset so that it can be determined if the asset is capable of fulfilling those requirements. For this reason, the first step in the RCM process is to identify Functions.
Functions and associated performance standards are always written to reflect what the organization requires from the asset rather than what the system is designed to provide. During Function development, it is often noted that the organization’s expectations of the equipment exceed the actual capabilities of the asset. As depicted in Figure 1.20, the Primary Function (the main reason the system exists) and Secondary Functions (other Functions of the asset) are recorded.
Step 2: Functional Failures
Step 2 in the RCM process is to identify Functional Failures for each Function. Nowlan and Heap define Functional Failure as an unsatisfactory condition. As depicted in Figure 1.21, both Total and Partial Functional Failures are recorded for each Function. A Total Failure means no part of that Function can be performed. Partial Failure describes how the Function is still possible but is performed at an unsatisfactory level.
Figure 1.20 Primary and Secondary Functions
Figure 1.21 Total and Partial Failures
Step 3: Failure Modes
A Failure Mode is what causes a Functional Failure. During Step 3 of the RCM process, Failure Modes that cause each Functional Failure are identified. It is often wrongly believed that all Failure Modes associated with the system being analyzed must be recorded. On the contrary, RCM provides specific guidelines for determining what Failure Modes to include in an analysis. Only Failure Modes that are reasonably likely to occur in the operating context should be included. If the answer to one or more of the following questions is “yes,” the Failure Mode should be included in the analysis:
•Has the Failure Mode happened before?
•If the Failure Mode has not happened, is it a real possibility?
•Is the Failure Mode unlikely to occur but the consequences are severe?
•Is the Failure Mode currently managed via proactive maintenance?
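The four screening questions above amount to a logical OR. A minimal sketch follows; the class and field names are hypothetical and serve only to show that a single “yes” is enough to include a Failure Mode.

```python
from dataclasses import dataclass


@dataclass
class FailureModeScreen:
    """Answers to the four inclusion questions for one candidate Failure Mode."""
    has_happened_before: bool
    is_real_possibility: bool
    unlikely_but_severe_consequences: bool
    managed_by_proactive_maintenance: bool


def include_in_analysis(screen):
    """Include the Failure Mode when at least one answer is yes."""
    return any(
        (
            screen.has_happened_before,
            screen.is_real_possibility,
            screen.unlikely_but_severe_consequences,
            screen.managed_by_proactive_maintenance,
        )
    )


# Never seen and unlikely, but the consequences would be severe: include it.
rare_but_severe = FailureModeScreen(False, False, True, False)
# All four answers are no: leave it out.
not_credible = FailureModeScreen(False, False, False, False)

print(include_in_analysis(rare_but_severe))  # True
print(include_in_analysis(not_credible))     # False
```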
Failure Modes included in most analyses consist of typical causes such as those due to wear, erosion, corrosion, etc. However, it is very important to include Failure Modes that cover issues such as human error, incorrect technical manuals, inadequate equipment design, and lack of emergency procedures. Such Failure Modes allow issues to be analyzed as part of the RCM process so that solutions in addition to proactive maintenance can be developed.
Step 4: Failure Effects
During Step 4, a Failure Effect is written for each Failure Mode. A Failure Effect is a brief description of what would happen if nothing were done to predict or prevent the Failure Mode. Failure Effects should be written in enough detail so that the next step in the RCM process, Failure Consequences, can be identified. Failure Effects should include:
•Description of the failure process from the occurrence of the Failure Mode to the Functional Failure
•Physical evidence that the failure has occurred
•How the occurrence of the Failure Mode adversely affects safety and/or the environment
•How the occurrence of the Failure Mode affects operational capability/mission
•Specific operating restrictions as a result of the Failure Mode
•Secondary damage
•What repair is required and how long it is expected to take
Information Worksheet
Steps 1 through 4 of the RCM process are recorded in the Information Worksheet, as depicted in Figure 1.22. The Information Worksheet includes Functions, Functional Failures, Failure Modes, and Failure Effects.
Figure 1.22 The Information Worksheet
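One way to picture the Information Worksheet is as a hierarchy: each Function has Functional Failures, each Functional Failure has Failure Modes, and each Failure Mode carries one Failure Effect. The Python sketch below assumes that nesting and flattens it into one row per Failure Mode. The class names, the row layout, and the pump example are hypothetical and are not taken from Figure 1.22.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FailureMode:
    description: str      # Step 3: what causes the Functional Failure
    failure_effect: str   # Step 4: what happens if nothing is done


@dataclass
class FunctionalFailure:
    description: str      # Step 2
    total: bool           # True = Total Failure, False = Partial Failure
    failure_modes: List[FailureMode] = field(default_factory=list)


@dataclass
class Function:
    description: str      # Step 1, including the performance standard
    primary: bool         # True = Primary Function, False = Secondary Function
    functional_failures: List[FunctionalFailure] = field(default_factory=list)


def worksheet_rows(functions: List[Function]) -> List[Tuple[str, str, str, str]]:
    """Flatten the hierarchy into one row per Failure Mode."""
    rows = []
    for fn in functions:
        for ff in fn.functional_failures:
            for fm in ff.failure_modes:
                rows.append((fn.description, ff.description,
                             fm.description, fm.failure_effect))
    return rows


# Hypothetical pump example, for illustration only.
pump = Function(
    description="Pump water at no less than 300 gallons per minute",
    primary=True,
    functional_failures=[
        FunctionalFailure("Unable to pump any water", total=True, failure_modes=[
            FailureMode("Motor burns out", "Pump stops; flow is lost until the motor is replaced"),
        ]),
        FunctionalFailure("Pumps less than 300 gallons per minute", total=False, failure_modes=[
            FailureMode("Impeller worn", "Flow gradually falls below the required rate"),
        ]),
    ],
)

for row in worksheet_rows([pump]):
    print(row)
```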
Step 5: Failure Consequences
A properly written Failure Effect allows the Failure Consequence to be assessed. A Failure Consequence describes how the loss of function caused by the Failure Mode matters. There are four categories of Failure Consequences:
•Safety
•Environmental
•Operational
•Non-Operational
Step 6: Proactive Maintenance and Associated Intervals
After consequences are assessed, the next step in the RCM process is to consider proactive maintenance as a failure management strategy. In the context of RCM, the proactive maintenance tasks that may be identified include:
Scheduled Restoration: A scheduled restoration task is performed at a specified interval to restore an item’s failure resistance to an acceptable level, without considering the item’s condition at the time of the task. An example of a scheduled restoration task is retreading a tire at 60,000 miles.
Scheduled Replacement: A scheduled replacement task is performed at a specified interval to replace an item without considering the item’s condition at the time of the task. An example is a scheduled replacement of a turbine engine compressor disk at 10,000 hours.
Scheduled restoration and scheduled replacement tasks are performed at specified intervals regardless of the item’s condition.
On-Condition Task: An On-Condition task is performed to detect evidence that a failure is impending. In the context of RCM, the evidence is called a potential failure condition and can include increased vibration, increased heat, excessive noise, wear, etc. Potential failure conditions can be detected using relatively simple techniques such as monitoring gauges or measuring brake linings. Additionally, potential failure conditions can be detected by employing more technically involved techniques such as thermography or eddy current inspection, or by using continuous monitoring with devices such as strain gauges and accelerometers installed directly on machinery. The point of On-Condition tasks is that maintenance is performed only upon evidence of need.
In the context of RCM, all proactive maintenance tasks must be technically appropriate and worth doing. Chapter 9 details how to determine if a proactive task is technically appropriate and worth doing.
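The difference between the task types in this step comes down to what triggers the work: accumulated usage for scheduled tasks, or a detected potential failure condition for On-Condition tasks. A minimal Python sketch of that contrast follows. The function names and the vibration limit are hypothetical; the 60,000-mile and 10,000-hour intervals are the examples given above.

```python
def scheduled_task_due(usage: float, interval: float) -> bool:
    """Scheduled restoration or replacement: due once the interval is reached,
    regardless of the item's condition."""
    return usage >= interval


def corrective_action_needed(reading: float, potential_failure_limit: float) -> bool:
    """On-Condition: act only when the reading shows a potential failure condition."""
    return reading >= potential_failure_limit


# Intervals from the examples in the text.
print(scheduled_task_due(usage=60_000, interval=60_000))  # tire retread at 60,000 miles -> True
print(scheduled_task_due(usage=9_200, interval=10_000))   # compressor disk at 9,200 hours -> False

# Hypothetical vibration limit (mm/s), chosen only for illustration.
print(corrective_action_needed(reading=4.1, potential_failure_limit=7.0))  # False
print(corrective_action_needed(reading=7.6, potential_failure_limit=7.0))  # True
```

The sketch assumes a reading that rises as the item degrades, such as vibration. For a measurement that falls with wear, such as brake lining thickness, the comparison would be reversed.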
Step 7: Default Strategies
As mentioned earlier, RCM isn’t just about maintenance. There are a great many solutions other than proactive maintenance that can be derived using the RCM process. Examples include: Failure Finding tasks, Procedural Checks, no scheduled maintenance, and other recommendations such as modifications to operating procedures, updates to technical publications, and equipment redesigns. In the context of RCM, these recommendations are known as Default Strategies. Default Strategies are discussed in detail in Chapter 10.
It is often wrongly believed that FMEA and FMECA are analyses that are accomplished independently of, or in lieu of, RCM. On the contrary, the first four steps of the RCM process produce a FMEA. The steps to accomplish a FMEA are depicted in Figure 1.23.
Figure 1.23 First four steps of the RCM process produce a FMEA
Figure 1.24 First five steps of the RCM process produce a FMECA
Additionally, the first five steps of the RCM process generate a FMECA. The steps to accomplish a FMECA are depicted in Figure 1.24.
When RCM is performed, the requirement for a FMEA and a FMECA is largely satisfied.
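The relationship between RCM, FMEA, and FMECA can be expressed as prefixes of the seven-step list. The short Python sketch below uses the step names from this chapter; the variable names are illustrative only.

```python
RCM_STEPS = [
    "Functions",
    "Functional Failures",
    "Failure Modes",
    "Failure Effects",
    "Failure Consequences",
    "Proactive Maintenance and Intervals",
    "Default Strategies",
]

# The first four steps produce a FMEA; the first five produce a FMECA.
FMEA_STEPS = RCM_STEPS[:4]
FMECA_STEPS = RCM_STEPS[:5]

print(FMEA_STEPS)
print(FMECA_STEPS)
```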
RCM is an exciting process that yields overwhelmingly positive results when it is applied correctly with the right people. RCM isn’t a new process. Its principles have been applied for several decades and are being applied in nearly every industry throughout the world. RCM can be carried out swiftly and efficiently when executed properly. Additionally, RCM’s principles are so diverse that they can be applied to any asset, such as an airplane, nuclear power plant, manufacturing plant, or offshore oil platform. RCM principles can be widely applied to an entire asset or more narrowly applied to select pieces of equipment.
After the operating context is drafted, the seven steps of the RCM process are carried out: 1) Functions; 2) Functional Failures; 3) Failure Modes; 4) Failure Effects; 5) Failure Consequences; 6) Proactive Maintenance and Intervals; and 7) Default Strategies. One of the major products of an RCM analysis is the development of a scheduled maintenance program. However, RCM can be used to formulate scores of solutions that reach far beyond maintenance.