By Sandy Dunn, (2003)
Director, Assetivity Pty Ltd
In his seminal book on Reliability Centred Maintenance, John Moubray suggested that, to date, there have been three distinct “generations” of maintenance. The first is characterized by a focus on repair tasks, the second by a focus on improving maintenance planning and scheduling, and the third by a focus on predicting, preventing and avoiding the consequences of equipment failures.
This paper discusses the major issues that may shape the nature of the “fourth” generation of maintenance and argues that the fourth generation of maintenance will focus on failure elimination, rather than prediction or prevention. This will involve two major improvement thrusts - the first involves an expansion of the technical focus of maintenance managers and professionals in areas such as equipment selection and design, and the second involves failure elimination through the more effective application of organizational, systemic, and cultural controls - requiring greater understanding and appreciation of the “soft” skills.
Maintenance, Strategy, Improvement, Failure Elimination, Root Cause Analysis, Equipment Design, Equipment Selection, Design for Maintainability, Reliability Improvement
In his seminal book on Reliability Centred Maintenance, John Moubray suggested that, to date, there have been three distinct “generations” of maintenance. These generations were characterized by changes in three areas:
• Changing Expectations of Maintenance
• Changing Views on Equipment Failure, and
• Changing Maintenance Techniques
The First Generation of Maintenance
Moubray suggested that the first generation of maintenance, before and up to the Second World War, could be described in the following terms:
Expectations of Maintenance
• Fix equipment when it breaks
Views on Equipment Failure
• All equipment “wears out”
Maintenance Techniques
• Fundamental Repair Skills
The Second Generation of Maintenance
Moubray suggested that the second generation of maintenance, after the Second World War, and up to the 1970s could be described in the following terms:
Expectations of Maintenance
• Higher equipment availability
• Longer equipment life
• Lower Maintenance Costs
Views on Equipment Failure
• All equipment complies with the “bath-tub” curve
Maintenance Techniques
• Scheduled Overhauls
• Systems for planning and controlling work (PERT, Gantt, etc.)
• Big, slow computers
The Third Generation of Maintenance
Moubray suggested that the third generation of maintenance, since the 1970s, could be described in the following terms:
Expectations of Maintenance
• Higher equipment availability
• Higher equipment reliability
• Greater safety
• No environmental damage
• Better Product Quality
• Longer Equipment Life
• Greater Cost-Effectiveness
Views on Equipment Failure
• There are six failure patterns, following the research of Nowlan and Heap
Maintenance Techniques
• Condition Monitoring
• Design for Maintainability and Reliability
• Hazard Studies
• Failure Modes and Effects Analysis (FMEA, FMECA)
• Small, fast computers
• Expert Systems
• Teamwork and Empowerment
Moubray went on to suggest that the way forward for organizations wishing to embrace the third generation of maintenance was to adopt Reliability Centred Maintenance, an approach which addressed the major issues facing Maintenance professionals, reflected the latest thinking on equipment failure patterns, and integrated many of the maintenance improvement techniques available at that time.
Many organizations have taken Moubray’s advice - some successfully, and others not so successfully. But the fundamental question remains, for those organizations that have now adopted third-generation maintenance techniques, where now? This paper attempts to address this question.
The Fourth Generation of Maintenance
Following Moubray’s framework, we will first discuss each of the influencing factors that may shape the fourth generation of maintenance, before going on to discuss a possible shape and form to the fourth generation.
In the fifteen years since Moubray first wrote of the third generation of maintenance, have the expectations on maintenance lessened? Few would argue that this is the case. Indeed, many, if not all, of the expectations that Moubray outlined, are still present in most organizations. However, the relative emphasis on these expectations is shifting, and as a result, some of the limitations of existing maintenance improvement techniques are being exposed.
Let’s examine the current expectations of maintenance in more detail.
Have the expectations of maintenance increased or decreased over the last fifteen years? Most maintenance managers would be of the view that, at the very least, the expectations placed on them have not decreased.
In terms of Moubray’s list from the third generation of maintenance:
Equipment Availability - Equipment Availability today is just as likely to be an important performance measure as it was 15 years ago, and the expectation that maintenance will deliver high equipment availability has not abated.
Equipment Reliability - Equally, equipment reliability today is also an important performance measure for maintenance - perhaps even more so than 15 years ago. It is my perception that there is now a far greater awareness amongst operations and maintenance personnel of the impact that low levels of equipment reliability can have on continuous manufacturing processes - even when equipment availability is relatively high. I would argue that the expectations of maintenance in this area have increased significantly over the last 15 years.
Greater Safety - Once again, this remains an important expectation of maintenance - particularly in the sense of maintenance ensuring that equipment can be operated safely. Typically, this has focused on high frequency, low consequence events - but this has more recently expanded to include low frequency, high consequence events - the events resulting in industrial catastrophes and disasters. However, there is also a growing realization that the management systems and processes that are required to effectively avoid, or mitigate the effects of, these catastrophic events are different from those typically used to manage more frequent, relatively minor safety incidents. We will discuss this point in more detail later in this paper.
No environmental damage - overall, there has been increasing focus, in the relevant industries, on minimizing the environmental impact of operations. There is an undercurrent of popular opinion that is pushing for more stringent, and more strictly enforced environmental standards and regulations. In Western Australia, for example, the Environmental Protection Agency has been forced to undergo something of a transformation, as it was perceived to be ineffective in ensuring compliance with its own regulations. There is little doubt that equipment reliability has a large part to play in ensuring compliance with environmental standards and regulations, and there is an ever-increasing expectation that maintenance will ensure that equipment permits compliance with these standards and regulations.
Better Product Quality - in a global marketplace, the requirement to ensure that the product produced meets all quality specifications remains a key focus. For organizations operating in the commodity market (such as most mining, oil and gas, and others), product quality is one of the few ways in which these organizations can differentiate their product from that of other suppliers. Customers continue to become more stringent in specifying the quality of the product that they can accept, and maintenance must continue to ensure that equipment produces a product that meets all required quality specifications.
Longer Equipment Life - it may only be my perception, but the ever-increasing pace of technological change and decreasing product life cycles appear to have diminished the focus on increasing equipment life - at least as far as the maintenance function is concerned. Certainly, there is a continuing expectation that equipment will not be scrapped “prematurely” and that certain basic asset care activities are performed, but I do not see longer equipment life as being high on the list of most maintenance managers’ priorities at present.
Greater Cost Effectiveness - Moubray used this term to differentiate from the second generation of maintenance, where the focus was simply on reducing maintenance costs. His view was that, in the third generation of maintenance, organisations seek to optimize their expenditure on maintenance, incurring the level of expenditure that minimizes the total cost to the organisation (including opportunity costs incurred as a result of lost production, etc.). This may be true, in theory, but it would appear that, at least in most capital-intensive industries, the focus is less on optimising maintenance expenditure, and more on achieving “lean” organisations - that is, organisations with the minimum headcount. It is interesting to review the contents of most listed company Annual Reports these days and compare these with the Annual Reports from 10 or more years ago. Ten years ago it was quite common for organisations to proudly display charts in their Annual Reports showing how total employment within their organisations had grown over the years. These days, of course, this would be total stockmarket suicide - so the charts are no longer shown. I would argue that, despite the theoretical advantages of focusing on greater cost effectiveness, the real demand today, even on organisations currently displaying third generation maintenance capabilities, is to minimise head count, and display “lean maintenance” organisation structures.
In addition, there are at least two other demands that I currently see being placed on the maintenance function today, that have increased over the last 15 years or so.
Risk Management - As mentioned previously, there is an increasing focus within organisations on identifying and managing potential high consequence, low probability events, particularly in those organisations operating in hazardous industries, and Maintenance is seen as being a key participant in this process. In the past, these types of events were seen as being simply an extension of the routine Safety Management or Environmental Management systems in place at most organisations. However, there is a growing realisation that these systems are not very good at effectively managing high consequence, low probability events - they are really much more suited to managing high frequency, low consequence events.
One of the outcomes of the investigation of the Longford disaster was the realization that, despite the existence of a world-class Safety Management system at Esso Longford, this catastrophic event still occurred. This, in turn, led the Victorian State Government to introduce separate legislation to cover designated major hazardous facilities, requiring them to demonstrate that they had appropriate management systems and processes in place to guard against the possible consequences of low probability, high consequence events.
Further, numerous authorities in the field of risk management, specifically working in the areas of high consequence, low probability events, have recognized the limitations of most of the quantitative risk management processes that are currently in place, such as Quantitative Risk Assessment, Probabilistic Safety Assessment, and others. Indeed, Evan and Manion[1] have noted, in particular, that the problems associated with these types of approaches include:
• Failure to identify all potential risk factors
• Problems with uncertainties in the modeling of systems - specifically the problems in obtaining realistic probability data for low-frequency events
• Problems with determining cause-effect relationships - these are often not demonstrable
• Uncertainties due to human factors - these often cannot be modeled, and therefore are rarely anticipatable.
• Problems of complexity and coupling - tight coupling and interactive complexity between system components disallow any complete modeling of potential system failures.
• The Value of Life problem - the fundamental moral problem in assigning a monetary value to human life
Fundamentally, according to Bougumil[2] the most significant problem is that probabilities assigned to individual failure modes are, by and large, conjectural, and based on analyses that cannot be corroborated by experimental testing. This is especially true of uncertainties that arise due to hidden or unknown cause-effect relationships.
While this is particularly true of the quantitative risk assessment approaches, it is also equally true of those approaches that do not formally rely on statistics, but instead rely on intuitive risk assessment approaches. Fundamentally, as human beings, we tend to underestimate the risks associated with high consequence, low probability events.
So how do we overcome this weakness? So-called “High-Reliability Organisations” have cultivated a high level of risk awareness, and this is embedded within their organizational culture. Weick and Sutcliffe[3], in their studies of High-Reliability Organisations (HROs), considered that the key cultural components of HROs were:
• A Preoccupation with Failure - HROs treat any lapse as a symptom that something is wrong with their system - something that could have severe consequences if numerous small lapses were to coincide at some point in time.
• A reluctance to simplify interpretations - knowing that the real world is complex and unpredictable, HROs take deliberate steps to uncover complexity, to see and learn more.
• Sensitivity to operations - HROs are attentive to the front line, where work gets done, ensure that front line operators are situationally aware and encourage people at all levels to speak up when something is “not right”.
• Commitment to resilience - HROs develop capabilities to bounce back from the inevitable errors that will occur, and
• Deference to expertise - decisions are made at the front line, and authority migrates to the person with the most expertise, no matter where, or at what level, within the organization that expertise may lie.
Reason[4], Latino and Latino[5] and others have noted that organisations’ defences against high consequence, low probability events are many-layered, and consist of various devices, systems and procedures that are intended to serve one or more of the following functions:
• To create understanding and awareness of the risks
• To give clear guidance on how to operate in such a manner as to avoid the risks
• To provide alarms and warnings when danger is imminent
• To restore the system to a safe state in an abnormal situation
• To take over from a failed system in an abnormal situation
• To interpose safety barriers between the hazards and the potential losses
• To contain and eliminate the hazards should they escape the barrier, and
• To provide a means of escape and rescue should hazard containment fail.
These defences could also be:
• Hard Defences
  o Automated Safety Features
  o Physical Barriers
  o Alarms and Annunciators
  o Interlocks
  o PPE
  o Fusible devices
  o Etc.
• Soft Defences
  o Legislation
  o Rules and Procedures
  o Maintenance Programs
  o Training
  o Drills and Briefings
  o Administrative Controls, such as permit-to-work systems
  o Licensing
  o Supervision
  o Etc.
Because the defences are many-layered, no single failure will generally result in a catastrophe - rather, the “holes” in these defensive layers must all line up at the same time as some initiating event before a catastrophic event occurs. Unfortunately, as many of you will know, failures of these defensive layers are generally “hidden” (using RCM terminology) or “latent” (using Reason’s terminology), and therefore must be constantly guarded against.
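The effect of a hidden failure in one defensive layer can be illustrated with a deliberately simple sketch. It assumes that the layers fail independently, which the preceding discussion of coupling and human factors suggests is rarely true, and the probabilities are hypothetical values chosen only for illustration. It is not a risk estimate, merely a way of seeing why a single latent failure matters so much.

```python
from math import prod

def probability_all_layers_fail(layer_failure_probabilities):
    """Chance that every defensive layer has a "hole" at the same moment.

    Assumes (unrealistically) that the layers fail independently.
    """
    return prod(layer_failure_probabilities)

# Hypothetical values: chance that each layer is unavailable when called upon.
healthy_layers = [0.01, 0.01, 0.01]

# Same system, but the second layer has an undetected (hidden/latent) failure,
# so it is certain to be unavailable when the initiating event occurs.
one_latent_failure = [0.01, 1.0, 0.01]

if __name__ == "__main__":
    print(probability_all_layers_fail(healthy_layers))
    print(probability_all_layers_fail(one_latent_failure))
```

With these assumed numbers, one undetected failure makes the line-up of “holes” roughly one hundred times more likely, even though nothing visible has changed in day-to-day operation.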
The key point about all of the above discussion regarding the effective management of high consequence, low probability events, is that, from a maintenance perspective, establishing effective defences against high consequence, low probability events requires more than the simple application of a single risk management tool, such as RCM, PMO, QRA, PSA or whatever other TLA takes your fancy. Successful defense requires the establishment and maintenance of a risk-aware, reliability-focused organisational culture, and this has more to do with the effective management of people than it has to do with the analytical tool that you use for assessing risks.
Almost any professional person who has been working in maintenance for the last 5 or 10 years would be familiar with the results of the pioneering research conducted by Nowlan and Heap of United Airlines during the 1960s and 1970s, which culminated in the publication, in 1978, of their paper for the US Department of Defense titled “Reliability Centered Maintenance”.
Nowlan and Heap monitored the failure of several hundred mechanical, structural and electrical components over several years, and determined that the conditional probability of failure of these components as they became older could be characterized by one of six failure patterns, as illustrated below. Further, their research indicated the proportion of components that they examined which fell into each of the six categories. This is also included in the diagram below.
Nowlan and Heap (and Moubray) have used these findings to warn of the dangers of indiscriminately applying fixed interval change-out techniques for equipment and components. For Failure Patterns A, B, and C, they argue, as the conditional probability of failure increases with age, there becomes a point at which it is technically feasible to replace the component, and subsequently reduce the overall probability of component failure. For components displaying Failure Pattern E, however, replacing the component does nothing to improve reliability, as the new component is just as likely to fail as the one being replaced, and if the component displays Failure Pattern F, then replacing the component on a fixed interval basis actually reduces overall reliability and increases the probability of failure, by reintroducing burn-in into a previously stable system.
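The argument above can be sketched numerically. The example below is an illustration only: it assumes a Weibull life distribution, which is a common modelling choice and not something taken from Nowlan and Heap’s empirical curves, and the shape, scale, age and interval values are hypothetical. A shape above 1 mimics the age-related increase seen in Patterns A, B and C, a shape of 1 mimics the constant Pattern E, and a shape below 1 mimics the burn-in behaviour of Pattern F.

```python
import math

def weibull_survival(t, shape, scale):
    """Probability that a component survives beyond age t."""
    return math.exp(-((t / scale) ** shape))

def conditional_failure_probability(age, interval, shape, scale):
    """Probability of failing in (age, age + interval], given survival to age."""
    surviving_now = weibull_survival(age, shape, scale)
    surviving_later = weibull_survival(age + interval, shape, scale)
    return 1.0 - surviving_later / surviving_now

# Hypothetical shape parameters, chosen only to mimic three behaviours.
PATTERNS = {
    "increasing with age (as in A, B, C)": 3.0,
    "constant (as in E)": 1.0,
    "burn-in (as in F)": 0.5,
}

def compare_new_and_aged(shape, scale=1000.0, age=800.0, interval=100.0):
    """Return (risk for a brand-new part, risk for a part already at `age`)."""
    new_part = conditional_failure_probability(0.0, interval, shape, scale)
    aged_part = conditional_failure_probability(age, interval, shape, scale)
    return new_part, aged_part

if __name__ == "__main__":
    for label, shape in PATTERNS.items():
        new_part, aged_part = compare_new_and_aged(shape)
        print(f"{label}: new={new_part:.3f} aged={aged_part:.3f}")
```

Under these assumptions, replacing the aged part lowers the risk over the next interval only in the first case. In the second case the risk is unchanged, and in the third the new part is more likely to fail than the one it replaced.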
This is valid, as far as it goes, but it ignores two important aspects of the above charts.
First, it is important to note that, according to the above charts, significantly more than 50% of components display a pattern in which the probability of failure is highest early in life. This means that, whenever we replace or repair a component, it is more likely than not that we are returning it to the period in which it is most likely to fail. Another study of nuclear power operations, conducted by the Institute of Nuclear Power Operations in the US and the Central Research Institute for the Electrical Power Industry in Japan, also indicated that more than half of identified nuclear power plant performance problems were associated with maintenance, testing and calibration activities (quoted in Reason, p 92). In 2003, is this acceptable? I would argue strongly that it is not.
Further, do we really understand why the proportion of components displaying this failure pattern is so high? Do we understand the causes of this failure pattern? I am not aware of any formal research into this phenomenon, but some possible explanations that I have heard include:
• “Human Error” - the repair/replace task was not successfully completed due to a lack of knowledge or skill on the part of the person performing the repair.
• “System Error” - the equipment was returned to service after a high-risk maintenance task without the repair having been properly inspected/tested.
• “Design Error” - the capability of the component being replaced is too close to the performance expected of it, and therefore lower capability (quality) parts fail during periods of high-performance demand. The remaining higher capability (quality) parts are capable of withstanding all performance demands placed on them. This could be envisaged in the following graph:
• “Parts Error” - the incorrect part or an inferior quality part has been supplied.
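The “Design Error” explanation above can be illustrated with a simple capability-versus-demand calculation. This is a sketch under stated assumptions: both the capability of the parts and the demand placed on them are treated as independent, normally distributed quantities, and all of the numbers are hypothetical. The function names are illustrative and do not come from any particular library.

```python
import math

def normal_cdf(z):
    """Cumulative distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probability_demand_exceeds_capability(cap_mean, cap_sd, demand_mean, demand_sd):
    """P(demand > capability) for independent, normally distributed values."""
    margin_mean = cap_mean - demand_mean
    margin_sd = math.sqrt(cap_sd ** 2 + demand_sd ** 2)
    return normal_cdf(-margin_mean / margin_sd)

if __name__ == "__main__":
    # Hypothetical units: capability and demand on the same arbitrary scale.
    tight_design = probability_demand_exceeds_capability(120.0, 10.0, 100.0, 10.0)
    generous_design = probability_demand_exceeds_capability(150.0, 10.0, 100.0, 10.0)
    print(f"tight margin:    {tight_design:.4f}")
    print(f"generous margin: {generous_design:.4f}")
```

When the average capability sits close to the average demand, the weaker parts in the population are the ones caught out by peaks in demand, which is consistent with failures clustering early in life and the surviving, stronger parts running on. Widening the margin, or reducing the variation in part quality, shrinks that overlap.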
Reason (ibid, p. 95) analysed the reports of 122 maintenance lapses occurring within a major airline over a 3 year period, and determined that over 56% of lapses involved omission errors - these included such items as fastenings left undone or incomplete, items being left in a “locked” condition, caps loose or missing, items left loose or disconnected, and others.
Clearly, in the fourth generation of maintenance we need to successfully reduce the proportion of maintenance repair tasks that result in Failure Pattern F type probability distributions.
The second point regarding our knowledge of equipment failures, is that while Nowlan and Heap paid attention to the distribution patterns, there is also an opportunity to reduce the overall probability levels of those patterns. In other words, rather than simply predicting or preventing failures, we should also be proactively seeking to eliminate those failures. In the fourth generation of maintenance, perhaps the over-riding goal of every maintenance manager should be to make the maintenance department (and his own position) redundant!
This involves proactively eliminating failure causes, which, in turn, implies a knowledge of those failure causes. Tools such as Root Cause Analysis assist in identifying and eliminating failure causes, but they are largely used reactively, after a failure has occurred, rather than proactively.
Proactive elimination of failure causes will require the application of tools and methodologies which will:
• Ensure that equipment is procured or designed in such a way as to ensure that it is “fit for purpose”, and has been designed or selected based on proper consideration of the maintenance requirements for that equipment, its maintainability, and life cycle costs, rather than simply based on a requirement to minimise initial capital costs (or at the very least, to ensure that the total capital project comes in under budget). This will require a level of interaction between Engineering and Maintenance personnel that is far greater than is currently in place at most organisations
• Ensure that equipment is operated within its design limitations. This means ensuring that every time the production process is “tweaked” to give greater output, or better quality, or lower costs, there is proper proactive consideration of the impact of these “tweaks” on equipment reliability. Further, it will also require a far greater level of discipline amongst most production personnel in adhering to standard, documented, operating procedures.
• Ensure that adequate spare parts management processes are in place to ensure that the right spare parts are obtained and that they are adequately cared for when in transit, as well as when they are stored within the Maintenance Store. This will require a greater degree of sophistication and knowledge than is currently in place in most organisations’ supply functions.
• Ensure that maintenance repair procedures are in place which ensure that the equipment is repaired correctly the first time, every time, and that these are rigidly adhered to. Once again, this will require a far greater level of attention to detail, and far greater discipline than is in place at most organisations. Appropriate repair quality standards may need to be established. These could include such standards as alignment standards, vibration standards etc., where it can be demonstrated that failure to achieve these standards following completion of a repair task will increase the likelihood or frequency of equipment failure.
Moubray listed an impressive number of tools and techniques that were available to maintenance managers. In the consulting boom of the 1990s, the number of tools and techniques exploded. In addition to the many tools and techniques that he listed, there can be added techniques such as:
• Root Cause Analysis (RCA)
• Precision Maintenance
• Zero Based Budgeting (ZBB)
• Total Productive Maintenance (TPM)
• PM Optimisation (PMO)
• Reliability Centred Maintenance (RCM)
• Mission Directed Workteams (MDW)
• Reliability Modelling
• Spare Parts Optimisation
• Outsourcing
• ERP systems
• Pocket PCs
• Wireless LANs
• World Wide Web (WWW)
• Etc.
In addition to these, many organisations have developed their own hybrid improvement approaches, and given these buzz phrase titles of their own: Profit Improvement Programs (PIPs), Profit Enhancement Programs (PEPs) and others abound. It is truly alphabet soup out there!
It is up to today’s Maintenance Manager to make sense of these tools and techniques, and develop an approach that will work for them, in their organisation - an approach that successfully tackles the most pressing issues that they face, and which will result in true bottom-line improvement for their organisations.
So what shape may this fourth-generation approach take, for those organisations that are comfortably operating within the third generation of maintenance?
Based on the preceding discussion, it is likely that the fourth generation approach will primarily focus on failure elimination, rather than failure prediction or prevention. It will increasingly focus on being proactive, rather than reactive.
Specifically, Fourth Generation maintenance improvement activities will concentrate on reducing the proportion of equipment failures that comply with Nowlan and Heap’s Failure Pattern F. They will also focus on reducing the overall levels of failure probability.
Achieving this will require an expansion of the traditional technical focus of maintenance managers and professionals into areas such as equipment selection and design. But it will also require more effective understanding of the application of organisational, systemic, and cultural controls to eliminate equipment failures. This, in turn, will require increasing understanding and appreciation of the “soft” people-related skills. Finally, to achieve the aims of failure elimination, there must be a greater level of cooperation and teamwork between Maintenance and Production, Engineering and Supply, which will also require application of softer, people skills.
[1] William M. Evan & Mark Manion, “Minding the Machines - Preventing Technological Disasters”, Prentice Hall, 1st Edition, 2002, ISBN 0130656461.
[2] R.J. Bougumil, “Limitations of probabilistic assessment”, IEEE Technology and Society Magazine, v.24, No.8 pp 24-27
[3] Karl E. Weick & Kathleen M. Sutcliffe, “Managing the Unexpected - Assuring High Performance in an Age of Complexity”, Jossey-Bass, 2001, ISBN 0787956279.
[4] James Reason, “Managing the Risks of Organisational Accidents”, Ashgate, 2001, ISBN 184014015
[5] Robert J. Latino & Kenneth C. Latino, “Root Cause Analysis - Improving Performance for Bottom-Line Results”, CRC Press, 1999, ISBN 0849307732
Source: http://www.plant-maintenance.com/articles/4th_Generation_Maintenance.pdf