A Case Study on Common Issues Caused by Cascading System Failures

Conference Paper and Presentation: Systems Engineering

Abstract


Cascading system failures are commonplace in all areas of industry; they can cost organisations financially and have, in certain cases, cost lives. They can be caused by the simplest of things that is either left unattended, half-completed or not even noticed, and then spiral out of control as they cause knock-on effects in other areas of the system (organisational, computational, physical).

The purpose of this paper is to examine a number of case studies where cascading failures have occurred, in an attempt to find common issues between them and to determine whether cascading system failures can always be prevented or resolved. More importantly, how common are cascading system failures compared to single-event failures, or are all system failures ultimately the result of cascading issues which organisations may or may not have thought to predict in their initial risk assessments, if any? A further question is how common single points of failure are compared to multi-point failures.

1 Introduction

Cascading failures are circumstances where interconnected systems fail in one section, which then leads to failures in other areas of the system that eventually progress to the point of complete system failure. Cascading failures exist in many forms of natural and man-made systems and can occur through natural disasters or man-made causes (deliberate or unintentional). A simple way of seeing the effects of cascading system failures, and understanding the reasons for them, is by picturing the system as a set of Jenga [1] blocks, see Figure 1. The system is supported by four sections which share the load at 25% each, making it a stable system. When one section fails the other three accommodate the loss and share the load, which increases each of the surviving sections' loads by roughly a third (an additional 32%–36% relative to their original load), but the system is still in a manageable state. If the fault is still not rectified and another section fails, either from the overload or simply over time, the system is placed into a failing state: the last two sections are now sharing the load of two failed sections, which doubles their original load and often leads to a complete collapse of the system when another gives way under the pressure.
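
As a rough illustration of this load-sharing argument, the following Python sketch (with made-up capacity figures, not taken from any of the cited sources) removes supports one at a time and reports the share of the load left on each survivor.

    # Minimal sketch (illustrative numbers): four supports carry a load equally;
    # if the initial fault is never rectified, each further failure pushes more
    # load onto the survivors until the system collapses.

    def jenga_cascade(num_supports=4, capacity=0.45, total_load=1.0):
        """Remove supports one by one, reporting the load on each survivor."""
        alive = num_supports
        while alive > 0:
            share = total_load / alive
            state = "stable" if share <= capacity else "overloaded"
            print(f"{alive} support(s): {share:.0%} of the load each ({state})")
            alive -= 1          # fault left unattended: another support fails
        print("0 supports: complete collapse")

    jenga_cascade()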

Figure 1. Cascading failure, aka the domino effect.

2 Cascading Failures

2.1 The Single Point of Failure (SPOF)

A regular type of cascading failure, known as a single point of failure and proving to be almost unpredictable, is where one section fails and eventually leads to a knock-on effect, starting a rapid spread of the problem to many other areas of the system. The single point of failure could be a very small component of the system, but one depended upon so heavily by most or all of the main functions of the system that when it is removed the other systems cannot continue to function.

2.1.1 Northeast Blackout of 1965 Example

According to Vassell (1991), on 9 November 1965 a transmission line failure triggered a huge blackout throughout the northeastern United States and Canada, affecting around 30 million people. This single and seemingly small failure caused the neighbouring grids to take on the overload, which led to them failing in turn, and so on. The further cascading effect was on consumers, who until then had not thought about their complete reliance on electricity, and the devastating effect this had on public transport, which came almost to a standstill with no street lights and electric trains entirely reliant on the power. Fault finding proved almost impossible, partly due to the size of the affected area, see Figure 2, but mainly because there were no signs of damage or tampering to point out the cause of the problem. It took about 13 hours to rectify. Customers were plunged into darkness, hospitals lost power, transport almost came to a standstill, businesses reliant on power (e.g. for refrigeration) lost produce, and insurance companies panicked.

The solution, although simple in concept, proved costly in both time and money. Risk assessments were performed and the grids were separated by industrial circuit breakers to prevent one fault from affecting another grid. This also meant that technicians only needed to cover smaller geographic areas when failures occurred.
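
As a rough illustration of this mitigation, the following Python sketch (with hypothetical load figures, not taken from Vassell's report) contrasts a grid where neighbours absorb a failed section's load with one where a circuit breaker isolates the fault.

    # Minimal sketch (hypothetical grid): neighbouring sections absorb a failed
    # neighbour's load unless a circuit breaker trips first and isolates the fault.

    GRID = {"A": 0.8, "B": 0.8, "C": 0.8}   # load per section (1.0 = rated capacity)

    def fail_section(grid, failed, breakers_fitted):
        """Redistribute the failed section's load, or shed it if breakers isolate it."""
        load = grid.pop(failed)
        if breakers_fitted:
            print(f"Breaker trips: section {failed} isolated, its load is shed")
            return grid
        extra = load / len(grid)            # neighbours pick up the slack
        for name in grid:
            grid[name] += extra
            if grid[name] > 1.0:
                print(f"Section {name} now at {grid[name]:.0%} of capacity: it fails next")
        return grid

    fail_section(dict(GRID), "A", breakers_fitted=False)   # 1965-style cascade
    fail_section(dict(GRID), "A", breakers_fitted=True)    # post-blackout mitigation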

Figure 2. America’s Northeast Blackout of 1965.

(Courtesy: http://blackout.gmu.edu/)

2.2 The Butterfly Effect

The most common cascading failure involves several systems that crash due to the butterfly effect, aka [2] the ripple effect. This is where an apparently insignificant occurrence ripples outward, resulting in a much larger incident.

Henri Poincaré (2011), a French mathematician, theoretical physicist, engineer and philosopher of science born in 1854, and the author of Science et Méthode (Science and Method) [3] in 1908, wrote:

A small error in the former will produce an enormous error in the latter.

This led to the well-known chaos theory, see Figure 3, described by Edward N. Lorenz (1995), an American mathematician and meteorologist born in 1917, in his book The Essence of Chaos. He explains that chaos theory describes the behaviour of particular processes involving motion (e.g. ocean currents, population growth, business growth and functions, etc.), noting that small changes in early conditions can produce drastically different outcomes. He challenged his peers by posing the theoretical question:

“Predictability: Does the flap of a butterfly's wings in Brazil set off a tornado in Texas?”

Figure 3. The Butterfly Effect

He states that chaos theory does not mean systems are figuratively chaotic. Instead it concerns the ambiguity in measurements, the indirect behaviour of apparently straightforward systems, and the limited accuracy of forecasts. This simply means that even the best laid plans have variables that cannot be predicted, even in the most organised systems and with a risk analysis in place. This is due to changes in technological advancement, human nature and error, and the forces of nature (acts of God).
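
As a small numerical illustration of this sensitivity to initial conditions, the following Python sketch (not taken from Lorenz's book; the logistic map is simply a standard toy model) iterates two trajectories that start almost identically and shows them diverging completely within a few dozen steps.

    # Minimal sketch: the logistic map x -> r*x*(1-x) is a classic example of
    # sensitive dependence on initial conditions ("the butterfly effect").

    def logistic_map(x0, r=4.0, steps=40):
        """Iterate x -> r * x * (1 - x) and return the whole trajectory."""
        xs = [x0]
        for _ in range(steps):
            xs.append(r * xs[-1] * (1.0 - xs[-1]))
        return xs

    a = logistic_map(0.200000)   # baseline initial condition
    b = logistic_map(0.200001)   # perturbed by one part in a million

    for step in (0, 10, 20, 30, 40):
        diff = abs(a[step] - b[step])
        print(f"step {step:2d}: {a[step]:.6f} vs {b[step]:.6f} (difference {diff:.6f})")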

2.2.1 1974 France Plane Crash (Example 1)

In 1974 a Turkish passenger aircraft travelling from Ankara to London via Paris crashed in the Ermenonville Forest near Paris. All 345 passengers perished in the crash, many of whom were British, leading to an investigation by the Department of Trade's Accident Investigation Branch (Department of Trade, Accident Investigations Branch, 1976).

The conclusion of the report states that the cause of the accident was the aft cargo door on the left-hand side ejecting, resulting in sudden depressurisation that ruptured the passenger-level floor support and pulled six passengers (and their seats) out of the plane. This had a ripple effect on the number two engine, damaging the flight controls and ultimately costing the plane, the flight crew and all passengers on board. However, upon further investigation it was revealed that the incorrect engagement of the aft door's locking mechanism before take-off was the initial cause. The design of the door lock mechanism unfortunately meant that although the mechanism would close and appear locked, it was in fact only partially sealed and able to unlock with ease, see Figure 4. It was also noted in the report that a viewing port on the door allowed the ground crew to look in at the mechanism to ensure that it was indeed securely locked.

Figure 4. The DC-10 Locking Mechanism.

( Courtesy of the Department of Trade )

Additionally, the report states that the standards of Service Bulletin 52-37 [4] were not followed with regard to the locking mechanism alterations and adjustments, resulting in the locking pins protruding abnormally and the warning light on the flight deck turning off prematurely. The pressure relief valves between the cargo compartment and the passenger deck were inadequate, with the flight control cables all routed unsuitably between the two sections. The worst element of this report is the discovery that the Turkish airline did not attend to warnings issued 19 months earlier following an incident in Windsor, Ontario involving American Airlines Flight 96, a DC-10 of the same make and model, caused by a faulty cargo door.

2.2.2 World Trade Centre Disaster, 11 September 2001 (Example 2)

Dr. Buyukozturk and Dr. Gunes (2004), Civil and Environmental Engineers at the Massachusetts Institute of Technology, attended a conference in Istanbul, Turkey, in 2004. The topic of discussion was the fall of the World Trade Centre (aka the Twin Towers) in New York, USA, on 11 September 2001. This disaster, although man-made, had disastrous effects which caused worldwide problems resulting in multi-national states of financial depression. They discussed the causes, effects and lessons learnt from this tragic affair.

The World Trade Centre towers housed multiple businesses, including the New York Port Authority, law firms, financial companies, state offices and many others. On 11 September 2001 two passenger planes flew head-on into the two towers and in less than two hours both buildings were levelled to the ground, causing structural damage to ten other structures and businesses surrounding the towers, see Figure 5. 2,753 people lost their lives, including 343 firefighters, 23 police officers, 84 Port Authority employees and all passengers aboard the two flights. The extent of the local butterfly effect was that New York's transportation system, emergency services, water supply systems, telecommunications and the energy supply all crumbled under the pressure. The United States government's operations were thrown into chaos looking for answers to what had happened, who was responsible, what damage was caused, how to contain the damage and how far this problem could spread before it could be controlled (if at all). The whole world faced devastating news of the loss of lives and how America was struggling to control the situation. The global after-effects were that oil and gas storage and deliveries slowed operations and global banking and finance fell deeper into recession; please read the conference report for further details.

Figure 5. Twin Tower Area of Damage.

(Courtesy of http://911research.wtc7.net/)

The point is that only known risks had been analysed: hardware and software failures, data corruption, telecommunication failure, physical facility problems and security failure. What had not been considered in any risk analysis, especially for such a high-risk target, were structural failures, long-term power outage, full logistics/transportation failure, human support failure, total exhaustion of the emergency and rescue services and, ultimately, global financial losses, see Figure 6.

Figure 6. US Federal Funds and Crude Oil Rates 2001.

(Courtesy of http://www.spiritoftruth.org/)

Dr. Buyukozturk and Dr. Gunes recommended that serious lessons needed to be learnt regarding preparedness, rigorous engineering techniques for high-risk target structures, and effective emergency management and disaster recovery plans. Most importantly, they said: PREPARE FOR THE UNEXPECTED.

3 Are All System Failures Caused by Single Points of Failure?

No! Sometimes a system comprising multiple complex components and relationships can have failures forming in different elements of the system that then converge at one meeting point, producing what appears to be a single point of failure. A simple illustration would be a company comprising different departments within one building which have separate issues relating to one project but are not communicating with each other, so the other departments are none the wiser to the multiple failures forming. The project is then put together and the failures become apparent when the overall project needs to be finalised (e.g. the administration department has altered stock codes but not saved the old codes for reference, and the sales department has sold a product using the old codes instead of confirming the codes in use against current stock levels, so when logistics come to perform the “pick and pack” process they are unable to determine which product to pick, as the codes do not match anything they have).
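
The following Python sketch (with hypothetical stock codes and quantities, not taken from the essay's sources) illustrates how two independent departmental mistakes only surface when logistics tries to reconcile them, appearing as a single failure at the pick-and-pack stage.

    # Minimal sketch (hypothetical data): two independent departmental mistakes
    # converge at pick-and-pack, where they look like one single failure.

    # Admin renamed the stock codes but discarded the mapping to the old ones.
    warehouse_stock = {"NEW-1001": 12, "NEW-1002": 7}

    # Sales, unaware of the change, sold against the old codes.
    sales_orders = [{"code": "OLD-1001", "qty": 2},
                    {"code": "OLD-1002", "qty": 1}]

    def pick_and_pack(orders, stock):
        """Return picked items, raising when an ordered code is unknown."""
        picked = []
        for order in orders:
            if order["code"] not in stock:
                # Neither upstream failure is visible here; logistics only sees
                # one unresolvable error at the point where the failures converge.
                raise KeyError(f"Unknown stock code: {order['code']}")
            picked.append(order)
        return picked

    try:
        pick_and_pack(sales_orders, warehouse_stock)
    except KeyError as err:
        print("Pick-and-pack halted:", err)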

4 Are All System Failures Cascading?

No! Although most cases of system failure tend to have a cascading effect, this is not the case in all scenarios. Sometimes a system can have a failure that does not have knock-on effects on anything else. For example, if a retail store's credit card facility fails, this does not halt all operations, as the store can still take cash sales; but if the cash register facilities malfunction, including the credit card facilities, then the system may very well fail completely due to the inability to make transactions. This would be a case where a risk analysis would be required: the metrics could be calculated for such a scenario and a disaster recovery plan could be put in place.

The most common scenario that poses a high probability of leading to cascading failures is where a system comprises complex multi-component relationships that rely heavily on each other to fulfil an objective (http://www.maths.bristol.ac.uk/, 2010).

A recognisable example of a complex system is human biology. According to Sachs (2013), death is a cascading failure of a number of critical co-dependent complex systems, to the point that revival is statistically hopeless. The cause of death determines where the cascading failure starts, making the separate elements of the system malfunction or cease until the whole system stops, resulting in death. Death normally begins with the heart or breathing stopping, preventing oxygenation of the dependent elements of the body. This typically affects the brain first, causing the person to become unconscious within about ten seconds of being deprived of oxygen. Due to the brain's dependence on oxygen, the damage is permanent after only a few minutes of deprivation. The loss of oxygen continues to cause further damage to other parts of the organic system, eventually resulting in complete shutdown, although the time this takes can vary depending on temperature.

5 Conclusion

Failures in large systems are very common (e.g. traffic jams, financial market collapses, a single spark starting a large-scale forest fire). Large-scale system failures are most frequently a straightforward consequence of what is known as a Byzantine [5] failure, where one section of a system fails in an uncommon way and usually continues to run, harming other sections within the system before collapsing completely.
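
As a rough illustration of this failure mode, the following Python sketch (entirely hypothetical, not drawn from the cited sources) shows a component that keeps running after its internal fault, silently corrupting everything downstream that trusts its output.

    # Minimal sketch (hypothetical values): a "Byzantine" component does not stop
    # cleanly; it keeps answering, but one corrupt value poisons every section
    # that depends on it downstream.

    faulty_readings = iter([100.0, 100.0, -9999.0, 100.0, 100.0])

    def byzantine_sensor():
        """Keeps running after its internal fault, returning one corrupt value."""
        return next(faulty_readings)

    def controller(read_sensor, steps=5):
        """Accumulates readings with no sanity check, so the fault propagates."""
        total = 0.0
        for step in range(steps):
            value = read_sensor()
            total += value
            print(f"step {step}: reading={value}, running total={total}")
        return total

    # The controller never crashes, yet every total after the corrupt reading is
    # wrong, and anything built on these totals inherits the error.
    controller(byzantine_sensor)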

Due to such events as the Kessler syndrome (Kessler & Cour-Palais, 1978), which covers the effects of the accumulation of space debris over time, eventually leading to the end of space exploration and an increase in high-velocity debris showers, an area of scientific study called Complexity Science (Downey, 2012) has been created to discover the causes of cascading failures in an attempt to prevent them from happening, or at least to prepare for them (i.e. through risk analysis). The metrics are too complex to cover in this paper, but they are used to try to determine the probabilities of any predictable issues that could arise and then lead to cascading disasters. These types of events expose a fundamental flaw within all multi-part systems, as described by chaos theory: each section of the system is assumed to perform within a certain scope of constraints, but when it drifts beyond that scope it sets off a sequence of chain events that alters the behaviour of the whole system.

References

Department of Trade, Accident Investigations Branch, 1976. Turkish Airlines DC-10 TC-JAV: Report on the Accident in the Ermenonville Forest, France on 3 March 1974. London: Her Majesty's Stationery Office.

Downey, A., 2012. Think Complexity: Complexity Science and Computational Modeling. s.l.: O'Reilly Media.

Buyukozturk, O. & Gunes, O., 2004. The Collapse of the Twin Towers: Causes and Effects. Cambridge, MA, USA: IST Group.

http://www.maths.bristol.ac.uk/, 2010. Complex Systems and Their Features. What is a Complex System?, 1(1), p. 32.

Kessler, D. J. & Cour-Palais, B. G., 1978. Collision Frequency of Artificial Satellites: The Creation of a Debris Belt. Journal of Geophysical Research, 83(A6), pp. 2637–2646.

Lorenz, E. N., 1995. The Essence of Chaos. London: Taylor and Francis.

Poincaré, H., 2011. Science and Method. 1914 ed. Mineola, New York: Courier Dover Publications.

Sachs, J. S., 2013. Corpse: Nature, Forensics, and the Struggle to Pinpoint Time of Death. s.l.: Basic Books.

Vassell, G. S., 1991. Northeast Blackout of 1965. Volume 11, p. 4.
