Just a few days after Christmas last year, AirAsia Flight 8501, traveling to Singapore, tragically plummeted into the sea. Indonesia has completed its investigation of the crash and just released the final report. Media coverage, especially in Asia, is heavy. The stories are headlined by pilot error but, as technologists, we can find lessons deeper in the report.
The Airbus A320 is a fly-by-wire aircraft, meaning there are no mechanical linkages between the pilots and the control surfaces. Everything is electronic, and much of a flight is under automatic control. Unfortunately, this also means pilots don’t spend much time actually flying the plane, possibly less than a minute, according to one report.
Here’s the scenario laid out by the Indonesian report: the rudder travel limiter system raised an alarm four times. The pilots cleared the alarms following normal procedures. After the fifth alarm, the plane rolled beyond 45 degrees, climbed rapidly, stalled, and fell.
Pilot Error?
The media headlines focus on the later steps in the failure chain, in part because the pilots were never trained to deal with the type of upset that occurred. It wasn’t just AirAsia that omitted this training on the A320. All airlines did, because Airbus, the aircraft manufacturer, did not expect the aircraft to ever experience such an extreme upset. Note that France, as the host country for Airbus, participated in the investigation.
As technologists, we need to look further. The technical root cause was cracked solder joints on circuit boards for the rudder travel limiter system. This system limits the amount of rudder movement at high speeds. A key point is that this same system failed 23 times in 2014. Those failures were considered minor and the problem was never fixed.
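To make the limiter’s job concrete, here is a minimal Python sketch of a speed-dependent rudder travel limit. The breakpoints in `SCHEDULE` and the function names are hypothetical, chosen only for illustration; they are not Airbus’s actual values or design. The idea is simply that the allowed deflection shrinks as airspeed rises, and every rudder command is clamped to that limit.

```python
# Hypothetical schedule: (airspeed in knots, max rudder deflection in degrees).
# Illustrative values only, NOT the A320's real rudder travel limits.
SCHEDULE = [(160.0, 25.0), (250.0, 10.0), (380.0, 4.0)]


def rudder_limit(airspeed_kt: float) -> float:
    """Return the maximum allowed rudder deflection (degrees) at this airspeed."""
    if airspeed_kt <= SCHEDULE[0][0]:
        return SCHEDULE[0][1]  # low speed: full travel allowed
    if airspeed_kt >= SCHEDULE[-1][0]:
        return SCHEDULE[-1][1]  # high speed: tightest limit
    # Linearly interpolate between the two surrounding breakpoints.
    for (v0, d0), (v1, d1) in zip(SCHEDULE, SCHEDULE[1:]):
        if v0 <= airspeed_kt <= v1:
            t = (airspeed_kt - v0) / (v1 - v0)
            return d0 + t * (d1 - d0)
    # Only reachable for a NaN airspeed, i.e. a failed sensor input.
    raise ValueError(f"invalid airspeed: {airspeed_kt}")


def limit_rudder_command(command_deg: float, airspeed_kt: float) -> float:
    """Clamp a commanded rudder deflection to +/- the speed-dependent limit."""
    limit = rudder_limit(airspeed_kt)
    return max(-limit, min(limit, command_deg))
```

Between breakpoints the limit is interpolated, so at 205 knots this sketch allows 17.5 degrees either way, while at 400 knots a 20 degree command is cut to 4 degrees.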
As in many accidents, the failure chain is a cascade of human failures to respond correctly to a technical fault. Little discussed in many reports is how the pilots attempted to fix the fifth rudder control fault. They followed normal procedures for the earlier faults, but the last time they opened and reset a circuit breaker while in flight. Somehow that caused the autothrust and autopilot to disconnect, and they were never restored. This left the pilots solely in control of the plane through the fly-by-wire system.
Tragic Sequence of Events
To summarize, here are the three key failures:
Bad solder joint,
Cycling the circuit breaker,
Inadequate recovery training.
We’ll set aside the failure to properly troubleshoot the board. That is a human failure, but also a larger policy issue for AirAsia rather than a directly technical one.
Bad solder joints occur despite best efforts to avoid them in manufacturing. Diagnosing an intermittent joint failure can be a nightmare, so we can sympathize with the aircraft maintainers. How should we deal with intermittent failures in critical or key systems? Clearly the system was checking its integrity, because it kept issuing warnings throughout 2014. Could a system refuse to function once a certain number of failures occur? I’d suggest that after 6 faults it could raise a heightened alert, such as refusing to boot when powered on in a safe environment (i.e. parked on the ground). Essentially the system says, “I know I’m bad, now fix me.”
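The policy suggested above could look something like this minimal Python sketch. `FaultMonitor`, its methods, and the threshold of six are hypothetical; a real avionics unit would keep the count in non-volatile memory and follow certification rules that this sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class FaultMonitor:
    """Hypothetical self-test policy: tolerate intermittent faults in flight,
    but refuse to start on the ground once too many have accumulated."""

    lockout_threshold: int = 6  # the "after 6 faults" figure suggested above
    fault_count: int = 0        # on real hardware this would persist across power cycles

    def record_fault(self) -> None:
        """Called each time the built-in self-test detects a fault."""
        self.fault_count += 1

    def clear_after_repair(self) -> None:
        """Only maintenance, after actually fixing the board, resets the count."""
        self.fault_count = 0

    def may_boot(self, on_ground: bool) -> bool:
        """Decide whether the unit is allowed to start up."""
        if self.fault_count < self.lockout_threshold:
            return True
        # Never refuse to start in the air; the lockout applies only
        # in a safe state, i.e. parked on the ground.
        return not on_ground
```

The asymmetry is deliberate: an in-flight restart always succeeds, so the lockout only bites where refusing to start is safe and the only way past it is a repair.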
Aircraft Circuit Breaker
Why did the pilots mess with the circuit breaker? One report says the pilot had seen a maintenance worker cycle a circuit breaker to clear a fault. That’s fine on the ground but not in the air. Why would a pilot try this, especially when pilots are advised not to reset circuit breakers unless the system is flight critical? The control system here is a safety feature, but not critical, so why not just leave it off?
People in general get overly comfortable with technology because it is everywhere. There are all kinds of jokes about non-technical relatives doing something crazy to a computer because the same action fixed something else.
Unfortunately, this often means people don’t know what they don’t know. In this case, the pilots appeared not to know that cycling that breaker would disrupt other systems. Admittedly, it sounds odd that it would, and I can’t explain it because I don’t know why that would happen. If true, it appears to be a systemic problem that should be addressed. In our work, we need to make sure that failures in one part of a system do not upset critical parts elsewhere.
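One way to make that kind of hidden coupling visible is to write the dependencies down and check them before touching anything. The minimal Python sketch below walks a hypothetical dependency map to find everything downstream of a breaker; the component names and wiring in `FEEDS` are invented for illustration and are not the A320’s actual architecture.

```python
from collections import deque

# Hypothetical dependency map: each key powers or feeds the systems listed.
# Names and wiring are illustrative only, NOT the real aircraft architecture.
FEEDS: dict[str, list[str]] = {
    "breaker_A": ["rudder_limiter"],
    "rudder_limiter": ["shared_computer"],
    "shared_computer": ["autopilot", "autothrust"],
    "breaker_B": ["cabin_lights"],
    "autopilot": [],
    "autothrust": [],
    "cabin_lights": [],
}
CRITICAL = {"autopilot", "autothrust"}


def affected_by(node: str, feeds: dict[str, list[str]]) -> set[str]:
    """Return every system downstream of `node`, using breadth-first search."""
    seen: set[str] = set()
    queue = deque(feeds.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited; also protects against cycles
        seen.add(current)
        queue.extend(feeds.get(current, []))
    return seen


def safe_to_cycle(breaker: str) -> bool:
    """A breaker is safe to cycle in flight only if nothing critical is downstream."""
    return not (affected_by(breaker, FEEDS) & CRITICAL)
```

The search itself is trivial; the hard part in practice is keeping the dependency map complete and accurate, which is exactly the knowledge the crew appeared to lack.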
The pilots weren’t trained to deal with the flight upset because even Airbus, the aircraft manufacturer, did not expect the aircraft to ever experience such an extreme upset. I guess since Murphy isn’t French they don’t expect his effects to occur there. This assumption probably stemmed from the aircraft being fly-by-wire, the expectation being that the aircraft would not let itself become upset to this degree. But the automatic flight systems were disrupted by the cycling of the circuit breaker.
Wrap Up
Failures in complex systems take a lot of effort to track down. In this case, we see how three separate actions caused the failure, with a fourth, the maintenance failure, contributing significantly. This shows that the overall failure could have been avoided at several points: if the solder joints had not failed, if the pilots had not cycled the breaker, if the pilots had restored the automatic flight computers, or if the pilots had reacted correctly after the upset.
Even as hackers, we need to keep in mind when and how failures can occur. We’ve written articles about electronic door locks built by hackers. How do you get in if the power goes out, or if a bad solder joint fails after a few hundred openings and closings? Hopefully a key will override the electronics. Fortunately, many of the hacks we see are not critical, and their failures would not be life-threatening. Let’s keep it that way.