Michael Collins — Carrying The Fire
In July of 1969 two astronauts looked up at the sky from Tranquility Base on the Moon. Over 3.5 billion people lived on the Earth and when the astronauts gazed at the bright blue sphere, they saw the whole of humanity suspended in velvet blackness.
All of humanity, except for one man: Michael Collins, pilot of the Apollo 11 command module Columbia, which was orbiting the Moon, waiting for Neil Armstrong and Buzz Aldrin to complete the first exploration of the lunar surface so that it could carry all three of them back to the safety of the home planet. For almost half of each orbit Collins was behind the far side of the Moon, out of both sight and radio contact. In the decades since the crew returned to Earth he has been called “the loneliest man in history”. But Collins certainly never felt that way.
While orbiting the Moon, ostensibly isolated from all mankind (Armstrong and Aldrin at least had each other), Collins was kept busy with a series of engineering and scientific tasks. In fact, he often said that he enjoyed the short periods of quiet he had when hidden behind the Moon!
As an astronaut, a test pilot, and an engineer flying less than 200 km above the surface of the Moon, he had no leisure time to feel lonely. Collins had a full flight plan dedicated to scientific work around the Moon and to housekeeping tasks, making sure that Columbia was ready to carry him, Armstrong and Aldrin back home.
Since he was working alone, and was out of communication with Earth for the part of each orbit he spent behind the Moon, Collins was even more dependent on the flight plan documentation he had prepared for himself than Armstrong and Aldrin were. He converted most of his planned tasks into simple step-by-step “cookbooks”.
As Site Reliability Engineers, we depend on our own “cookbooks”, more commonly known as “runbooks”, to save the day when a problem develops, and we hope that we won’t need them.
Generally speaking, there are three types of runbooks which we use:
- Business as usual/housekeeping runbooks:
Most of Collins’ time around the Moon was spent on pre-planned tasks: performing scientific experiments, maintaining the spacecraft, or otherwise advancing the mission, such as trying to find where Armstrong and Aldrin had landed with a telescope. These were directly analogous to the runbooks we use in SRE when things are working properly: deploying a new version of an application, backing up a filesystem, scaling up a service before a busy period, purging old logs and so on. Today, ideally, they’d be automated and not a source of toil, but in Apollo they were most definitely manual and consumed most of his time.
- Diagnostic runbooks:
Sometimes (all too often) there’s a problem in a system and we need to investigate the source of the problem and identify possible solutions. Since we’re concentrating on solving the problem as fast as possible, this is the worst time to start thinking up new procedures and routines — much better to have a pre-defined runbook which we can follow with the necessary diagnostic checks. If this runbook could be automated, so much the better.
While as much as possible of the space flight was automated, the primitive nature of the onboard computer and its slow (by modern measures) speed meant that Collins was kept busy by running both housekeeping and diagnostic runbooks manually.
- Preventative/Resolving runbooks:
Once we know that we have a problem and what the problem is, we can start working to resolve it. The NASA astronauts and engineers spent months training for missions in simulators and planning for every eventuality. During these simulations they tested the effectiveness of their solutions and improved on them if they were not good enough.
During his solitary sojourn around the Moon, a problem developed in the environmental control system on board Columbia. Flight controllers on the ground asked Collins to run a series of diagnostic and repair procedures which would find out what had gone wrong and fix the coolers before they froze part of the spacecraft. Drawing on his close familiarity with the spacecraft and his test pilot’s instincts, Collins chose a simpler procedure: restarting the balky device!
Preventative runbooks are sometimes more difficult to automate, simply because of the wide scope for “things to go wrong”, so any automated response would need to take a much wider range of factors into account than the diagnostic or housekeeping runbooks.
One thing to remember is that resolving a problem might not prevent it recurring in the future. During the flight of Apollo 11 a total of 28 anomalies were reported, ranging from a strange smell in the spacecraft through heater problems all the way to the computer restarts during the final descent to the lunar surface. All these anomalies were analyzed after the flight and, for most of them, the sources of the problems were found. More importantly, the engineers were able to make sure that these problems would not recur, meaning that time would not be wasted on either diagnostic or preventative runbooks, since the problems were “designed away”.
In this way each Apollo flight was at the same time both more complex than the flights before it (because it attempted to achieve more) and safer than the flights before it (because more was understood about the spacecraft and more problems were preemptively resolved).
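The runbook idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Step` class, the `run_runbook` function and the example steps are hypothetical names invented here, and the lambdas stand in for real checks and actions. The sketch runs documented steps in order, stops at the first failure, and keeps a log that can be reviewed afterwards.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    description: str
    action: Callable[[], bool]  # returns True when the step succeeds


def run_runbook(name: str, steps: List[Step]) -> List[str]:
    """Run each step in order and stop at the first failure.

    Returns a log of what happened so the run can be reviewed afterwards.
    """
    log = [f"runbook: {name}"]
    for number, step in enumerate(steps, start=1):
        succeeded = step.action()
        status = "ok" if succeeded else "FAILED"
        log.append(f"{number}. {step.description}: {status}")
        if not succeeded:
            # An automated runbook should know when to give up.
            log.append("stopped: hand over to a human")
            break
    return log


# Hypothetical diagnose-then-resolve runbook; the lambdas are placeholders.
restart_service = [
    Step("confirm the health check is failing", lambda: True),
    Step("restart the service", lambda: True),
    Step("confirm the health check now passes", lambda: False),
    Step("close the incident", lambda: True),
]

for line in run_runbook("restart-service", restart_service):
    print(line)
```

Keeping the log is a deliberate choice in this sketch: a record of which step failed is the raw material for the kind of post-flight analysis that lets a problem be “designed away”.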
Collins was very much aware that while his crewmates were on the Moon in the fragile Lunar Module Eagle, he was on board Columbia, the only part of the spacecraft which could actually return to Earth. In addition to his regular training, he also trained for the possibility that, due to some disaster on the Moon, he would have to return alone, “a marked man for life” as he wrote in his autobiography. In order to minimise this possibility, Collins and NASA engineers devised 18 different contingency plans for the link-up between Eagle and Columbia. These eighteen runbooks detailed how Columbia would adjust and compensate for any problem which might develop during Eagle’s ascent.
Depending on the problem which developed (Eagle launching too early or too late, Eagle flying too fast, too slow, too high, or too low, Columbia not positioned correctly during launch, and many more) Collins had a well-documented series of steps to perform and computer commands to execute in order to get the two spacecraft to link up successfully.
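In SRE terms, choosing among contingency plans like these is a lookup from an observed condition to a pre-written runbook. The following is a minimal sketch of that idea, not a reconstruction of the actual Apollo procedures: the condition keys, the runbook names and the `select_runbook` function are all hypothetical, chosen only to mirror the deviations listed above.

```python
# Hypothetical mapping from an observed deviation to the runbook that handles it.
# The names are illustrative placeholders, not real Apollo procedure titles.
CONTINGENCY_RUNBOOKS = {
    "launch_too_early": "contingency-launch-early",
    "launch_too_late": "contingency-launch-late",
    "too_fast": "contingency-too-fast",
    "too_slow": "contingency-too-slow",
    "too_high": "contingency-too-high",
    "too_low": "contingency-too-low",
}


def select_runbook(deviation: str) -> str:
    """Return the name of the runbook written for this deviation.

    An unknown deviation is an error rather than a guess: with no
    documented plan, the right move is to escalate.
    """
    try:
        return CONTINGENCY_RUNBOOKS[deviation]
    except KeyError:
        raise LookupError(f"no runbook for {deviation!r}: escalate") from None


print(select_runbook("too_low"))
```

Failing loudly on an unmapped condition is the design choice worth noting here: a lookup that silently falls back to a default plan would hide exactly the gaps that training and simulation are meant to expose.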
Collins was not only an engineer. In 1974 he published his autobiography, Carrying the Fire, which remains the gold standard of astronaut autobiographies and in which he allows his poetic side free rein.
He describes the Earth rising over the Moon:
“It pokes its little blue bonnet up over the craggy rim and then, not having been shot at, surges up over the horizon with a rush of unexpected color and motion. It is a welcome sight for several reasons: it is intrinsically beautiful, it contrasts sharply with the [Moon] below, and it is home and voice for us.”
— Michael Collins, Carrying the Fire.
The title “Carrying the Fire” refers first to Prometheus bringing fire (knowledge) to humanity, but for Collins it had additional meanings. It is a reference to the care taken when carrying out a successful space mission. As Collins said, “For how does one carry a fire? Carefully.”
As engineers and SREs we can certainly relate to both of these, but it is also a reference to the fact that this technological task was also deeply humanistic — in carrying three astronauts to the Moon, Apollo also carried the emotions, inspirations and aspirations of the entire world.
Michael Collins passed away on the 28th of April 2021, at the age of 90, after a lifetime of carrying the fire.