Out of this world lessons from the Apollo Lunar landings–Part II
No one starts out perfect; it took a decade of hard work for NASA engineers and astronauts to go from nothing to landing a man on the Moon. During that time, NASA learned many lessons that remain relevant to anyone developing applications today. This is the second article in a series (you can see the first part here) which presents some Cloud Service Management & Operations (CSMO) concepts, using space exploration as its theme.
In the first article, we saw an example of NASA’s Mission Control (operations team) performing as a well-oiled machine. In today’s article, we will see an example from the dawn of manned space flight: John Glenn’s Friendship 7 mission in 1962, the first American flight to orbit the Earth.
Warning — here be minor spoilers for the 2016 film Hidden Figures!
The first American spacecraft in the early 1960s was called Mercury. This “capsule” was so small that the astronauts joked that they didn’t get into the craft; they put it on!
It was cramped in the spacecraft. There was barely enough space for the astronaut to float, let alone share space with a 1960s-era computer (these, believe it or not, were even larger than a 2019 flagship cellphone!). The Mercury spacecraft, therefore, had very primitive diagnostic tools.
Mission Control (then at Cape Canaveral, Florida) received only a trickle of telemetry data from which to interpret the status of the spacecraft.
Now, the most important responsibility of any spacecraft is keeping the astronaut inside alive, and the heaviest part of Mercury was the massive heat-shield at the bottom. The heat-shield protected the spacecraft (and the astronaut inside) from the over 1,500 °C (3,000 °F) temperatures encountered during re-entry. As can be imagined, a heat-shield needs to be precisely aligned to dissipate the heat properly.
Near the end of the flight, Mission Control received a telemetry signal from the spacecraft to report that the heat shield had deployed prematurely. If true, this meant that the spacecraft might burn up during re-entry.
There was no supplementary information to the monitoring signal. It was a simple alarm light that blinked on.
Glenn had not reported hearing any knocks or feeling any of the vibrations in the spacecraft which were expected to occur should the heat-shield be deployed prematurely. Mission Control had only one data point to work with, so they had to operate under the assumption that it was correct, even though they suspected that the monitoring sensor was faulty and did not reflect reality.
In the image to the left you can see a Mercury capsule being tested on the ground. The base of the capsule is pointing to the left. The three braking rockets (the metal “pot” with three spouts) are connected to the spacecraft with very high tech straps.
The straps are connected to the heat-shield (the round base of the capsule).
There was little that could actually be done to solve the problem, and fortunately, it turned out to be a false alarm. There was no fault in the heat-shield, and Glenn returned to Earth without problems.
These early problems and the lessons learned from the Mercury missions enabled NASA to achieve their goal of landing on the Moon. Let’s look at some of these lessons and observe their relevance to modern application development.
Due to the lack of an on-board computer and the weak radio network to communicate with the spacecraft, the engineers had to overcome two limitations inherent to their tools when analyzing the heat-shield problem:
- They had limited capability to measure and examine the spacecraft.
- The measurements they could perform were limited in scope — for example, all they knew was that the heat-shield was out of place. But how far out of place was it? 1 millimeter? 1 inch?
When developing applications today, we are usually less concerned about whether we can safely perform re-entry, but we still want to be able to measure and examine the health and well-being of our applications (Can we perform a bank transaction? Can we call for a taxi?).
We call the capability to perform this measurement Monitoring.
Just as the engineers wanted to know how far the heat-shield had shifted out of place, we would like to get more diagnostic information out of our applications (for example, is there a delay in performing bank transactions?).
The measure of information exposed by an application is its Observability.
So ideally we’d have an environment with great monitoring tools and applications with a high level of observability so that we could diagnose, solve, and avoid problems as much as possible.
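To make the distinction concrete, here is a minimal Python sketch. Everything in it is hypothetical and for illustration only: the TransactionSample type, the is_up and describe functions, and the sample numbers are assumptions, not part of any IBM product or of the Mercury telemetry. The first function behaves like the alarm light (a bare yes or no); the second exposes enough detail to judge how serious a problem is.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class TransactionSample:
    """One hypothetical bank-transaction attempt (illustrative only)."""
    succeeded: bool
    latency_ms: float


def is_up(samples):
    """Monitoring: a single yes/no answer, like the blinking alarm light."""
    return any(s.succeeded for s in samples)


def describe(samples):
    """Observability: expose detail so we can see *how* healthy we are."""
    latencies = [s.latency_ms for s in samples if s.succeeded]
    return {
        # Fraction of attempts that succeeded (None if nothing was measured).
        "success_rate": len(latencies) / len(samples) if samples else None,
        # Latency figures are computed over successful attempts only.
        "median_latency_ms": median(latencies) if latencies else None,
        "max_latency_ms": max(latencies) if latencies else None,
    }


samples = [
    TransactionSample(True, 120.0),
    TransactionSample(True, 150.0),
    TransactionSample(False, 0.0),
    TransactionSample(True, 900.0),
]

print(is_up(samples))     # the "light": are transactions possible at all?
print(describe(samples))  # the detail: one failure and one very slow call
```

With these made-up samples the yes/no check reports that all is well, while the richer output reveals a failed transaction and a 900 ms outlier that the simple check hides.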
For NASA, improving observability would have to wait for more powerful spacecraft such as Apollo, which had computers built-in. But they could improve the basic monitoring of the Mercury capsule by adding two more sensors to the heat shield at different locations, so that a single faulty sensor would no longer raise false alarms¹.
For a modern application, you can use Cloud Availability Monitoring² (CAM) and IBM Cloud Application Performance Management³ (APM) to analyze the behaviour of your websites from multiple locations, with the monitoring points deployed either in your own data center or in IBM data centers (Points of Presence). This limits the possibility of receiving a false positive alarm caused by a network failure at a single location.
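The idea shared by the extra heat-shield sensors and by multi-location monitoring is that an alarm should be confirmed by more than one independent source. The sketch below shows that idea in Python under stated assumptions: the alarm_confirmed function, the sensor names, and the location names are all hypothetical, and this is not how CAM or APM are actually configured or implemented.

```python
def alarm_confirmed(readings, quorum=2):
    """Return True only if at least `quorum` independent sources report a fault.

    `readings` maps a source name (a sensor or a monitoring location)
    to True when that source reports a fault.
    """
    if quorum < 1 or quorum > len(readings):
        raise ValueError("quorum must be between 1 and the number of sources")
    faults = sum(1 for fault in readings.values() if fault)
    return faults >= quorum


# Hypothetical heat-shield sensors: one faulty sensor on its own
# no longer raises the alarm.
sensors = {"sensor_a": True, "sensor_b": False, "sensor_c": False}
print(alarm_confirmed(sensors))

# Hypothetical monitoring locations: two of three see the website
# failing, so the alarm is treated as real.
probes = {"dallas": True, "london": True, "tokyo": False}
print(alarm_confirmed(probes))
```

A quorum of two out of three is a common compromise: a single broken sensor or a single network outage is ignored, yet a genuine fault seen from most vantage points still raises the alarm.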
I hope you’ve learned something from the second article of this series. The next article will come out close to the actual anniversary of the Moon landing, so to keep the article positive, we’ll discuss IBM’s role in the construction of the mighty Saturn V rocket and how some of the design choices made for a 111-meter (363-foot) tall rocket are the same as those made for the newest Kubernetes applications.
I would be pleased if you would join me.
You can learn more about modern monitoring in my colleague Ray Stoner’s series of articles.
Learn more about developing applications with Observability using Build-to-Manage techniques with the IBM Garage and schedule a no-charge visit.
To prepare for the 50th anniversary, you can watch the new Apollo 11 documentary movie with restored, remastered and unseen archive footage.
[1] Flight: My Life in Mission Control, by Christopher Kraft, 2003. Pg 68
[2] https://cloud.ibm.com/catalog/services/availability-monitoring
[3] https://www.ibm.com/us-en/marketplace/application-performance-management
Other articles in this series:
- Part I: The 1201 program alarm
- Part II: Glenn’s flight
- Part III: Functional vs non-functional requirements
- Part IV: RACI
- Part V: ChatOps
- Part VI: SRE & Transparency
- Part VII: Operational Scorecards
- Part VIII: MVP vs PoC
- Part IX: DevSecOps Quarantine
- Part X: Reliability
- Part XI: Day-2 Operations
- Part XII: Flying to the Moon is like developing on your laptop