Testing and Proofs of Concept — the Shuttle Approach and Landing Tests
Shuttle To SRE — STS Lesson 3
How is the Enterprise related to Site Reliability Engineering?
I write this article 44 years to the day after Space Shuttle Enterprise flew (or rather glided) on a successful test flight and on the day TV star William Shatner, Star Trek’s Captain Kirk, is scheduled to launch on a sub-orbital space flight.
In this article I’ll describe two lessons SREs can learn from the flight, and the naming, of Space Shuttle Enterprise — and how Star Trek’s Starship Enterprise is part of the second lesson.
In developing a new application, it’s pretty much a truism that the development, test and production environments should be as similar as possible — so that we identify as many potential issues as early as possible.
This is so important that it’s defined as one of the twelve factors of cloud native development (dev/prod parity in the Twelve-Factor App methodology).
One of the major advantages of cloud native platforms such as Kubernetes over older solutions is the ease of making environments equivalent, especially when methodologies such as GitOps are part of the solution.
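As a rough illustration of what "equivalent environments" means in practice, here is a minimal sketch that compares the settings of several environments and reports where they drift apart. The function name `find_drift`, the environment names and every setting in the sample data are hypothetical, invented for this example; in a GitOps setup the settings would come from the repository rather than from a hard-coded dictionary.

```python
MISSING = "<missing>"  # hypothetical marker for a setting an environment lacks


def find_drift(environments):
    """Return {setting: {environment: value}} for settings that differ."""
    all_keys = set()
    for settings in environments.values():
        all_keys.update(settings)

    drift = {}
    for key in sorted(all_keys):
        values = {
            env: settings.get(key, MISSING)
            for env, settings in environments.items()
        }
        # Compare by repr so that 1 and True are not treated as equal.
        if len({repr(value) for value in values.values()}) > 1:
            drift[key] = values
    return drift


# Hypothetical sample data, not taken from any real system.
environments = {
    "dev": {"postgres": "15.4", "replicas": 1, "tls": False, "region": "eu"},
    "test": {"postgres": "15.4", "replicas": 1, "tls": True, "region": "eu"},
    "prod": {"postgres": "14.9", "replicas": 3, "tls": True, "region": "eu"},
}

if __name__ == "__main__":
    for setting, values in find_drift(environments).items():
        print(setting, values)
```

Some drift is deliberate (production usually has more replicas than development), so a report like this is a prompt for discussion rather than a list of bugs.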
However, when testing the Space Shuttle in the late 1970s, NASA had no way of making the test environment (here on Earth) equivalent to the production environment (up in space). The next best thing was to build a prototype Space Shuttle which would be used to test those parts of a space flight which could be tested on Earth. Namely: how on Earth do we land?
One of the design goals of the Space Shuttle was to land on a runway like an aeroplane or glider, with a precision of a few meters, unlike the earlier Apollo spacecraft, which used parachutes to land in the ocean, often far away from the target area. Recovering an Apollo spacecraft therefore required multiple Navy ships and thousands of man hours, and all of this was just to recover a capsule which would never be used again. The Shuttle was designed to have a faster turn-around time, meaning that it would be quickly refurbished and the same craft would be launched into space again. Using the 12-factor app analogy, this would make the Shuttle a more robust (and re-usable) vehicle since it had a much more graceful completion state.
So, when the Space Shuttles were first ordered and constructed, NASA considered the first one (serial number OV-101) to be the prototype spacecraft, on which they would perform a series of tests to make sure that the craft would function correctly. Further shuttles would be developed based on the lessons learned from OV-101. Many of the expensive design decisions and construction tasks would be deferred until the test results were in.
The most publicly dramatic of these tests were the Approach and Landing Tests (ALT). The prototype shuttle would be lofted into the air on top of a modified Boeing 747 jumbo jet and then released to land by itself. It would glide to a landing, simulating a real shuttle, which would have no fuel left by the time it returned from Earth orbit. The tests were gradual: first the prototype flew and landed while still attached to the top of the Boeing, then it glided with many of its less aerodynamic components covered, and finally, on October 12th 1977, the shuttle “flew” and landed in its proper configuration for the very first time!
This is equivalent to performing tests which start off as safely as possible and gradually become more and more “production-like” as we gain confidence. In IBM’s Chaos Engineering Principles these concepts are defined as:
- Strive for production: Experiment first upstream in non-production environments. Build confidence by learning about the system behavior during different scenarios and ensuring the system’s ability to gracefully withstand and recover from failures and unexpected events. As you gain confidence, transition towards production.
and
- Increase complexity gradually: Increase complexity gradually while you adjust the granularity of the experiments. Expanding the scope and combining tests can reveal previously unknown weaknesses.
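The two principles above can be sketched as a staged test plan which only moves on to a riskier stage once the previous one has passed, much as the ALT flights did. This is a minimal sketch under my own assumptions: the `Stage` class, the `run_stages` function, the stage names and the placeholder experiments are all hypothetical and are not taken from IBM’s principles or from any chaos engineering tool.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    name: str
    environment: str
    # Hypothetical experiment hook: returns True if the system stayed healthy.
    experiment: Callable[[], bool]


def run_stages(stages: List[Stage]) -> List[str]:
    """Run stages in order, stopping at the first failed experiment.

    Returns the names of the stages that passed.
    """
    passed = []
    for stage in stages:
        if not stage.experiment():
            break  # do not escalate to riskier stages after a failure
        passed.append(stage.name)
    return passed


# Placeholder experiments; a real plan would inject faults and check health.
stages = [
    Stage("restart one instance", "dev", lambda: True),
    Stage("restart one instance", "staging", lambda: True),
    Stage("restart one instance under load", "production", lambda: True),
]

if __name__ == "__main__":
    print(run_stages(stages))
```

The important design choice is the early exit: a failure in a safe environment stops the plan before the same experiment can reach production.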
OV-101 was the first shuttle to fly and the first to be considered a vehicle rather than a test article (there were astronauts on board, after all, even if the shuttle stayed within Earth’s atmosphere). As a result there was a lot of publicity around it, and Star Trek fans led an enormous letter writing campaign which resulted in the shuttle being named Enterprise.
The lessons NASA learnt from Enterprise’s ALT flights were invaluable and led to many improvements in the other shuttles, which were still under construction. However, by the time NASA had made all the required changes, the production-ready shuttles were so different from Enterprise that it could not practically be converted from an atmospheric test shuttle into a space-worthy vehicle. Instead of flying into space, Enterprise was sent to the Smithsonian Museum in Washington, D.C., to be viewed by, and to inspire, generations of visitors.
The lesson to be learnt here is that while Enterprise was the first shuttle, it was actually a prototype or Proof of Concept (PoC): it was designed to show that the shuttle could land successfully as a glider. Modifying it after the fact to make it space-worthy might have been possible, but it ended up being cheaper to build the next shuttle a different way.
Even after retirement, though, Enterprise continued to serve as a test subject: after the 2003 Columbia disaster, pieces of Enterprise were used as test articles during the post-incident review and research into the problems.
As an SRE, I keep in mind the difference between a Proof of Concept (PoC) and a Minimum Viable Product (MVP). A PoC is not designed to actually do work in production; it’s there to prove a technical case. Proving the business case is the job of an MVP. A good MVP does something well and can be supported in the production environment!
In a way, Enterprise was a victim of its own success. If it had not been considered “the first shuttle”, then perhaps the campaign to name a shuttle Enterprise would have waited for the first shuttle to fly in space (the first production shuttle), and Space Shuttle Enterprise would have flown to the Final Frontier like its television namesake.
I’ll be discussing SRE lessons 35 years after the Challenger disaster next week at the IBM PREVAIL2021 conference, an IBM conference devoted to IT resilience, performance, security, quality testing, and SRE. If you’re interested in any of that, please register and join me.
For future lessons and articles, follow me here as Robert Barron or as @flyingbarron on Twitter and LinkedIn.