IBM, IBM Cloud and the IBM Garage are proud to announce the release of two IBM architecture field guides.

These field guides are handy introductions to applying IBM’s agile service management operations methodology to your organization and streamlining your processes. The first guide is an overall view of the domain of modern reliability operations, and the second concentrates on the benefits of artificial intelligence (AI) to help supercharge your work.

IBM Cloud Service Management and Operations (CSMO) Field Guide


Shuttle to SRE — STS Lesson 2

This week, my friends and colleagues in the United States are celebrating their Independence Day and I am reminded of the time in 2006 when I watched the greatest 4th of July fireworks display of all time — the launch of Shuttle Discovery on the mission STS-121.

Shuttle Discovery (flight STS-121) kicks off with fireworks (NASA/Tony Gray)

While the shuttle launched a few hours after dawn’s early light, the rocket’s red glare was visible for miles, and the thunder of the engines was louder than bombs bursting in the air. A most impressive sight which I consider myself fortunate to have experienced.


How They Fit Together to Resolve Incidents

IBM has three synergistic solutions in the AIOps domain: IBM Observability by Instana APM, Turbonomic Application Resource Management for IBM Cloud Paks and IBM Cloud Pak® for Watson AIOps.

IBM’s Observability and AI Operations Solutions Working Together

While this article will describe some of their capabilities in incident management and incident resolution (i.e., detecting issues as early as possible and resolving them as quickly and as efficiently as possible), it’s important to remember that each of these three solutions has additional strengths and capabilities.

For example, Instana can perform deep dives into application performance during the development cycle, Turbonomic can deliver application-driven cloud optimization and IBM Cloud Pak…


A look at managing an incident without AIOps and then with AIOps.

Incident Management is the practice of restoring a damaged service as quickly as possible. In this post, I’ll describe the Incident Management process twice, first without AIOps and then with AIOps, in order to show the differences and advantages of AIOps.

While many of the concepts are global and generic, the techniques and use cases displayed were developed at IBM as part of the IBM Cloud Pak® for Watson AIOps product.

The process will flow along a timeline from left to right.

Without AIOps: Before the incident occurs


Michael Collins, Apollo 11 astronaut. 1930–2021.

In July of 1969 two astronauts looked up at the sky from Tranquility Base on the Moon. Over 3.5 billion people lived on the Earth and when the astronauts gazed at the bright blue sphere, they saw the whole of humanity suspended in velvet blackness.

All of humanity, except for one man — Michael Collins — pilot of the Apollo 11 spacecraft Columbia which was orbiting the Moon, waiting for Neil Armstrong and Buzz Aldrin to complete the first exploration of the Moon and to carry them back to the safety of the home planet. But half the time Collins…


Shuttle To SRE — STS Lesson 1

April 12th is a landmark date in space exploration: Yuri Gagarin became the first human to explore outer space on this day in 1961, and Space Shuttle Columbia, the first reusable spacecraft, was launched on the same date in 1981.
While the technology which launched Gagarin into space has a legacy which continues to this day, the Space Shuttle was a departure from space systems which came before it, and its DNA does not seem to have been passed on to further systems.

In this series of lessons we will discuss why this was and what modern developers, DevOps practitioners and Site…


Glynn Lunney “Black Flight”, 1936–2021

Glynn Lunney, one of the most senior engineers of the Apollo project, passed away on Friday, 19th of March 2021. Lunney was a leader and an inspiration to reliability engineers for over 50 years.

Glynn Lunney (NASA)

Glynn Lunney, call sign “Black Flight”, was one of the first of NASA’s flight directors, the engineers who orchestrated the space flights. While the astronauts flew, the flight directors had the responsibility for the overall success of the mission, overseeing everything from the moment the rocket lifted off until the astronauts were recovered by the Navy. In modern IT parlance…


Lesson XVIII from the lunar landings

The most famous words spoken on the Moon begin, “That’s one small step…”, but close behind are “Houston, Tranquility Base here. The Eagle has landed.”

For Neil Armstrong, astronaut and engineer, the actual landing of the spacecraft was personally of more importance and historical relevance than his first steps. After all, he was first and foremost a pilot…

Pilots take no special joy in walking: pilots like flying. Pilots generally take pride in a good landing, not in getting out of the vehicle.
— Neil Armstrong

While “one small step” was directed at all…


2020 was certainly a year to remember!

I started writing these articles just before the 50th anniversary of the Apollo 11 Moon Landing, and I’ve found myself continuing to share more and more stories which show how modern-day Site Reliability Engineers, Sysadmins, Operators and DevOps Engineers can learn valuable lessons from the actions and practices of NASA’s astronauts and flight controllers as they took humanity to the Moon.

Disregarding global events and concentrating on my series of Lunar Landing Lessons, in 2020 I managed to:

  • Publish nine lesson articles. This is less than the rate of one a month…


Lesson XVII from the lunar landings

The word monolithic would seem to be a definition for the Saturn V rocket: 110 meters (363 feet) tall, filled to the brim with liquid oxygen, liquid hydrogen, and kerosene.

Especially in DevOps, the word monolithic has negative connotations — unwieldy, cumbersome, and difficult to change. But the Saturn V rocket, and the Apollo program of which it was part, was actually a more flexible solution than one might expect. Indeed, the Apollo project was the opposite; it was frankly Agile to an extent that would make modern-day space entrepreneurs such as Elon Musk and Richard Branson envious!

Operational agility…

Robert Barron

Lessons from the Lunar Landing, Shuttle to SRE | AIOps, ChatOps, DevOps and other Ops | IBMer, opinions are my own
