Chaos Engineering on IBM Cloud

Robert Barron
6 min read · Oct 7, 2020

Using chaos engineering to validate your resilient infrastructure and applications.

This guest article was written by Rajesh Jaluka, David Nguyen & Haytham Elkhoja

We’ve discussed the need for chaos engineering and many of the concepts and principles behind it, but the best way to understand it is to run chaos engineering experiments yourself.

This tutorial shows you how to get started with chaos engineering using Gremlin, a chaos engineering platform, to validate the resiliency and reliability of your application and infrastructure on IBM Cloud Kubernetes Service or Red Hat OpenShift on IBM Cloud. For more info on Gremlin and IBM Cloud, click here.

Visit Gremlin for their offerings, terms, and pricing.

Objective

  • Install Gremlin on your IBM Cloud container platform.
  • Experiment with chaos engineering.

Services Used

  • IBM Cloud Kubernetes Service or Red Hat OpenShift on IBM Cloud
  • Gremlin

Before you begin

This tutorial assumes you have an existing IBM Cloud Kubernetes Service cluster or Red Hat OpenShift on IBM Cloud cluster. If you don’t have an existing environment, use the following tutorials to build one:

  • For clusters on IBM Cloud Kubernetes Service, please visit here.
  • For clusters on Red Hat OpenShift on IBM Cloud, please visit here.
Cat, testing your cluster

Create an account with Gremlin

Step 1 — Register with Gremlin

  1. Go to Gremlin and select your service.
  2. Log in to your account.

Step 2 — Retrieve your Gremlin team ID

  1. Return to the Gremlin dashboard and copy your team ID: click Account > Team Settings > Configuration > Team ID > Copy ID.
  2. Create an environment variable in your desktop’s shell to store your Gremlin team ID:
    export GREMLIN_TEAM_ID="<gremlin_team_ID>"
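Values pasted from a web page can carry typographic quotes or an unreplaced placeholder, and the shell stores those characters as part of the ID. The following is a minimal sketch of a sanity check for the variable exported above; the function name check_gremlin_id is hypothetical and not part of Gremlin’s tooling.

```shell
# check_gremlin_id NAME VALUE
# Hypothetical helper: returns 0 if VALUE looks usable, 1 otherwise.
check_gremlin_id() {
  case $2 in
    "")
      echo "$1 is empty or unset" >&2
      return 1 ;;
    *"<"*|*">"*)
      echo "$1 still contains a <placeholder>" >&2
      return 1 ;;
    *“*|*”*)
      echo "$1 contains typographic quotes; export it again with plain quotes" >&2
      return 1 ;;
  esac
  echo "$1 looks OK"
}

# Usage (run after the export above):
#   check_gremlin_id GREMLIN_TEAM_ID "$GREMLIN_TEAM_ID"
```

The same check can be reused for GREMLIN_CLUSTER_ID once it is exported in the deployment steps.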

Step 3 — Download the Gremlin certificates to your desktop

  1. In the Gremlin dashboard, click Account > Team Settings > Configuration > Certificates > Download.
    Note: You will need the team manager access role.
  2. Locate the certificate files, store them somewhere accessible from your desktop’s shell, and rename them as follows:
    i. <team-name>.priv_key.pem to gremlin.key
    ii. <team-name>.pub_cert.pem to gremlin.cert
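The renaming can be scripted. The sketch below copies the two downloaded files rather than moving them, so the originals stay in place; the function name stage_gremlin_certs and the directories in the usage line are assumptions for illustration only.

```shell
# stage_gremlin_certs SRC_DIR TEAM_NAME DEST_DIR
# Hypothetical helper: copies <team-name>.priv_key.pem and
# <team-name>.pub_cert.pem from SRC_DIR into DEST_DIR under the
# names gremlin.key and gremlin.cert.
stage_gremlin_certs() {
  src=$1 team=$2 dest=$3
  # Fail early if either downloaded file is missing.
  for f in "$src/$team.priv_key.pem" "$src/$team.pub_cert.pem"; do
    [ -f "$f" ] || { echo "missing: $f" >&2; return 1; }
  done
  mkdir -p "$dest" || return 1
  cp "$src/$team.priv_key.pem" "$dest/gremlin.key" || return 1
  cp "$src/$team.pub_cert.pem" "$dest/gremlin.cert" || return 1
  # Keep the private key readable by the current user only.
  chmod 600 "$dest/gremlin.key"
}

# Usage (paths and team name are placeholders):
#   stage_gremlin_certs "$HOME/Downloads" my-team "$HOME/gremlin-certs"
```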

Option A — Deployment steps for IBM Cloud Kubernetes Service

Deploying Gremlin on your IBM Cloud Kubernetes Service cluster

Gremlin must be deployed on each cluster you wish to attack. In order for your clusters and containers to be targetable, the Gremlin agent must be registered with the Gremlin Control Plane.

  1. From your desktop’s shell, connect to your IBM Cloud Kubernetes Service instance.
  2. Create a namespace for Gremlin.
    kubectl create namespace gremlin
  3. Create a Kubernetes secret for Gremlin
kubectl -n gremlin create secret generic gremlin-team-cert --from-file=/<fullpath>/gremlin.cert --from-file=/<fullpath>/gremlin.key

4. From your IBM Cloud Kubernetes Service cluster console, get your cluster ID and export it as an environment variable in your desktop’s shell.

export GREMLIN_CLUSTER_ID="<your IBM Cloud Kubernetes Service cluster ID>"

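5. Deploy Gremlin on IBM Cloud Kubernetes Service using Helm. Run the command from the directory that holds gremlin.cert and gremlin.key, because --set-file reads them by relative path.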

helm install gremlin gremlin-beta/gremlin \
--namespace gremlin \
--set runtime.name=containerd \
--set gremlin.hostPID=true \
--set gremlin.usePodSecurityPolicy=false \
--set gremlin.secret.managed=true \
--set gremlin.secret.teamID="$GREMLIN_TEAM_ID" \
--set gremlin.secret.clusterID="$GREMLIN_CLUSTER_ID" \
--set-file gremlin.secret.certificate=gremlin.cert \
--set-file gremlin.secret.key=gremlin.key \
--set gremlin.apparmor=unconfined

If the installation succeeded, you will be able to view your clients on Gremlin’s dashboard under Gremlin Dashboard > Clients.
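Before checking the dashboard, it can help to confirm from the command line that the Gremlin workloads were scheduled. The following is a minimal sketch that lists whatever DaemonSets, Deployments, and pods exist in the gremlin namespace created earlier; the function name show_gremlin_workloads is hypothetical, and no particular workload names are assumed.

```shell
# show_gremlin_workloads CLI [NAMESPACE]
# Hypothetical helper. CLI is the client to use ("kubectl" or "oc");
# NAMESPACE defaults to "gremlin", the namespace created in step 2.
show_gremlin_workloads() {
  cli=$1
  ns=${2:-gremlin}
  command -v "$cli" >/dev/null 2>&1 || {
    echo "$cli not found on PATH" >&2
    return 1
  }
  # List the workload objects and their pods in one call.
  "$cli" get daemonsets,deployments,pods --namespace "$ns" -o wide
}

# Usage:
#   show_gremlin_workloads kubectl
```

Pods stuck in a state other than Running usually point at a problem with the secret or the chart values rather than with the dashboard.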

Option B — Deployment steps for Red Hat OpenShift on IBM Cloud

Deploying Gremlin on your Red Hat OpenShift on IBM Cloud cluster

NOTE: This guide assumes that you will be installing Gremlin into its own namespace. You can start a new project and namespace with the oc new-project command in step 2. All subsequent oc create commands in this guide leave out the --namespace argument, assuming that you wish to install Gremlin in the current OpenShift project.

  1. From your desktop’s shell, connect to your Red Hat OpenShift on IBM Cloud instance.
  2. Create a namespace for Gremlin.
    oc new-project gremlin
  3. From your Red Hat OpenShift on IBM Cloud cluster console, get your cluster ID and export it as an environment variable in your desktop’s shell.
    export GREMLIN_CLUSTER_ID="<your Red Hat OpenShift on IBM Cloud cluster ID>"
  4. Create a Kubernetes secret containing the key pair, your team ID, and your cluster ID
oc create secret generic gremlin-secret \
--from-literal=GREMLIN_TEAM_ID=$GREMLIN_TEAM_ID \
--from-literal=GREMLIN_CLUSTER_ID=$GREMLIN_CLUSTER_ID \
--from-file=gremlin.cert=$PATH_TO_CERTIFICATE \
--from-file=gremlin.key=$PATH_TO_PRIVATE_KEY

5. Set up Gremlin to use the privileged SCC

oc create serviceaccount gremlin -n gremlin
oc adm policy add-scc-to-user privileged -z gremlin -n gremlin

6. Using your favorite editor, create a YAML file gremlin-scc.yaml with the following code:

7. Install the SecurityContextConstraint

oc create -f gremlin-scc.yaml
oc adm policy add-scc-to-user gremlin -z gremlin

8. Using your editor, create a YAML file gremlin-daemonset.yaml with the following code

9. Install the Daemonset

oc create -f gremlin-daemonset.yaml

10. Using your editor, create a YAML file chao-service-account.yaml with the following code

11. Install Service Account, ClusterRole, and ClusterRoleBinding

oc create -f chao-service-account.yaml

12. Using your editor, create a YAML file chao-deployment.yaml with the following code

13. Create the deployment

oc create -f chao-deployment.yaml

If the installation succeeded, you will be able to view your clients on Gremlin’s dashboard under Gremlin Dashboard > Clients.
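Step 4 above reads the certificate and key from $PATH_TO_CERTIFICATE and $PATH_TO_PRIVATE_KEY, which none of the earlier steps set. If that command failed because the files could not be found, a sketch like the one below can set both variables before you retry. It assumes the renamed gremlin.cert and gremlin.key sit together in one directory; the function name set_gremlin_cert_paths is hypothetical.

```shell
# set_gremlin_cert_paths DIR
# Hypothetical helper: exports the two variables used by the
# "oc create secret" step, pointing at DIR/gremlin.cert and
# DIR/gremlin.key. DIR is a placeholder for your own directory.
set_gremlin_cert_paths() {
  # Refuse to export paths to files that cannot be read.
  for f in gremlin.cert gremlin.key; do
    [ -r "$1/$f" ] || { echo "cannot read $1/$f" >&2; return 1; }
  done
  export PATH_TO_CERTIFICATE="$1/gremlin.cert"
  export PATH_TO_PRIVATE_KEY="$1/gremlin.key"
}

# Usage (directory is a placeholder):
#   set_gremlin_cert_paths "$HOME/gremlin-certs"
```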

Setting up a Gremlin attack

  1. Log in to the Gremlin dashboard
  2. In the left menu, click on Attacks to define an attack

3. Select target, Containers > Choose Containers to target

What type of target do you want to attack?

4. Choose the containers to target. Here we’re using the namespace “acme”.

Choose the containers to target

5. Select the percentage of targets to attack: Containers > Choose Containers to target > Percentage of targets to impact

Choose percent of targets to impact

6. Choose a Gremlin: Category > Resource, Attacks > CPU

Choose a CPU type Gremlin

7. Set the length and CPU capacity you wish to test, then start the attack by clicking Unleash Gremlin.

Set the impact of the attack

Congratulations! You’ve performed your first chaos experiment using Gremlin on IBM Cloud.

As your chaos engineering experiment is taking place, keep an eye on your system using your existing logs and monitoring tools.

You can also use kubectl get pods --namespace=acme on IBM Cloud Kubernetes Service, or oc get pods --namespace=acme on Red Hat OpenShift on IBM Cloud, to observe the status of the containers being attacked.
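To see how pod status changes over the length of the attack rather than at a single moment, the command can be repeated at a fixed interval. The following is a minimal sketch; the function name watch_namespace and the default of ten checks five seconds apart are assumptions, so pick values that cover the attack length you configured.

```shell
# watch_namespace CLI NAMESPACE [ROUNDS] [DELAY_SECONDS]
# Hypothetical helper: prints a timestamped pod listing ROUNDS times,
# sleeping DELAY_SECONDS between listings. CLI is "kubectl" or "oc".
watch_namespace() {
  cli=$1 ns=$2 rounds=${3:-10} delay=${4:-5}
  i=1
  while [ "$i" -le "$rounds" ]; do
    echo "--- $(date '+%H:%M:%S') check $i of $rounds"
    "$cli" get pods --namespace="$ns" || return 1
    # Sleep between checks, but not after the last one.
    if [ "$i" -lt "$rounds" ]; then sleep "$delay"; fi
    i=$((i + 1))
  done
}

# Usage (namespace from the attack above):
#   watch_namespace kubectl acme 12 5
```

The timestamps make it easier to line the listings up with the attack window shown in the Gremlin dashboard. For a continuous stream without timestamps, kubectl get pods --namespace=acme --watch is a built-in alternative.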

You can continue to use Gremlin to validate the underlying platform of your services and you can also easily use it to extend your testing and validation to the applications themselves when they are developed and deployed using IBM’s Cloud Pak solutions.

For further information about IBM’s chaos engineering principles and methodology, see the following links on our Architecture site: Chaos engineering principles & Use chaos engineering to assess application reliability.

If you’d prefer to listen, please attend (or listen to the recording of) IBM’s Always-On architect Haytham Elkhoja’s session at ChaosConf 2020

You can read more about chaos engineering in a previous blog post, and follow me on Medium/Robert Barron, Twitter or Linkedin for more articles.

Bring your plan to the IBM Garage.
IBM Garage is built for moving faster, working smarter, and innovating in a way that lets you disrupt disruption.

Learn more at www.ibm.com/garage
