Chaos Engineering on IBM Cloud
--
Using chaos engineering to validate your resilient infrastructure and applications.
This guest article was written by Rajesh Jaluka, David Nguyen & Haytham Elkhoja
While we’ve discussed the need for chaos engineering and many of the concepts and principles behind it, the best way to understand it is to run chaos engineering experiments yourself.
This tutorial shows you how to get started with chaos engineering using Gremlin, a chaos engineering platform, to validate the resiliency and reliability of your applications and infrastructure on IBM Cloud Kubernetes Service or Red Hat OpenShift on IBM Cloud. For more info on Gremlin and IBM Cloud, click here.
Visit Gremlin for their offerings, terms, and pricing.
Objective
- Install Gremlin on your IBM Cloud container platforms.
- Experiment with chaos engineering
Services Used
- IBM Cloud Kubernetes Service or Red Hat OpenShift on IBM Cloud
- Gremlin
Before you begin
This tutorial assumes you have an existing IBM Cloud Kubernetes Service cluster or Red Hat OpenShift on IBM Cloud cluster. If you don’t have an existing environment, use the following tutorials to build one:
- For clusters on IBM Cloud Kubernetes Service, please visit here.
- For clusters on Red Hat OpenShift on IBM Cloud, please visit here.
Create an account with Gremlin
Step 1 — Register with Gremlin
- Go to Gremlin and select your service.
- Log in to your account.
Step 2 — Copy your Gremlin team ID
- Return to the Gremlin dashboard and copy your team ID: click Account > Team Settings > Configuration > Team ID > Copy ID.
- Create an environment variable in your desktop’s shell to store your Gremlin team ID.
export GREMLIN_TEAM_ID="<gremlin_team_ID>"
Step 3 — Download the Gremlin certificates to your desktop
- In the Gremlin dashboard, click Account > Team Settings > Configuration > Certificates > Download.
Note: You will need the team manager access role.
- Locate the certificate files, then store and rename them as follows, somewhere accessible by your desktop’s shell:
i. <team-name>.priv_key.pem to gremlin.key
ii. <team-name>.pub_cert.pem to gremlin.cert
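The rename step above can be scripted. The following is a minimal sketch, assuming the downloaded files follow the `<team-name>.priv_key.pem` and `<team-name>.pub_cert.pem` naming shown above; `install_gremlin_certs` is a hypothetical helper name, not part of Gremlin's tooling.

```shell
# Sketch: copy the downloaded certificate files to the names used in this tutorial.
# install_gremlin_certs is a hypothetical helper, not a Gremlin command.
install_gremlin_certs() {
  src_dir=$1   # directory holding the downloaded .pem files
  team=$2      # the <team-name> prefix of the downloaded files
  dest_dir=$3  # directory where gremlin.key and gremlin.cert are written

  # Stop early if either downloaded file is missing.
  for f in "$src_dir/$team.priv_key.pem" "$src_dir/$team.pub_cert.pem"; do
    if [ ! -f "$f" ]; then
      echo "missing file: $f" >&2
      return 1
    fi
  done

  mkdir -p "$dest_dir" || return 1
  cp "$src_dir/$team.priv_key.pem" "$dest_dir/gremlin.key" || return 1
  cp "$src_dir/$team.pub_cert.pem" "$dest_dir/gremlin.cert" || return 1

  # Keep the private key readable only by the current user.
  chmod 600 "$dest_dir/gremlin.key"
}
```

For example, `install_gremlin_certs ~/Downloads my-team ~/gremlin` would leave `gremlin.key` and `gremlin.cert` in `~/gremlin`, where `my-team` and both directories are placeholders for your own values.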
Option A — Deployment steps for IBM Cloud Kubernetes Service
Deploying Gremlin on IBM Cloud Kubernetes Service cluster
Gremlin must be deployed on each cluster you wish to attack. In order for your clusters and containers to be targetable, the Gremlin agent must be registered with the Gremlin Control Plane.
1. From your desktop’s shell, connect to your IBM Cloud Kubernetes Service instance.
2. Create a namespace for Gremlin.
kubectl create namespace gremlin
3. Create a Kubernetes secret for Gremlin.
kubectl -n gremlin create secret generic gremlin-team-cert --from-file=/<fullpath>/gremlin.cert --from-file=/<fullpath>/gremlin.key
4. From your IBM Cloud Kubernetes Service cluster console, get your cluster ID and export it as an environment variable in your desktop’s shell.
export GREMLIN_CLUSTER_ID="<your IBM Cloud Kubernetes Service cluster ID>"
5. Deploy Gremlin on IBM Cloud Kubernetes Service using Helm.
helm install gremlin gremlin-beta/gremlin \
--namespace gremlin \
--set runtime.name=containerd \
--set gremlin.hostPID=true \
--set gremlin.usePodSecurityPolicy=false \
--set gremlin.secret.managed=true \
--set gremlin.secret.teamID=$GREMLIN_TEAM_ID \
--set gremlin.secret.clusterID=$GREMLIN_CLUSTER_ID \
--set-file gremlin.secret.certificate=gremlin.cert \
--set-file gremlin.secret.key=gremlin.key \
--set gremlin.apparmor=unconfined
If successfully installed, you will be able to view your clients on Gremlin’s dashboard by accessing Gremlin Dashboard > Clients
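If the agent does not appear under Clients, one thing worth ruling out is a malformed team or cluster ID. When export commands are copied from a web page or document, straight quotes are sometimes converted to curly quotes, which the shell keeps as literal characters in the value. The sketch below is one way to check for that; `check_gremlin_id` is a hypothetical helper, and the checks are assumptions about what a bad value looks like rather than Gremlin's own validation rules.

```shell
# Sketch: sanity-check an ID before passing it to helm or oc.
# check_gremlin_id is a hypothetical helper, not a Gremlin command.
check_gremlin_id() {
  name=$1    # variable name, used only in messages
  value=$2   # the value to check

  if [ -z "$value" ]; then
    echo "$name is empty" >&2
    return 1
  fi

  case $value in
    *'<'*|*'>'*)
      # The <placeholder> from the tutorial was never replaced.
      echo "$name still contains a <placeholder>" >&2
      return 1 ;;
    *'"'*|*'“'*|*'”'*)
      # Straight or curly quotes ended up inside the value itself.
      echo "$name contains a quote character" >&2
      return 1 ;;
  esac
  return 0
}
```

A typical call would be `check_gremlin_id GREMLIN_TEAM_ID "$GREMLIN_TEAM_ID" && check_gremlin_id GREMLIN_CLUSTER_ID "$GREMLIN_CLUSTER_ID"` before running the Helm install.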
Option B — Deployment steps for Red Hat OpenShift on IBM Cloud
Deploying Gremlin on your Red Hat OpenShift on IBM Cloud cluster
NOTE: This guide assumes that you will be installing Gremlin into its own namespace. You can start a new project and namespace with the oc new-project command below. All subsequent oc create commands in this guide leave out the --namespace argument, assuming that you wish to install Gremlin in the current OpenShift project.
1. From your desktop’s shell, connect to your Red Hat OpenShift on IBM Cloud instance.
2. Create a namespace for Gremlin.
oc new-project gremlin
3. From your Red Hat OpenShift on IBM Cloud cluster console, get your cluster ID and export it as an environment variable in your desktop’s shell.
export GREMLIN_CLUSTER_ID="<your Red Hat OpenShift on IBM Cloud cluster ID>"
4. Create a Kubernetes secret containing the key pair, your team ID, and your cluster ID.
oc create secret generic gremlin-secret \
--from-literal=GREMLIN_TEAM_ID=$GREMLIN_TEAM_ID \
--from-literal=GREMLIN_CLUSTER_ID=$GREMLIN_CLUSTER_ID \
--from-file=gremlin.cert=$PATH_TO_CERTIFICATE \
--from-file=gremlin.key=$PATH_TO_PRIVATE_KEY
5. Set up Gremlin to use the privileged SCC
oc create serviceaccount gremlin -n gremlin
oc adm policy add-scc-to-user privileged -z gremlin -n gremlin
6. Using your favorite editor, create a YAML file gremlin-scc.yaml with the following code:
7. Install the SecurityContextConstraint
oc create -f gremlin-scc.yaml
oc adm policy add-scc-to-user gremlin -z gremlin
8. Using your editor, create a YAML file gremlin-daemonset.yaml with the following code
9. Install the Daemonset
oc create -f gremlin-daemonset.yaml
10. Using your editor, create a YAML file chao-service-account.yaml with the following code
11. Install Service Account, ClusterRole, and ClusterRoleBinding
oc create -f chao-service-account.yaml
12. Using your editor, create a YAML file chao-deployment.yaml with the following code
13. Create the deployment
oc create -f chao-deployment.yaml
If successfully installed, you will be able to view your clients on Gremlin’s dashboard by accessing Gremlin Dashboard > Clients
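The secret-creation step reads the certificate and key from the files named by $PATH_TO_CERTIFICATE and $PATH_TO_PRIVATE_KEY, so a wrong path there can leave the agent unable to register. A quick check before creating the secret might look like the sketch below. It assumes the files are PEM-encoded with the header on the first line, and `looks_like_pem` is a hypothetical helper; it does not validate the certificate or key contents.

```shell
# Sketch: confirm a file is readable and starts with a PEM header.
# looks_like_pem is a hypothetical helper; it does not verify the key or certificate.
looks_like_pem() {
  file=$1

  if [ ! -r "$file" ]; then
    echo "cannot read: $file" >&2
    return 1
  fi

  # Read only the first line; an empty file leaves first_line empty.
  first_line=
  IFS= read -r first_line < "$file" || true

  case $first_line in
    '-----BEGIN '*) return 0 ;;
    *)
      echo "no PEM header on first line: $file" >&2
      return 1 ;;
  esac
}
```

For example: `looks_like_pem "$PATH_TO_CERTIFICATE" && looks_like_pem "$PATH_TO_PRIVATE_KEY"`.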
Setting up a Gremlin attack
1. Log in to the Gremlin dashboard.
2. In the left menu, click Attacks to define an attack.
3. Select the target: Containers > Choose Containers to target.
4. Choose the containers to target. Here we’re using the namespace “acme”.
5. Select the percentage to attack: Containers > Choose Containers to target > Percentage of targets to impact.
6. Choose a Gremlin: Category > Resource, Attacks > CPU.
7. Set the length and CPU capacity you wish to test, then start the attack by clicking Unleash Gremlin.
Congratulations, you’ve performed your first chaos experiment using Gremlin on IBM’s cloud.
As your chaos engineering experiment is taking place, keep an eye on your system using your existing logs and monitoring tools.
You can also use kubectl get pods --namespace=acme for IBM Cloud Kubernetes Service, or oc get pods --namespace=acme for Red Hat OpenShift on IBM Cloud, to observe the status of the containers being attacked.
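When many pods are targeted, a per-status count can be easier to read than the full pod list. The sketch below assumes the default column layout of `kubectl get pods` (NAME, READY, STATUS, ...) and input produced with the `--no-headers` flag; `summarize_pod_status` is a hypothetical helper name.

```shell
# Sketch: count pods per status from `kubectl get pods --no-headers` output on stdin.
# summarize_pod_status is a hypothetical helper; it assumes STATUS is the third column.
summarize_pod_status() {
  # Tally the third column, then print one "STATUS COUNT" line per status, sorted by name.
  awk 'NF >= 3 { count[$3]++ } END { for (s in count) print s, count[s] }' | sort
}
```

It can be used as `kubectl get pods --namespace=acme --no-headers | summarize_pod_status`, or with `oc get pods` in the same way.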
You can continue to use Gremlin to validate the underlying platform of your services and you can also easily use it to extend your testing and validation to the applications themselves when they are developed and deployed using IBM’s Cloud Pak solutions.
For further information about IBM’s chaos engineering principles and methodology, see the following articles on our Architecture site: Chaos engineering principles & Use chaos engineering to assess application reliability.
If you’d prefer to listen, please attend (or listen to the recording of) IBM’s Always-On architect Haytham Elkhoja’s session at ChaosConf 2020
You can read more about chaos engineering in a previous blog post, and follow me on Medium/Robert Barron, Twitter or LinkedIn for more articles.