GitOps

There’s an exciting practice growing with the world of Kubernetes and containerization. It’s GitOps. If you don’t know much about it, read on!

Background

While microservices are cool and automation is also cool, it does introduce new challenges. For example, when we deployed a couple of services as part of a microservices style architecture, the problem became evident. Recreating a cluster was becoming easier than deploying all the services afterwards. Why? Because we needed to clearly understand all the various services and their dependencies that made up the system. Not only that, but we also needed to understand the order we could deploy them. The services were using a typical CI/CD process with proper use of liveness/readiness checks, so if we deployed services without their dependencies their health probes would fail. Also, each service had their own repository and their own helm chart that helped with templating so it took time figuring out if we missed a dependency or a missing config value. In the end, we had to create a pipeline that re-provisioned all the application services and their dependencies so that in the event of a new cluster, we could have the application running quickly.

The GitOps way

GitOps is a simple idea. What if the entire system could be described declaratively? One place so that you or anyone could refer to as the source of truth of the cluster? What if any change to the source of truth was automatically applied?

The way it works is fairly simple. As opposed to having a CI/CD pipeline that pushes new changes, instead have an agent that runs as part of the cluster and watches for changes against a Git repository. This means that the cluster is essentially pulling new changes based on what’s already committed. Therefore, if you want to deploy it, you ought to commit it! The result is a Git repository that mirrors what’s deployed on a cluster across one or more applications or services.

There’s other great benefits too. When you have a dedicated Git repo that mirrors what’s deployed on your cluster, every change that makes it to master will probably be code reviewed through a pull-request. It’s natural and part of the process.

Another benefit that personally excites me, is that restoring service by creating a new cluster and redeploying the stack is much simpler. Once a cluster is re-created, as soon as the agent gets deployed, it would deploy all the manifests to mirror the base repository. This could mean that in the event of a catastrophe, if we needed to recreate the cluster, it could mean re-provisioning the entire infrastructure and the applications in the matter of minutes and not hours.