A couple of years back, I endeavored to create a proof of concept for predictive auto-scaling of K8s clusters. At the time (k8s 1.10), there really wasn’t an easy, automated way to scale a cluster out and back in. Fast forward to today, and I can now leverage the work of the Cluster API project to solve that problem.
Cluster API (CAPI) lets one K8s cluster manage the life cycle of other K8s clusters. Bit of a chicken and egg at first glance, but if you use a low-friction bootstrapper (e.g. kubeadm) for the first cluster, you can then deploy and manage the rest of your clusters from that first instance. Better yet, you could leverage a cluster that was pre-built for you by a K8s-as-a-service provider.
K8s was born of the need for a better way to manage containers at scale, but it was implemented, and extended, with incredible foresight. A K8s cluster has an api server to receive and respond to basic verb requests (e.g. get, create, delete), a key-value database to store configuration and the intended state of objects, a schema that defines object properties, controllers that do the work of CRUD operations, RBAC controls, and nodes that provide the compute/storage/networking for workloads to consume.
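To make that concrete, here is a minimal Deployment manifest (the name and image are just illustrative). Posting it to the api server records the intended state, and the built-in controllers do the work of making three pods a reality and keeping them that way:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-web            # illustrative name
spec:
  replicas: 3                # intended state, stored in the key-value database (etcd)
  selector:
    matchLabels:
      app: hello-web
  template:
    metadata:
      labels:
        app: hello-web
    spec:
      containers:
        - name: web
          image: nginx:1.17  # illustrative image
```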
While all of the above was originally aimed at the delivery and life cycle management of containers, the model makes it possible to extend K8s capabilities with custom resource definitions (which extend the schema of objects the api server recognizes) and custom controllers.
We could define a custom resource definition that represents a smart home light switch and a custom controller that knows how to toggle the switch. By posting a call to the api server requesting state=on, we can have the light turned on via a typical K8s workflow. If someone turned it off at the physical switch, the controller would see the undesired state and turn it back on. In a nutshell, this is how K8s works.
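As a rough sketch (the group, kind, and fields here are entirely made up for illustration), the definition plus a desired-state object might look like this:

```yaml
# Hypothetical CRD teaching the api server about a new object type.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: lightswitches.home.example.com
spec:
  group: home.example.com
  scope: Namespaced
  names:
    kind: LightSwitch
    plural: lightswitches
    singular: lightswitch
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                state:
                  type: string
                  enum: ["on", "off"]
---
# Desired state: the light should be on. The custom controller reconciles the
# real switch to match, and flips it back if someone changes it by hand.
apiVersion: home.example.com/v1alpha1
kind: LightSwitch
metadata:
  name: living-room
spec:
  state: "on"
```

The actual toggling happens in the controller code; K8s itself only stores and serves the desired state.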
So, if we create a set of custom resource definitions that represent nodes, api servers, and the other pieces that make up a K8s cluster, plus operators that can CRUD those entities, we have the capability to deploy and manage a K8s cluster itself. Hello, Cluster API.
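A heavily trimmed sketch of what that looks like (the exact API version and fields vary by CAPI release and infrastructure provider, and the provider-specific bootstrap and infrastructure references are omitted here):

```yaml
# A workload cluster, described as objects in the management cluster.
apiVersion: cluster.x-k8s.io/v1alpha2
kind: Cluster
metadata:
  name: workload-cluster-1
spec:
  clusterNetwork:
    pods:
      cidrBlocks: ["192.168.0.0/16"]
---
# A set of worker machines for that cluster, managed like a Deployment manages pods.
apiVersion: cluster.x-k8s.io/v1alpha2
kind: MachineDeployment
metadata:
  name: workload-cluster-1-workers
  labels:
    cluster.x-k8s.io/cluster-name: workload-cluster-1
spec:
  replicas: 3                # grow or shrink the workload cluster here
  selector:
    matchLabels:
      cluster.x-k8s.io/cluster-name: workload-cluster-1
  template:
    metadata:
      labels:
        cluster.x-k8s.io/cluster-name: workload-cluster-1
    spec:
      version: v1.16.2
      # bootstrap.configRef and infrastructureRef (provider-specific) omitted
```

Scaling the workload cluster out or back in then boils down to patching spec.replicas on the MachineDeployment, which is exactly the kind of knob a predictive auto-scaler can turn.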
With this, I can now address the programmatic scale-out and scale-in of a K8s cluster for the predictive auto-scaler project. I doubt I will have time for it within the next few posts, but I hope to have a demonstration and write-up sometime in the coming months. You can read more about the Cluster API project here. For more examples of Cluster API in use, have a look at VMware Project Pacific and Tanzu Mission Control, both launching soon.