Predictive Auto-scaling of vSphere VMs and Container Services – The Plumbing (1 of 2)

To be pragmatic with this exercise, I’ll begin with a focus on the plumbing required to scale both components in parallel. I’ll follow up in another post with predictive automation based on some primitive analytics.

As discussed in the first post of this series, I intend to rely on APIs exposed by VMware and the Kubernetes/Docker distributions to enable predictive orchestration of scaling. For my proof of concept, I will focus on the scale-out and scale-in use cases; scale-up and scale-down will be excluded.

Because I am primarily interested in the act of predictive auto-scaling, I won’t delve into the heavier tasks of auto-scaling the entire IaaS layer or stateful container services. Addressing those would require going far deeper into the schedulers, resource associations, network, and storage components. There are other existing works that can be layered on top of, or in place of, this plumbing for the predictive functionality I am working toward here.

We can approach auto-scaling of the IaaS from a few angles. One would be to create VMs on demand to instantiate additional nodes for the container service cluster. Another would be to have node VMs already provisioned, but powered off. And then there are any number of other approaches in between. 

I believe having IaaS resources fully pre-provisioned with container node VMs staged is the best approach. We can automate the creation of a container node VM when new physical hosts are added, but provisioning a new node VM at the moment the container service needs it would only add latency to the process. An existing node VM in a powered-off state will not substantially impact IaaS cluster performance when not in use. To further reduce latency, we could leave the node VMs powered on and adjust vSphere shares, but then we would need to implement custom schedulers for our K8s service. Power-on/off will be the best option for testing.

So with that in mind, I need a fairly simple pair of functions for the slightly more complex predictive scaling logic to call upon: one that can power VMs on and off, and another that can increase/decrease pods across K8s nodes. I will leverage the RESTful APIs from vSphere 6.5 and K8s 1.10 to create the following functions:

  1. Power on/off node VM – vSpVmPwr()
  2. Scale deployment out/in – K8sDplScale()

In a complete implementation, I would obviously need many more functions. They would enable the solution to leverage VM tags to identify VMs as K8s cluster nodes, find them in the inventory, determine state, evaluate pod status, etc. For this proof of concept, I will take some shortcuts, hard-coding these variables as much as possible and making the REST calls for both functions from within a python script.

K8s has built-in functionality to horizontally scale (i.e. scale out) deployments based on CPU utilization or custom metrics, and there are extensions such as Kapacitor that aim to ease the path to custom metrics. I could rely on one of these to trigger scaling of node VMs as needed, but my goal is to be predictive, so I won’t utilize the K8s Horizontal Pod Autoscaler (HPA) as a trigger here. However, I will be relying on the historical record of K8s scaling (whether manual or automatic) and will thus need a connection to cAdvisor and/or Heapster at some point. More on this in the coming posts.
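
For reference, attaching the built-in HPA to a deployment is a one-liner along these lines (the deployment name matches the one used later in this post, and the thresholds are just placeholders):

kubectl autoscale deployment web-test2 --cpu-percent=70 --min=2 --max=6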

The code will be implemented in python, as follows:

Scaling K8s Deployments:

Kubectl commands are actually executed as REST API calls under the hood. A handy flag I found for introspecting those calls is --v=8. When you include the --v=8 flag after kubectl, the API calls being made for that command are printed to stdout.

This helps a great deal when working out the REST GET and PUT strings we need to provide to accomplish a task. It also helps with working out the json file format we need to provide with our PUT calls.

For the scale out/in REST call, I manually created a Kubernetes deployment named ‘web-test2’ with a two-pod replica set. I then used ‘kubectl scale’ with the --v=8 flag to scale it to four replicas.
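
The scale command looked something like the following; the --v=8 flag is what surfaces the underlying API calls:

kubectl scale deployment web-test2 --replicas=4 --v=8

That returned a number of API calls, the last of which is the pertinent one: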

PUT https://192.168.1.201:6443/apis/extensions/v1beta1/namespaces/default/deployments/web-test2/scale

This tells me the REST API call required to scale a deployment. Next, I need the json data that needs to be PUT with it. For that, I curl a GET request to the API and receive the data in json as it exists now:

curl -H 'Accept: application/json' http://localhost:8080/apis/extensions/v1beta1/namespaces/default/deployments/web-test2/scale > scale-deployment.json

On the side: If you haven’t reached your TL;DR limit yet, you’ll notice that the kubectl --v=8 output referenced https://192.168.1.201:6443, while my curl command was directed at http://localhost:8080. This is due to the API server’s authentication requirements and the RBAC that is now implemented in K8s. Password auth can be configured but is disabled by default when kubeadm is used to install a cluster. Kubectl utilizes the config file we copied to our ~/.kube directory in the previous post for key-pair authentication credentials. We can do the same with our non-kube commands (e.g. curl), but we need to provide additional information. To bypass that, we can initiate a kube proxy on localhost and use that address without jumping through the extra hoops (the proxy leverages the key-pair in the K8s config file). In production, you will obviously spend a bit more time setting up key credentials for each service.
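
Starting the proxy looks something like this (kubectl proxy defaults to port 8001, so the port is specified explicitly here to match the curl examples):

kubectl proxy --port=8080 &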

The above curl command created the scale-deployment.json file, the contents of which are shown below.
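
The original post shows the file as an image; for a deployment like this one, the returned Scale object looks roughly as follows (the uid, resourceVersion, timestamp, and selector values here are illustrative placeholders):

{
  "kind": "Scale",
  "apiVersion": "extensions/v1beta1",
  "metadata": {
    "name": "web-test2",
    "namespace": "default",
    "selfLink": "/apis/extensions/v1beta1/namespaces/default/deployments/web-test2/scale",
    "uid": "00000000-0000-0000-0000-000000000000",
    "resourceVersion": "12345",
    "creationTimestamp": "2018-06-01T00:00:00Z"
  },
  "spec": {
    "replicas": 2
  },
  "status": {
    "replicas": 2,
    "selector": {
      "app": "web-test2"
    },
    "targetSelector": "app=web-test2"
  }
}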

With some educated guessing, followed by testing, we can deduce that changing the spec: replicas value from 2 to 4 and removing the resourceVersion and creationTimestamp lines results in the json required for a scale PUT operation. The functional json PUT structure looks like this:
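
Again, the original shows this as an image; applying those edits to the file above (replicas bumped to 4, resourceVersion and creationTimestamp removed) leaves something like this — in practice only the spec: replicas value matters on the PUT, as the status fields are ignored by the API server:

{
  "kind": "Scale",
  "apiVersion": "extensions/v1beta1",
  "metadata": {
    "name": "web-test2",
    "namespace": "default",
    "selfLink": "/apis/extensions/v1beta1/namespaces/default/deployments/web-test2/scale",
    "uid": "00000000-0000-0000-0000-000000000000"
  },
  "spec": {
    "replicas": 4
  },
  "status": {
    "replicas": 2,
    "selector": {
      "app": "web-test2"
    },
    "targetSelector": "app=web-test2"
  }
}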

The PUT call with the json-formatted data can be tested with the curl command below. The result will scale the deployment to four pods.

curl -X PUT -d@scale-deployment.json -H 'Content-Type: application/json' http://localhost:8080/apis/extensions/v1beta1/namespaces/default/deployments/web-test2/scale

From the testing above, I can now create the python script with spec: replicas set by a passed variable, as sketched below. This will be the template for my K8sDplScale() deployment scaling function in the finished solution.
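
The original post shows the script as an image. A minimal sketch of what it does, assuming the kubectl proxy on localhost:8080 and the hard-coded deployment/namespace names used above (and using the requests library, which may differ from the original), would look something like this:

import json
import requests

# Assumptions: 'kubectl proxy --port=8080' is running locally, and the
# namespace/deployment names are hard-coded as described above.
API_BASE = "http://localhost:8080/apis/extensions/v1beta1"
NAMESPACE = "default"
DEPLOYMENT = "web-test2"

def K8sDplScale(replicas):
    """PUT a Scale object to set the deployment's replica count."""
    url = "{}/namespaces/{}/deployments/{}/scale".format(API_BASE, NAMESPACE, DEPLOYMENT)
    body = {
        "kind": "Scale",
        "apiVersion": "extensions/v1beta1",
        "metadata": {"name": DEPLOYMENT, "namespace": NAMESPACE},
        "spec": {"replicas": replicas},  # spec: replicas set by the passed variable
    }
    resp = requests.put(url, data=json.dumps(body),
                        headers={"Content-Type": "application/json"})
    resp.raise_for_status()
    return resp.json()

if __name__ == "__main__":
    # Example: scale the deployment out to four pods.
    print(K8sDplScale(4))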

That’s it for the K8s scale function. All that remains is the vSphere REST API call to power control a VM, and the functions we need to plumb our predictive auto-scaling will be complete. I’ll cover that in my next post and bring both together with a demonstration of the scaling. From there, I’ll wrap up with the predictive analytics layer (which will also likely turn into multiple posts).