Automating Kubernetes Operations with PKS

In building toward a k8s predictive auto-scale capability, I’ve built some simple constructs with virtual machines running on vSphere and Kubernetes installed via kubeadm. If you’ve followed the series, you know I reached the anticipated point where manually installed k8s became too inefficient to operate, and I decided to implement VMware PKS to address that pain point.

Over the past year, many blogs have posted instructions on how to install and configure PKS. I suggest following the VMware and/or Pivotal online documentation instead: PKS is developed and released rapidly, and it doesn’t take long for an installation post to become outdated. So I won’t post installation instructions here.

I will take a moment to describe the architecture and components as they are consistent across releases and necessary to understand before any installation.

The first four components you need to understand are Opsman, BOSH Director, PKS API, and stemcells.

Opsman (Pivotal Operations Manager, not to be confused with vRealize Operations Manager) is the central management interface for install/configure/update operations of the other PKS components. We begin by installing an Opsman VM from an OVF file. We then use a web browser to access the administration interface.

Opsman uses the term ‘Tile’ to refer to product packages. You import tiles that allow you to configure the product and then deploy it.

BOSH Director is deployed from Opsman as a tile. BOSH is responsible for connecting to the underlying IaaS, deploying VMs to it, and monitoring the health of those VMs and the k8s nodes running on them. It does more than this, but this level of understanding is enough for now.

PKS API is another tile we deploy from Opsman. It is tied to the BOSH tile through configuration settings and defines the PKS API VM that will be deployed. The PKS API VM is the entry point for creating Kubernetes clusters from the command line. As with BOSH, the PKS API VM is responsible for much more, but this is enough for now.

Finally, there are stemcells. Stemcells are simply VM templates that are IaaS-specific: we use one stemcell for deployments to vSphere and another for deployments to AWS. A stemcell defines both the operating system and the k8s release on the VM. When we update stemcells, we are updating one or both of those components in our deployment.

So, we install Opsman, import the BOSH and PKS tiles, configure them for our underlying IaaS and resources, apply the correct stemcell(s), and then we’re ready to deploy and manage Kubernetes.

I haven’t mentioned networking. PKS can use NSX-T or Flannel for the k8s clusters and NSX-T or vDS networking for the vSphere IaaS. PKS and k8s benefit significantly from NSX-T integration, but that integration is beyond the scope of this post.

In this post, I’ve provided a recording of the process to upgrade a PKS deployment. It’s a great example to illustrate the operational benefits PKS brings to k8s. I believe the days of ‘We’re going to run k8s on bare metal’ are over, though I am still surprised to hear it every now and again.

Considering that we moved to virtualization to make managing workloads and resources more efficient, running k8s on bare metal would be a step backwards in time. K8s nodes are operating systems; if you install them on bare metal, you lose all of the benefits you have today with virtualization.

So, in the video, we see that our k8s clusters are updated, which includes both the operating system and the k8s release. The process is fully automated, and there is no loss of k8s cluster service. With just the small three-node cluster I manually installed with kubeadm on VMs, I would have to create a second environment, load another cluster, test the updated operating system with the updated k8s release, verify the settings were all correct, and then apply the changes to each VM. Imagine that process at production scale with many k8s clusters. No thanks.

I’ll stop here for now. The next step in this series will be to get back to work on a predictive function and then investigate the API interfaces available to scale a PKS deployed k8s cluster. With PKS, I’m set with all of the underlying k8s components.

Predictive Auto-scaling of vSphere VMs and Docker Container Services – Shifting Gears

In the previous posts, I detailed the two main functions for performing the auto-scaling procedure: one to scale the VM-backed K8s cluster across additional physical hosts, and one to scale the K8s pod deployment across the added hosts/nodes. The predictive trigger of these functions was to be the focus of this post.

As time passed and work took me away from this project, I am now without my former lab setup to test it. I could rebuild the lab and write the final bit of code that predicts a pattern in past CPU demand and calls the functions, but for my sanity’s sake, I’m going to pass on that for now and move on to the next logical step in this series.

If I were to complete the last part of this, I would likely keep it in the vein of open source, unsupported distributions. TICK and Prophet are the two pieces I had earmarked for the work.

Unless you are Uber, Google, Facebook, etc., you are better off implementing a vendor supported solution wherever possible. The integration points have largely been worked out for you and you are left with far less to manage on your own.

Ultimately, the outcome was heading toward the result I anticipated: no matter how much work you put into a container service hacked together from multiple open source projects, there is no end in sight to making it production-ready.

So that will be the focus of my next pass at the sample functionality I’ve created with the unsupported open source software. I will aim to implement a standard upstream version of k8s with an auto-scaling capability via VMware PKS. And then finally, look at incorporating a predictive function into that implementation.

Along the way, I will refer back to the previous methods to compare the pros and cons of either approach.


On a side note….

Another thing I’ve learned along the way is that time series forecasting is an immense topic with many rabbit holes. I find it quite interesting, but have determined that to truly, deeply understand the topic would require far more time than I have to commit. I’ll leave the details to the data scientists and be happy with knowing how to pick an appropriate set of time series analysis models for a given task.

The more I’ve read into time series ‘forecasting’, the more I’ve come to believe it should be referred to as time series ‘prediction’. A time series forecast is really just a best guess of an input’s future value based on its historic values. While the science has proven to be accurate to a degree, it’s sort of akin to predicting tomorrow’s weather from the past six months of weather, without taking into consideration any of the indicators that actually affect it. The accuracy of a time series forecast largely depends on the properties that influence the metric you are analyzing.
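To make that point concrete, here is a toy sketch (in Python, with invented numbers) of the most naive ‘forecast’ possible: predicting the next CPU demand sample purely from its own recent history, with no external indicators at all.

```python
def forecast_next(history, window=3):
    """Naive forecast: predict the next value as the mean of the
    last `window` observations. No outside influences considered."""
    recent = history[-window:]
    return sum(recent) / len(recent)

# Hypothetical CPU demand samples (percent), newest last.
cpu_demand = [40, 42, 48, 55, 61, 64]
print(forecast_next(cpu_demand))  # mean of 55, 61, 64 -> 60.0
```

It will track a smooth trend reasonably well and miss every externally driven spike, which is exactly the weather-without-indicators problem described above.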

I believe a truly intelligent predictive auto-scaling function will need to rely on more than a set of time series forecasting models that are only looking at traditional metrics like CPU. It will require knowledge of real-world events and trends. For example, tracking the path of a hurricane as an influence on when, where, and how to scale a service. That could be an interesting project for Kafka. I’ll hold off on committing to that for now as well.

Predictive Auto-scaling of vSphere VMs and Docker Container Services – The Plumbing (2 of 2)

In the previous post, I wrote the function to scale a K8s deployment with a REST API call. In this post, I’ll write the other function required to codify the scaling of a K8s cluster across physical resources.
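For context, the K8s half of the plumbing ultimately reduces to a PATCH against the Deployment’s scale subresource. Here is a minimal sketch that only builds the request pieces — the cluster address, namespace, and deployment name are placeholders, and authentication/sending is omitted:

```python
import json

def build_scale_request(api_server, namespace, deployment, replicas):
    """Build the URL, body, and headers for a PATCH to a Deployment's
    /scale subresource (the call that changes the replica count)."""
    url = (f"{api_server}/apis/apps/v1/namespaces/"
           f"{namespace}/deployments/{deployment}/scale")
    body = json.dumps({"spec": {"replicas": replicas}})
    headers = {"Content-Type": "application/merge-patch+json"}
    return url, body, headers

# Placeholder endpoint and names for illustration.
url, body, headers = build_scale_request(
    "https://k8s-master:6443", "default", "web-frontend", 5)
print(url)
print(body)
```

Any HTTP client (with the cluster’s credentials added) can then send the request; the point is that scaling the pod layer is one small, scriptable call.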

The function here will power on a K8s node VM that resides on an ESXi host, making it available to the K8s cluster as additional compute. I will use the vSphere REST API and then combine this function with the previous one to complete the scale-out operation.
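In outline, the vSphere half is a single call to the vCenter Automation REST API’s power endpoint. A sketch of how that request is shaped — the vCenter hostname and VM identifier are placeholders, and session authentication is omitted:

```python
def power_on_request(vcenter, vm_id):
    """Return the HTTP method and URL that power on a VM via the
    vSphere Automation REST API's power/start endpoint."""
    return "POST", f"https://{vcenter}/rest/vcenter/vm/{vm_id}/power/start"

# Placeholder vCenter and VM identifier for illustration.
method, url = power_on_request("vcsa.lab.local", "vm-101")
print(method, url)
```

Chained with the deployment-scaling call from the previous post, this gives the full scale-out path: power on the spare node VM, wait for it to rejoin the cluster, then raise the replica count.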

Predictive Auto-scaling of vSphere VMs and Docker Container Services – The Plumbing (1 of 2)

Photo by chuttersnap on Unsplash

To be pragmatic with this exercise, I’ll begin with a focus on the plumbing required to scale both components in parallel. I’ll follow up in another post with predictive automation based on some primitive analytics.

As discussed in the first post of this series, I intend to rely on APIs exposed by VMware and the Kubernetes/Docker distributions to enable predictive orchestration of scaling. For my proof of concept, I will focus on scale-out and scale-in use cases. Scale-up and scale-down will be excluded.

Installing a VM Backed K8s 1.10 Cluster with Kubeadm

This post is the second in a series that considers the topic of predictive auto-scaling of vSphere VMs and containerized services. The purpose of this post is to describe how to build the first components: the installation and configuration of Docker and Kubernetes on VMs. I will be using Docker 1.13, Kubernetes 1.10, and CentOS 7 x64 for the implementation.

You will need either a vSphere environment or VMware Workstation/Fusion, CentOS install media, and an internet connection. There are a handful of ways to implement a test-bed Kubernetes cluster: running on a cloud provider’s CaaS platform, orchestrating a Docker-in-Docker deployment, building manually from scratch, or building from scratch with a helper tool.

Predictive Auto-scaling of vSphere VMs and Docker Container Services

Photo by Samuel Zeller on Unsplash

Reactive and predictive auto-scaling have existed for some time already. Predictive auto-scaling has been custom developed and leveraged by large service providers for almost a decade.

To define a few terms: scaling is the process of adjusting resources to satisfactorily serve demand. Auto-scaling replaces manual processes with automated ones that react to current-state metrics. Predictive auto-scaling relies on metrics observed over time to predict an upcoming change in demand and auto-scale ahead of it. This is advantageous because the resources required to scale will not contend with the services demanding them.
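To make the distinction concrete, here is a toy Python sketch: the reactive policy fires only once the current metric crosses a threshold, while the predictive one extrapolates the recent trend and fires if the projected value will cross it. The threshold and samples are invented for illustration.

```python
def reactive_scale(current, threshold=80):
    """React only after demand has already crossed the threshold."""
    return current > threshold

def predictive_scale(history, threshold=80, steps_ahead=2):
    """Extrapolate a simple linear trend from the last two samples
    and scale ahead of demand if the projection crosses the threshold."""
    trend = history[-1] - history[-2]
    projected = history[-1] + trend * steps_ahead
    return projected > threshold

cpu = [50, 58, 66, 74]            # rising CPU demand, percent
print(reactive_scale(cpu[-1]))    # False: not over threshold yet
print(predictive_scale(cpu))      # True: projected 74 + 8*2 = 90
```

The predictive path adds capacity while there is still headroom, which is exactly the contention-avoidance advantage described above; real implementations would replace the two-point trend with proper time series models.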

vRealize Operations 6.6 – Stress Calculation and Rightsizing

Professional football coach Paul Brown once said, “The key to winning is poise under stress”. This is also the key to optimal performance of your vSphere data center resources. We want our resources stressed just enough to take full advantage of our capacity investment, without delivering substandard performance.

One of the vROps questions I am often asked is, “How does vROps come up with this ‘right-size’ recommendation?”. It’s often followed with, “Why should I trust this?”.

The answer is found in an understanding of two factors. The first is how vROps is set to analyze CPU and memory (this is configured by policy or through the Monitoring Goals wizard). The default is CPU Demand | Memory Consumed.

There are three options, ranging from what is considered conservative to aggressive. ‘Allocation’, ‘Consumed’, and ‘Demand’ are terms with specific meanings for vROps capacity analysis in this regard. Within this post, I will use the term ‘demand’ in a general sense and will not delve into the differences between them.

The second is the stress policy and calculation. I’ll be focusing on this factor as it performs the heavy lifting here.

Stress is an evaluation of how much something has in relation to how much is demanded of it. It can be derived for anything that provides resources, such as virtual machines, hosts, and clusters.
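As a rough illustration (this is not vROps’ actual formula, which applies policy-defined thresholds over configured time windows), that relationship can be sketched as the fraction of observed samples in which demand exceeded capacity:

```python
def stress(demand_samples, capacity):
    """Crude stress measure: the fraction of samples in which
    demand exceeded the object's capacity."""
    over = sum(1 for d in demand_samples if d > capacity)
    return over / len(demand_samples)

# Hypothetical CPU demand (GHz) for a VM with 4.0 GHz of capacity.
samples = [2.5, 3.1, 4.4, 4.8, 3.9, 4.2]
print(stress(samples, 4.0))  # 3 of 6 samples over capacity -> 0.5
```

A value near zero suggests the object is oversized, while a persistently high value signals undersizing — the same intuition that drives the right-size recommendation.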

Containers, VMs, and VMware – VMware Pivotal Container Service

This has been in my drafts folder for three months now. I figure it’s time to get back to it.

Previous posts have delved into the benefits of leveraging virtualization to provide automation, elasticity, governance, and ‘day 2’ management to a container-centric DevOps architecture. For VMware, VIC has become the platform for repackaged applications. There isn’t much focus on orchestration with VIC. There is some, but it’s proprietary to vSphere. That’s not to say it’s inferior to a more widely adopted K8s model; it’s just purpose-oriented.

Containers, VMs, and VMware – Containers in Virtual Machines – The Best of Both Worlds


My plan was to follow the previous post on Harbor with a rundown of vSphere Integrated Containers. But as I began to write, I realized that just the topic of the relationship between containers and VMs was fairly lengthy. I’ve decided to cover only that in this post. In the next post, I will cover VIC.