K8s Stateful – Storage Basics

In this post, I was going to cover two controller kinds (Deployment and StatefulSet), the resource kinds PersistentVolume, PersistentVolumeClaim, and StorageClass, and go over examples of non-persistent and persistent service delivery. Then I realized it was going to be way too long. So I’ll cover the basics of stateless/stateful and storage in this post. In the next post, I’ll cover the Deployment and StatefulSet controllers and provide some examples of their use.

You’ve likely heard of the ‘Twelve-Factor App’ methodology. It was published in 2011 by Adam Wiggins of Heroku. Adam said the twelve factors should apply to: “Any developer building applications which run as a service and/or Ops engineers who deploy or manage such applications”. Number six of the twelve factors originally stated: “Execute the app as one or more stateless processes”. It’s important to read that in context: “applications which run as a service” does not equal “any and all applications”. Since its publication in 2011, the sixth factor has been subtly reworded by some to reflect that.

When we say stateless, it needs to be put in context. There are many attributes of state, and a stateless application/pod in K8s terms is a bit of a misnomer. With regard to K8s, we consider a stateless application to be one that is immutable. Any changes made to it at run time will be lost when it restarts, because nothing is persisted to storage. Think of it as roughly being a server with a read-only drive.

From the advent of K8s, as with Borg, a fundamental goal was to create a distributed system for processes that are highly portable and not bound to the underlying compute infrastructure. So in the beginning, the first incarnation of a process/task (now known as a pod) was implemented with no persistent storage. That’s not to say Google wasn’t implementing persistent storage constructs with Borg; they were. At the time, Borg utilized centralized and distributed processes to map persistent data to workloads. The sum of those moving parts was too great and complex to carry forward into K8s. K8s took a new approach to solving for data persistence when required. That method and its uses have evolved, and continue to.

So, K8s storage is a key concept in understanding a stateless or stateful app in this context. I don’t want this to become a complete explanation of storage concepts in K8s, but I’ll delve into some basic resource concepts so we can better understand the differences between Deployment and StatefulSet.

At a high level, K8s leverages its extensible architecture to allow for many types of storage providers. How we consume storage varies based on the underlying storage type and the provider we use. At the time of this writing, K8s is moving from in-tree to out-of-tree (CSI) volume plugins. That distinction isn’t necessary to understand here, but know that one way or another, we start with a provider/plugin that bridges physical storage into K8s. Our pods consume that disk resource via the volumes definition in the pod spec. The following examples are specific to working with in-tree volume providers.

The pod.spec.volumes field defines a name and a reference to a Volume or PVC for each entry. We then use volumeMounts in the container definition of our pod spec (pod.spec.containers.volumeMounts) to mount the volume and define the path of the mount within the container. Each volumeMounts entry is correlated to a volumes entry via that entry’s name field.

(Note: I’ll go through Volume and PVC separately; for now, it’s pointing to a bit of provisioned and bound disk space. I upper-cased Volume because it refers to a host-based volume that K8s consumes, rather than a field in a pod spec.)

In this example, you see the volumes definition for AWS EBS (awsElasticBlockStore:) with a volumeID linking to a pre-provisioned Volume. It has a name of ‘test-volume’. The volumeMounts definition in containers links to it by that name (test-volume) and mounts it at /test-ebs in the container file system hierarchy. If we were at the command prompt inside the container and executed ls /, we would see the /test-ebs directory.

apiVersion: v1
kind: Pod
metadata:
  name: test-ebs
spec:
  containers:
  - image: k8s.gcr.io/test-webserver
    name: test-container
    volumeMounts:
    - mountPath: /test-ebs
      name: test-volume
  volumes:
  - name: test-volume
    # This AWS EBS volume must already exist.
    awsElasticBlockStore:
      volumeID: <volume-id>
      fsType: ext4
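
As a quick check, assuming the pod is named test-ebs as above and the container image includes basic shell utilities, something like the following should show the mounted volume from outside the cluster:

kubectl exec test-ebs -- ls /
kubectl exec test-ebs -- df -h /test-ebs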

A spec.volumes lifecycle matches that of its pod. If a container dies, the volume persists and will be reattached once the kubelet restarts the container. If a pod in a ReplicaSet is killed, it is not seen as dead, just missing. So as the ReplicaSet controller restarts it, the volumes persist and are reattached. It isn’t until the cluster is told that a pod and all its replicas should no longer exist that the volumes also come to an end.

Take note of the comment in the pod spec example above. It says ‘This AWS EBS volume must already exist.’ There are two ways a Volume can be created: manually or dynamically.

Manual creation can be done by creating a Volume directly on the underlying storage and then pointing to it with its storage-specific attributes (as is done above). In that case, the comment is saying the administrator needs to create the AWS EBS volume and retrieve the volumeID and fsType to supply to the pod spec.

In the case of the AWS EBS provider, the command would be something like this:

aws ec2 create-volume --availability-zone=eu-west-1a --size=10 --volume-type=gp2
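
If you only want the volumeID to drop into the pod spec, the same command can return just that field (this assumes an AWS CLI version that supports JMESPath filtering via --query, which current releases do):

aws ec2 create-volume --availability-zone=eu-west-1a --size=10 --volume-type=gp2 --query VolumeId --output text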

The other option would be to manually create a PersistentVolume (aka PV). A PV is an abstraction that allows us to refer to pre-provisioned storage with a common method. This enhances the portability of pod specs and eases the user experience. As in the EBS example above, if the administrator created the EBS Volume, they would need to provide the volumeID and other Volume-specific info to the user for inclusion in the pod spec. If the administrator created the storage as a PV, the user could create a PersistentVolumeClaim (aka PVC) and K8s would automatically find a suitable PV and bind it. The user need not know any of the details of the Volume or underlying storage. They simply create the PVC, and then refer to it in the pod spec.

Here is an example of a PV for EBS storage. Notice that instead of running a CLI command outside of K8s’ purview, we’re defining a resource with the storage-specific details. Because K8s is aware of this as a PV, it will see the PVC and match it to this (or another suitable PV) automatically.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv0001
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  awsElasticBlockStore: 
    fsType: "ext4" 
    volumeID: "vol-f16a04ba"

So that’s better: create a bunch of PVs ahead of time, then let users gobble them up at will. No need for never-ending creation of Volumes and shuffling of Volume-specific info to users each time a volume is needed for a pod.

Better, but still not ideal, for a number of reasons. The PVC-to-PV relationship is one-to-one. If I created ten PVs, each 10Gi in size, and ten PVCs were created that each required only 2Gi, we would be wasting 80Gi of available storage. Another issue comes with PVC storage class requirements. If we don’t know what kind of storage will be needed, we’d need to triple or quadruple the ten PVs so that we have ten of each class. The problems continue, but I’m guessing you get the gist of it.

This is where dynamic provisioning comes to the rescue. With dynamic provisioning, we don’t pre-provision Volume or PVs. We create definitions of storage providers along with class specific settings, and store them as resource definitions in the cluster. These resources are called StorageClasses.

Notice that we are driving the definition of our underlying storage configuration higher and higher in the abstraction plane. We’ve gone from Volume to PersistentVolume to StorageClass. A storage class specific to AWS EBS looks like this.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  iopsPerGB: "10"
  fsType: ext4
reclaimPolicy: Retain
allowVolumeExpansion: true
mountOptions:
  - debug
volumeBindingMode: Immediate

We use a PVC to reference a StorageClass, and K8s automatically provisions a PV based on it and then binds the PVC to it. The PV is provisioned based on the attributes we specify in the PVC. So if I need one PV with 2Gi and another with 5Gi, the PVs will be provisioned to match those requirements exactly. This abstraction and functionality solves many of the challenges with the previous methods covered.

Here we see an example of a PVC that makes use of the StorageClass above to dynamically provision a PV and bind to it:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myclaim
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Filesystem
  resources:
    requests:
      storage: 8Gi
  storageClassName: standard

Once the PVC is bound to a PV, it can be consumed as a volume via a pod spec and mounted in a container’s file system directory tree. The container and volume are separate objects. If a container dies and is restarted, the volume is reattached. Thus, the data persists.
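
Before referencing the claim in a pod spec, it’s worth confirming that the binding actually happened; the STATUS column for both the claim and the dynamically provisioned PV should read Bound:

kubectl get pvc myclaim
kubectl get pv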

So with a PVC created, this is what our pod.spec will look like now:

apiVersion: v1
kind: Pod
metadata:
  name: test-ebs
spec:
  containers:
  - image: k8s.gcr.io/test-webserver
    name: test-container
    volumeMounts:
    - mountPath: /test-ebs
      name: test-volume
  volumes:
  - name: test-volume     
    persistentVolumeClaim:
      claimName: myclaim

The final detail is that we can define a ‘default’ StorageClass that is applied automatically any time we don’t explicitly reference one in a PVC; a sketch of this is shown below. That’s it for storage at this level. This is enough to understand persistent storage in the context of a basic stateful application’s use.
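
Marking a StorageClass as the default is done with an annotation on the StorageClass itself. Here’s a minimal sketch reusing the ‘standard’ class from earlier (only the annotation is new relative to that example):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
  annotations:
    # Any PVC that omits storageClassName will use this class.
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  iopsPerGB: "10"
  fsType: ext4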

To summarize the storage concepts:

  • Storage is made available to K8s by storage plugins (or providers)
  • Volumes represent storage available to pods
  • A PersistentVolume (PV) is an abstraction that enables the use of PersistentVolumeClaims. It hides the details of the underlying storage parameters from the requester.
  • A PersistentVolumeClaim (PVC) is the method for requesting storage via the PV abstraction.
  • A StorageClass further abstracts the PV by defining underlying storage provisioning parameters and allowing a PVC to reference it. In this case, a PV is created on demand.
  • A pod.spec defines volumes and then defines how those volumes are mounted within a container.

Next up, I’ll cover the Deployment and StatefulSet controllers, and provide examples of how each operates differently given stateless and stateful requirements.
