K8s – Stacked etcd to External – Zero Downtime

Because sometimes you start off with stacked etcd nodes, and then decide you really wanted external.

First and foremost, my blog disclaimer is in full effect, especially on this one. Use this info at your own risk! In this post, I’ll cover the steps to convert a Kubeadm deployed stacked etcd cluster into one consuming external etcd nodes, with no downtime.

While this guide is based on Kubeadm clusters, the process can be applied to any cluster if you have access to the etcd config and certs/keys. Note: This will split the etcd service away from the Kubeadm upgrade process. You will need to ensure etcd upgrades and version compatibility manually. You will also need to update your kubeadm-config ConfigMap to tell Kubeadm that the etcd is external.
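For reference, that ConfigMap change looks roughly like this (a sketch; the endpoints and client cert paths are assumptions based on kubeadm defaults):

```bash
# Edit the ClusterConfiguration stored in the kubeadm-config ConfigMap:
kubectl -n kube-system edit configmap kubeadm-config

# Replace the "etcd: local:" stanza with an external one, something like:
#   etcd:
#     external:
#       endpoints:
#         - https://10.0.0.21:2379
#         - https://10.0.0.22:2379
#         - https://10.0.0.23:2379
#       caFile: /etc/kubernetes/pki/etcd/ca.crt
#       certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
#       keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
```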

To lay some groundwork, let’s review the implementation details of a stacked etcd cluster deployed by Kubeadm. On each control plane node, Kubeadm will create a static pod manifest in the /etc/kubernetes/manifests directory, named etcd.yaml.

This pod has two hostPath volumeMounts, /var/lib/etcd and /etc/kubernetes/pki/etcd. The /var/lib/etcd directory is where the etcd service stores its database and supporting files.
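A quick way to see those mounts on a control plane node (paths are kubeadm defaults):

```bash
# Show the hostPath volumes in the kubeadm-generated etcd manifest
sudo grep -n -A3 'hostPath:' /etc/kubernetes/manifests/etcd.yaml

# Expect two volumes, roughly:
#   /var/lib/etcd            -> the etcd database and WAL files
#   /etc/kubernetes/pki/etcd -> the etcd CA, server, peer, and client certs
```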

The /etc/kubernetes/pki/etcd directory is where Kubeadm places a ca.key and ca.crt (etcd gets its own self-signed CA, separate from the one used for the kube-apiserver). Kubeadm then generates key pairs per node and signs their certificates with this CA. The server key and cert provide transport layer security. The client and peer certificates/keys enable node-to-node and user-to-node authentication. We configure etcd to trust/authenticate any certificate issued by the trusted CA.

The etcd CA cert must be the same on each node. So with a common etcd CA signing cert/key, each node receives a generated server, client, and peer certificate that all other etcd nodes trust.
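On any control plane node you can list the per-node certs and confirm they chain back to the shared CA (default kubeadm paths assumed):

```bash
# List the kubeadm-managed etcd PKI
sudo ls /etc/kubernetes/pki/etcd
# ca.crt  ca.key  healthcheck-client.crt  healthcheck-client.key
# peer.crt  peer.key  server.crt  server.key

# Confirm the node's server cert was signed by the shared etcd CA
sudo openssl verify -CAfile /etc/kubernetes/pki/etcd/ca.crt \
  /etc/kubernetes/pki/etcd/server.crt
```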

Ok, that’s the TLS side of how stacked etcd is set up by Kubeadm. With this understanding, we can use the etcd CA cert and key to configure external nodes so they, too, are authenticated as members of the etcd cluster.

The second detail to look into is how the stacked etcd members are clustered together. Each time we add a control plane node with Kubeadm, it reads the etcd cluster member list and generates an --initial-cluster config parameter for the new etcd instance (the --initial-* options are ignored by etcd after first init). If we look in the static pod etcd manifest for the first control plane node in our cluster, we’ll see --initial-cluster equals just that first node. When we add another control plane node, its --initial-cluster will list both the first node and the new node, and so on.
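You can see this progression directly in the manifests; the node names and IPs below are hypothetical:

```bash
# On the first control plane node:
sudo grep 'initial-cluster=' /etc/kubernetes/manifests/etcd.yaml
#   --initial-cluster=cp1=https://10.0.0.11:2380

# On a control plane node added later, the flag lists prior members too:
#   --initial-cluster=cp1=https://10.0.0.11:2380,cp2=https://10.0.0.12:2380
```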

I considered two options for this exercise: either snapshot and restore with a hard cutover, or add external etcd nodes to the existing cluster and then phase out the stacked nodes. I decided the second option would be the least disruptive (but it also carries the most risk, so I will take a snapshot just before beginning the task).
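Taking that snapshot is quick; a sketch from a control plane node, using the kubeadm default cert paths:

```bash
# Snapshot the etcd database before making any changes
sudo ETCDCTL_API=3 etcdctl snapshot save /root/etcd-pre-migration.db \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key
```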

I’ll create three external etcd nodes, configure them with certificates signed by the Kubeadm-generated etcd CA, and have them join the existing etcd cluster. They will sync with the existing nodes. Then I’ll configure the kube-apiserver static pod manifests to point to the external etcd nodes.
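Joining each external node is etcd’s standard runtime reconfiguration: register the member, then start etcd on the new node with the initial cluster state set to existing. A sketch with a hypothetical node name and IP:

```bash
# Register the first external node with the running cluster
sudo ETCDCTL_API=3 etcdctl member add etcd-ext1 \
  --peer-urls=https://10.0.0.21:2380 \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key

# Then on etcd-ext1, start etcd with:
#   --initial-cluster-state=existing
#   --initial-cluster=<all current members>,etcd-ext1=https://10.0.0.21:2380
# Repeat one node at a time, waiting for each to sync before adding the next.
```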

(Another interesting Kubeadm stacked implementation detail is that each control plane node only communicates with its colocated etcd node, via the localhost address. The etcd client used by Kubernetes implements client-side load balancing, so we can provide multiple etcd node addresses to it. I’m still up in the air on whether or not a managed LB in front of the etcd nodes would be better.)
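Once the external members are healthy, pointing the kube-apiserver at them is a one-flag change in each kube-apiserver static pod manifest (IPs hypothetical):

```bash
# Check the current setting (stacked clusters point at localhost):
sudo grep 'etcd-servers' /etc/kubernetes/manifests/kube-apiserver.yaml
#   --etcd-servers=https://127.0.0.1:2379

# Change it to list all external nodes; the client balances across them:
#   --etcd-servers=https://10.0.0.21:2379,https://10.0.0.22:2379,https://10.0.0.23:2379
# The kubelet restarts kube-apiserver automatically when the manifest changes.
```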

Once kube-apiserver is pointed at the external nodes only, I’ll remove the stacked members from the etcd cluster, and remove the static pod manifests for etcd from the control plane hosts. The result is zero downtime reconfiguration of the Kubernetes cluster etcd infrastructure.
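The removal step, sketched with a hypothetical external endpoint and the kubeadm default cert paths:

```bash
# Reusable TLS/endpoint flags, now pointed at an external node
ETCD_FLAGS=(--endpoints=https://10.0.0.21:2379
  --cacert=/etc/kubernetes/pki/etcd/ca.crt
  --cert=/etc/kubernetes/pki/etcd/server.crt
  --key=/etc/kubernetes/pki/etcd/server.key)

# Find each stacked member's ID, remove it, and verify health in between
sudo ETCDCTL_API=3 etcdctl "${ETCD_FLAGS[@]}" member list
sudo ETCDCTL_API=3 etcdctl "${ETCD_FLAGS[@]}" member remove <MEMBER_ID>
sudo ETCDCTL_API=3 etcdctl "${ETCD_FLAGS[@]}" endpoint health

# On each control plane host, removing the manifest stops the local etcd pod
sudo mv /etc/kubernetes/manifests/etcd.yaml /root/etcd.yaml.bak
```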

Link to repo with directions and more details.


Install etcd Cluster with TLS

This topic has been written up many times, so this is not exactly cutting-edge info. But I’ve found many of the tutorials to be dated, lacking in specific detail, and/or using tools I have no interest in (e.g. cfssl). So, I decided to post this no-nonsense, just-works guide. This post will cover installing an etcd cluster, secured with TLS. In my previous post, I covered some basics on creating self-signed certs with openssl. The one additional openssl detail in this post is the openssl config file used to configure generated CSRs.

Specifically, I’ll use it to add key usage, extended key usage, and subject alternative name requests to the CSR. Extensions are things that have been added to the X.509 standard over the years. The subject alternative name (SAN) extension lets us use names other than the CN. A common use of this is to set up a wildcard DNS alternate name (if you inspect my blog site’s SSL cert, you’ll see this). That sort of thing is fine for certs that aren’t issued for commerce purposes.

In the case of this etcd config, I’ll use SAN to provide IP addresses as alternate names. I’ll add the IP address of each node to a single cert. This way, I can use the same cert for the entire setup.
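Here’s a minimal sketch of such a config and the commands that use it; the CN and IPs are placeholders for your own nodes:

```bash
# Minimal openssl config adding key usage, EKU, and IP SANs to a CSR
cat <<'EOF' > etcd-openssl.cnf
[req]
prompt             = no
distinguished_name = dn
req_extensions     = v3_req

[dn]
CN = etcd-node

[v3_req]
keyUsage         = critical, digitalSignature, keyEncipherment
extendedKeyUsage = serverAuth, clientAuth
subjectAltName   = @alt_names

[alt_names]
IP.1 = 10.0.0.21
IP.2 = 10.0.0.22
IP.3 = 10.0.0.23
EOF

# Generate a key and a CSR carrying those extensions, then sign it with the CA
openssl genrsa -out etcd.key 2048
openssl req -new -key etcd.key -out etcd.csr -config etcd-openssl.cnf
openssl x509 -req -in etcd.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
  -out etcd.crt -days 365 -extensions v3_req -extfile etcd-openssl.cnf
```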

And as always, I’ve provided all of the directions in a git repo, located here.

In my next post, I’ll cover setting up a load balancer to front an HA K8s control plane, and connecting a K8s cluster to the external etcd cluster. If I feel really ambitious, I’ll cover migrating an existing stacked control plane to external etcd.

Openssl Self-Signed Certs – 2023

I set out this morning to write a post on configuring external etcd for Kubernetes, with openssl self-signed certs (you know, the kind you use for your home lab). I got sidetracked on openssl and all of its ever-changing/deprecated options. So, this will be a preamble to that original intent.

Other than key sizes and algorithms, only a few command options have changed in openssl v3. I won’t dive into key sizes and algorithms (too much to cover there). In this post, I will cover the why and how of creating self-signed certificates with openssl, along with up-to-date commands for v3. I think the first time I grappled with understanding SSL (now TLS) was 1997. Although SSL has since become TLS, not a lot has changed in the basic dynamics of public key encryption.
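As a small taste of the v3-era syntax (the post has the full details), a self-signed cert in one command; note that -noenc replaces the older -nodes:

```bash
# OpenSSL v3: key plus self-signed cert in one shot, no passphrase
openssl req -x509 -newkey rsa:4096 -sha256 -days 365 -noenc \
  -keyout ca.key -out ca.crt -subj "/CN=home-lab-ca"
```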

Kubeadm Upgrade

A few posts back, I revisited setting up a Kubernetes cluster from scratch with Kubeadm. In this post, I’m going to cover upgrading a Kubeadm deployed cluster.

The Kubernetes version I deployed previously is v1.26.1, and 1.26.2 is now available. So let’s go through the process of upgrading to 1.26.2. The first thing you should do (not covered here) is back up your Kubernetes cluster. In the event you encounter an issue that Kubeadm cannot recover from, the backup is your next recourse. At a minimum, make a backup of the etcd database.

To find the latest release of K8s, we can look at https://github.com/kubernetes/kubernetes/releases. There we see a 1.27.0 alpha pre-release is posted, and the latest stable release is 1.26.2. (Steps are also available in this repo.)
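The core of the flow looks roughly like this, assuming the apt packages from the (2023-era) Kubernetes repo; run it on the first control plane node:

```bash
# Upgrade kubeadm itself, review the plan, then apply the new version
sudo apt-get update
sudo apt-get install -y --allow-change-held-packages kubeadm=1.26.2-00
sudo kubeadm upgrade plan
sudo kubeadm upgrade apply v1.26.2

# Then upgrade kubelet/kubectl and restart the kubelet
sudo apt-get install -y --allow-change-held-packages kubelet=1.26.2-00 kubectl=1.26.2-00
sudo systemctl daemon-reload && sudo systemctl restart kubelet
```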

Not My Normal Topic – Better Audio Part 2

How do you get a recording properly leveled (normalized) for YouTube? Glad you asked. In part two of my take-a-break-from-K8s posting, I’ll answer that question. I mentioned a limiter in my previous post. A limiter is somewhat self-describing: it places a limit on the maximum audio level. If you hit the limiter threshold, it will squash the audio to keep it under the threshold level. This comes in handy in the worst-case scenario of going above full-scale zero and clipping. Think of a limiter as your last-resort safeguard against clipping your audio. (I’ve seen professional audio engineers use limiters to raise gain and normalize; that is beyond my current understanding. I know what sounds good to me, and using a limiter to exceed zero always sounds bad to me.)

Not My Normal Topic – Better Audio Part 1

Stepping aside from my normal topics to post about a question I was recently asked. The topic was audio equipment and process for YouTube videos. Audio is another science. It presents challenges that catch my interest in much the same way IT does.

Listen closely to the next show you watch. Pay attention to the environment the recorded voices are in. You’ll notice that excellent vocal recordings are captured and produced in many different settings. Then try to get a decent recording from your desk. There is a lot of science and skill in this detail.

Blast from the Past – Installing Kubernetes with Kubeadm – 2023

TL;DR: Steps to install a CRI-O and Kubeadm provisioned cluster on Ubuntu 22.04.

Hard for me to believe, but it has been nearly five years since my first (and only) post on Kubeadm. That was circa Kubernetes v1.9, and I was more or less posting what I had learned about Kubeadm as I went through it.

My home lab vSphere license recently came up for renewal again. This annual event always causes me to reconsider how I use my server and whether handing out more cash is the best option. This year, I’ve decided to let go of the automation niceness and see how I do without it. You may have read some of my past posts where I tried Proxmox/KVM for a bit. I never really got the network and storage performing well enough, but I may give it another try at some point.

Crossplane Troubleshooting

This post is a published draft. I will be adding to it as I can. If you have items you think should be added, I’ve enabled the comments section for it. Happy to add your troubleshooting scenarios.

This post will cover basic troubleshooting of Crossplane. I have other posts that describe the components of Crossplane, so I won’t rehash all of that here. This is certainly not all-encompassing; it is simply a list of common issues and how to resolve them.

Additional tips can be found here: link to Crossplane.io troubleshooting tips.

Everything that happens before the Managed Resource is created is controlled by the Crossplane core. If you are experiencing issues with Claims, XRs, or Compositions being created, then troubleshoot at the Crossplane core.
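A few commands I reach for when working at the core, assuming the default crossplane-system install namespace; the resource names are placeholders:

```bash
# Check the Synced/Ready conditions and events on the stuck object
kubectl describe <claim-kind> <claim-name>

# Watch recent cluster events for rejections or composition errors
kubectl get events --sort-by=.lastTimestamp

# Read the Crossplane core logs for reconcile errors
kubectl logs -n crossplane-system deployment/crossplane
```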

Crossplane ProviderConfig and Argo CD

Two recent Slack threads reminded me that I’ve set up some things between Argo CD and Crossplane that I haven’t posted here.

This setting is required whether you use Argo CD or not, but it came up in a question about why the Argo CD custom health check wasn’t working. When creating a ProviderConfig for provider-kubernetes or provider-helm in a Composition, you must add base.readinessChecks[0].type: None (and because WordPress destroys all of my yaml formatting, see an example of it here, at lines 291-292):
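Since I can’t trust WordPress with YAML, here is the shape of it as a heredoc; the resource name and Object spec are placeholders, and the key lines are the readinessChecks at the bottom:

```bash
# Fragment of a Composition resource entry (provider-kubernetes Object)
cat <<'EOF' >> composition.yaml
    - name: my-object
      base:
        apiVersion: kubernetes.crossplane.io/v1alpha1
        kind: Object
        spec: {}   # your Object spec here
      readinessChecks:
        - type: None
EOF
```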

The other recent issue that came up was how Argo CD complains that a Crossplane ProviderConfig CRD doesn’t exist before the Provider is fully initialized. The way around that is with a simple annotation.

For that annotation, see line #7 of this ProviderConfig:
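That annotation is Argo CD’s SkipDryRunOnMissingResource sync option, which tells Argo CD to skip the dry run for a resource whose CRD isn’t registered yet. A sketch of a provider-kubernetes ProviderConfig carrying it (the name and credentials source are placeholders):

```bash
cat <<'EOF' | kubectl apply -f -
apiVersion: kubernetes.crossplane.io/v1alpha1
kind: ProviderConfig
metadata:
  name: in-cluster
  annotations:
    # Let Argo CD apply this before the ProviderConfig CRD exists
    argocd.argoproj.io/sync-options: SkipDryRunOnMissingResource=true
spec:
  credentials:
    source: InjectedIdentity
EOF
```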

Hopefully this single post’s title shows up in searches for both issues.