Recovering from a failed Kubernetes upgrade

The Kubernetes upgrade process changed in 1.8, so steps that used to work can now leave you with a broken installation. Below are the steps that worked for me to recover from a failed 1.7 to 1.8 upgrade.

A few days ago, I was upgrading Kubernetes from 1.7 to 1.8 and, optimist that I am, left the packages upgrading while I read the release notes, only to realize that upgrading the packages before the control plane would leave me with a broken installation.

Happily, however, like most things in life, it’s fixable. These are the steps that worked for me:

  1. First, disable swap. As of Kubernetes 1.8, the kubelet will refuse to start if it detects swap.
  2. Remove or comment out the swap line in `/etc/fstab` so swap stays off after a reboot.
  3. If you are using systemd and GPT, in addition to commenting out the swap entry in `/etc/fstab`, you will also have to mask the swap unit in systemd; otherwise it will take the liberty of automounting it for you (see the first sketch after this list).
  4. Once that is done, run `rm /var/lib/dockershim/sandbox/*` on all nodes and the kubelets will start (second sketch below).
  5. Go to the master and run `kubectl get nodes`. If the nodes appear as Ready, you are lucky: go to step 6 and you are done; otherwise skip to step 7.
  6. Run `kubeadm upgrade apply v1.8.1 --force` and thank both the old and the new gods.
  7. Otherwise, you will have to remove the NotReady nodes from the cluster, run `docker system prune` on them, upgrade the master, and join the nodes back afterwards (third sketch below).
  8. When you are done, just to be sure, run `kubeadm upgrade apply v1.8.1 --force` again. It is idempotent, so it is safe to run twice.
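
For steps 1 through 3, the swap cleanup looks roughly like this. Treat it as a sketch: the `sed` pattern assumes space-separated fields in your fstab, and `dev-sda2.swap` is only an example unit name, so list your swap units first to find the real one.

```sh
# Turn swap off immediately; the 1.8 kubelet refuses to start otherwise.
swapoff -a

# Keep it off across reboots by commenting out the swap entry.
# Assumes space-separated fields in /etc/fstab; adjust if yours uses tabs.
sed -i '/ swap / s/^/#/' /etc/fstab

# On systemd + GPT, the swap partition gets automounted anyway, so mask
# the generated unit. List the swap units to find its name:
systemctl --type=swap

# ...then mask it (dev-sda2.swap is just an example):
systemctl mask dev-sda2.swap
```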
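Step 4, spelled out. The explicit restart is my assumption, for the case where your kubelet is not crash-looping under systemd and does not come back on its own:

```sh
# On every node: remove the stale dockershim sandbox checkpoints...
rm -f /var/lib/dockershim/sandbox/*

# ...then give the kubelet a nudge and check that it actually started.
systemctl restart kubelet
journalctl -u kubelet --since "5 minutes ago"
```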
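And for steps 5 through 8, the unlucky path looks something like the following. `node-1`, the token, the hash, and the master address are all placeholders; `kubeadm token list` on the master shows the real token, and since 1.8 `kubeadm join` also wants the CA cert hash.

```sh
# On the master: check which nodes recovered.
kubectl get nodes

# For each node stuck in NotReady, remove it from the cluster
# (node-1 is a placeholder):
kubectl delete node node-1

# On that node: wipe kubeadm state and clean up Docker.
kubeadm reset
docker system prune

# Upgrade the master; idempotent, so running it again later is safe.
kubeadm upgrade apply v1.8.1 --force

# Back on each removed node: re-join the cluster. Token, hash and
# address are placeholders.
kubeadm join --token <token> <master-ip>:6443 \
    --discovery-token-ca-cert-hash sha256:<hash>
```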

Good luck!