Reinstalling a Kubernetes Cluster: A Walkthrough of Tearing Down and Rebuilding k8s

2023-06-13
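Tags: Kubernetes
Contents
  • 1. System environment
  • 2. Preface
  • 3. Reinstalling the Kubernetes cluster
  • 3.1 Environment overview
  • 3.2 Deleting all k8s nodes
  • 3.3 kubeadm initialization
  • 3.4 Adding worker nodes to the k8s cluster
  • 3.5 Installing calico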

1. System environment

| Server OS version | Docker version | CPU architecture |
| --- | --- | --- |
| CentOS Linux release 7.4.1708 (Core) | Docker version 20.10.12 | x86_64 |

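2. Preface

After a Kubernetes cluster has been installed and used for a while, you may need to reinstall it. This article simulates that situation by tearing down a working cluster and rebuilding it.

The prerequisite is an existing, working Kubernetes cluster. For installing and deploying a Kubernetes (k8s) cluster, see the blog post "Installing and Deploying a Kubernetes (k8s) Cluster on CentOS 7".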

3. Reinstalling the Kubernetes cluster

3.1 Environment overview

Cluster architecture: k8scloude1 is the master node; k8scloude2 and k8scloude3 are worker nodes.

| Server | OS version | CPU architecture | Processes | Role |
| --- | --- | --- | --- | --- |
| k8scloude1/192.168.110.130 | CentOS Linux release 7.4.1708 (Core) | x86_64 | docker, kube-apiserver, etcd, kube-scheduler, kube-controller-manager, kubelet, kube-proxy, coredns, calico | k8s master node |
| k8scloude2/192.168.110.129 | CentOS Linux release 7.4.1708 (Core) | x86_64 | docker, kubelet, kube-proxy, calico | k8s worker node |
| k8scloude3/192.168.110.128 | CentOS Linux release 7.4.1708 (Core) | x86_64 | docker, kubelet, kube-proxy, calico | k8s worker node |

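3.2 Deleting all k8s nodes

kubectl drain safely evicts all pods from a node. The --ignore-daemonsets option is usually required: kubectl drain first marks the node unschedulable (shown as SchedulingDisabled in kubectl get nodes), but DaemonSet-managed pods ignore that marking, so if drain deleted them the DaemonSet controller could immediately recreate them on the same node and the drain would loop forever. For that reason DaemonSet pods are skipped here.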

[root@k8scloude1 ~]# kubectl drain k8scloude3 --ignore-daemonsets 
node/k8scloude3 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-wmzr, kube-system/kube-proxy-84gcx
evicting pod kube-system/calico-kube-controllers-b9fbfff44-rl2mh
pod/calico-kube-controllers-b9fbfff44-rl2mh evicted
node/k8scloude3 evicted

k8scloude3 is now SchedulingDisabled:

[root@k8scloude1 ~]# kubectl get nodes 
NAME         STATUS                     ROLES                  AGE   VERSION
k8scloude1   Ready                      control-plane,master   64m   v1.21.0
k8scloude2   Ready                      <none>                 56m   v1.21.0
k8scloude3   Ready,SchedulingDisabled   <none>                 56m   v1.21.0

Delete the node k8scloude3:

[root@k8scloude1 ~]# kubectl delete nodes k8scloude3
node "k8scloude3" deleted
[root@k8scloude1 ~]# kubectl get nodes 
NAME         STATUS   ROLES                  AGE   VERSION
k8scloude1   Ready    control-plane,master   65m   v1.21.0
k8scloude2   Ready    <none>                 57m   v1.21.0

Repeat the same steps for the remaining nodes:

[root@k8scloude1 ~]# kubectl drain k8scloude2 --ignore-daemonsets 
node/k8scloude2 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-bbst, kube-system/kube-proxy-8wf8t
evicting pod kube-system/coredns-d6fc579-kgmfl
evicting pod kube-system/calico-kube-controllers-b9fbfff44-nq79f
evicting pod kube-system/coredns-d6fc579-dln6p
pod/coredns-d6fc579-dln6p evicted
pod/coredns-d6fc579-kgmfl evicted
pod/calico-kube-controllers-b9fbfff44-nq79f evicted
node/k8scloude2 evicted
[root@k8scloude1 ~]# kubectl drain k8scloude1 --ignore-daemonsets 
node/k8scloude1 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-rvx, kube-system/kube-proxy-zblkg
evicting pod kube-system/coredns-d6fc579-tgcl4
evicting pod kube-system/calico-kube-controllers-b9fbfff44-t9k45
evicting pod kube-system/coredns-d6fc579-l9g7b
pod/calico-kube-controllers-b9fbfff44-t9k45 evicted
pod/coredns-d6fc579-tgcl4 evicted
pod/coredns-d6fc579-l9g7b evicted
node/k8scloude1 evicted
[root@k8scloude1 ~]# kubectl get nodes 
NAME         STATUS                     ROLES                  AGE   VERSION
k8scloude1   Ready,SchedulingDisabled   control-plane,master   66m   v1.21.0
k8scloude2   Ready,SchedulingDisabled   <none>                 58m   v1.21.0
[root@k8scloude1 ~]# kubectl delete nodes k8scloude2
node "k8scloude2" deleted
[root@k8scloude1 ~]# kubectl delete nodes k8scloude1
node "k8scloude1" deleted

At this point every node has been removed from the k8s cluster:

[root@k8scloude1 ~]# kubectl get nodes 
No resources found
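
The drain-then-delete sequence used in this section can be wrapped in a small helper. This is only a minimal sketch, not something the walkthrough itself uses: the function names remove_node and remove_all_nodes and the KUBECTL override variable are assumptions made for illustration. The only kubectl subcommands involved are the drain and delete shown in this section.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drain a node, then delete it from the cluster.
# KUBECTL can be overridden (e.g. KUBECTL="echo kubectl") to preview the commands.
KUBECTL="${KUBECTL:-kubectl}"

remove_node() {
  local node="$1"
  # Evict ordinary pods; DaemonSet-managed pods are skipped, as in this section.
  $KUBECTL drain "$node" --ignore-daemonsets || return 1
  # Remove the Node object only after the drain has succeeded.
  $KUBECTL delete node "$node"
}

# Workers first, control-plane node last (node names taken from this article).
remove_all_nodes() {
  local node
  for node in k8scloude3 k8scloude2 k8scloude1; do
    remove_node "$node" || return 1
  done
}
```

Running with KUBECTL="echo kubectl" prints the six commands without touching the cluster, which is a cheap way to check the node order before doing it for real.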

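3.3 kubeadm initialization

Running kubeadm init again at this point fails. The error messages show why: the required ports are still in use and the old configuration files still exist.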

[root@k8scloude1 ~]# kubeadm init --image-repository registry.aliyuncs.com/google_containers --kubernetes-version=v1.21.0 --pod-network-cidr=10.244.0.0/16
[init] Using Kubernetes version: v1.21.0
[preflight] Running pre-flight checks
        [WARNING Firewalld]: firewalld is active, please ensure ports [6443 10250] are open or your cluster may not function correctly
        [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
error execution phase preflight: [preflight] Some fatal errors occurred:
        [ERROR Port-6443]: Port 6443 is in use
        [ERROR Port-10259]: Port 10259 is in use
        [ERROR Port-10257]: Port 10257 is in use
        [ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
        [ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
        [ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
        [ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
        [ERROR Port-10250]: Port 10250 is in use
        [ERROR Port-2379]: Port 2379 is in use
        [ERROR Port-2380]: Port 2380 is in use
        [ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
Before re-initializing the k8s cluster, the previous setup has to be cleared with kubeadm reset:

[root@k8scloude1 ~]# kubeadm reset
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W 16:17:15.936292   53177 reset.go:99] [reset] Unable to fetch the kubeadm-config ConfigMap from cluster: failed to get node registration: failed to get corresponding node: nodes "k8scloude1" not found
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W 16:17:17.651795   53177 removeetcdmember.go:79] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/var/lib/etcd /var/lib/kubelet /var/lib/dockershim /var/run/kubernetes /var/lib/cni]
The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d
The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.
If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.
The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
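
The closing lines of the reset output list what kubeadm reset leaves behind: the CNI configuration, iptables rules, IPVS tables, and the kubeconfig file. The walkthrough does not clean these up, but a manual cleanup could look like the sketch below. The names run, post_reset_cleanup, and DRY_RUN are assumptions for illustration. Note that flushing iptables also removes rules unrelated to Kubernetes, so the sketch defaults to only printing the commands.

```shell
#!/usr/bin/env bash
# Hypothetical follow-up to 'kubeadm reset': the manual cleanup its output asks for.
# With DRY_RUN=1 (the default here) commands are only printed, not executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

post_reset_cleanup() {
  # CNI configuration is not removed by 'kubeadm reset'.
  run rm -rf /etc/cni/net.d
  # Flush iptables rules left behind by kube-proxy and the CNI plugin.
  run iptables -F
  run iptables -t nat -F
  run iptables -t mangle -F
  run iptables -X
  # Only relevant if kube-proxy ran in IPVS mode and ipvsadm is installed.
  if command -v ipvsadm >/dev/null 2>&1; then
    run ipvsadm --clear
  fi
  # Stale kubeconfig of the old cluster.
  run rm -f "$HOME/.kube/config"
}
```

Call post_reset_cleanup once to review the printed commands, then set DRY_RUN=0 and call it again as root if they are what you want.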

Run kubeadm init again:

[root@k8scloude1 ~]# kubeadm init --image-repository registry.aliyuncs.com/google_containers --kubernetes-version=v1.21.0 --pod-network-cidr=10.244.0.0/16
[init] Using Kubernetes version: v1.21.0
[preflight] Running pre-flight checks
        [WARNING Firewalld]: firewalld is active, please ensure ports [6443 10250] are open or your cluster may not function correctly
        [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8scloude1 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.110.130]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8scloude1 localhost] and IPs [192.168.110.130 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8scloude1 localhost] and IPs [192.168.110.130 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[apiclient] All control plane components are healthy after.004984 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.21" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node k8scloude1 as control-plane by adding the labels: [node-role.kubernetes.io/master(deprecated) node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node k8scloude1 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: 45wtx2.gfb3j9obk0fz663z
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
  export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.110.130:6443 --token 45wtx2.gfb3j9obk0fz663z \
        --discovery-token-ca-cert-hash sha256:d390e28ef900f9a17483bb2d230b9e5be76920d128eb020d472c21d594aa278d 

Create the directory and kubeconfig file as instructed in the output:

[root@k8scloude1 ~]# mkdir -p $HOME/.kube
[root@k8scloude1 ~]# sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
cp: overwrite "/root/.kube/config"? y
[root@k8scloude1 ~]# sudo chown $(id -u):$(id -g) $HOME/.kube/config

3.4 Adding worker nodes to the k8s cluster

Next, join the two worker nodes to the k8s cluster.

Join k8scloude2 to the cluster:

# Run the join command on the other two nodes
[root@k8scloude2 ~]# kubeadm join 192.168.110.130:6443 --token 45wtx2.gfb3j9obk0fz663z --discovery-token-ca-cert-hash sha256:d390e28ef900f9a17483bb2d230b9e5be76920d128eb020d472c21d594aa278d
[preflight] Running pre-flight checks
        [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
error execution phase preflight: [preflight] Some fatal errors occurred:
        [ERROR FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
        [ERROR Port-10250]: Port 10250 is in use
        [ERROR FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher

A worker node also has to clear its previous setup before it can rejoin the k8s cluster:

[root@k8scloude2 ~]# kubeadm reset
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W 16:22:12.705575   59352 removeetcdmember.go:79] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] No etcd config found. Assuming external etcd
[reset] Please, manually reset etcd to prevent further issues
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/var/lib/kubelet /var/lib/dockershim /var/run/kubernetes /var/lib/cni]
The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d
The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.
If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.
The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.

Join k8scloude2 to the cluster again. This time the join succeeds:

[root@k8scloude2 ~]# kubeadm join 192.168.110.130:6443 --token 45wtx2.gfb3j9obk0fz663z --discovery-token-ca-cert-hash sha256:d390e28ef900f9a17483bb2d230b9e5be76920d128eb020d472c21d594aa278d
[preflight] Running pre-flight checks
        [WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.

Do the same on k8scloude3:

[root@k8scloude3 ~]# kubeadm reset
[root@k8scloude3 ~]# kubeadm join 192.168.110.130:6443 \
        --token 45wtx2.gfb3j9obk0fz663z \
        --discovery-token-ca-cert-hash sha256:d390e28ef900f9a17483bb2d230b9e5be76920d128eb020d472c21d594aa278d
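
The reset-then-join steps on each worker can be scripted, and a bootstrap token that has expired (kubeadm tokens last 24 hours by default) can be replaced by asking the control-plane node for a fresh join command. This is a rough sketch only: the function names print_join_command and rejoin_worker and the KUBEADM override variable are assumptions for illustration. The kubeadm subcommands and flags used (token create --print-join-command, reset --force, join) are standard kubeadm options.

```shell
#!/usr/bin/env bash
# KUBEADM can be overridden for a dry run (e.g. KUBEADM="echo kubeadm").
KUBEADM="${KUBEADM:-kubeadm}"

# Hypothetical helper for the control-plane node: print a fresh join command.
print_join_command() {
  # Creates a new bootstrap token and prints the matching 'kubeadm join ...' line.
  $KUBEADM token create --print-join-command
}

# Hypothetical helper for a worker: reset without the y/N prompt, then join.
# "$@" is the argument list of the join command, e.g.
#   rejoin_worker 192.168.110.130:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>
rejoin_worker() {
  $KUBEADM reset --force || return 1
  $KUBEADM join "$@"
}
```

Because reset --force skips the confirmation prompt, rejoin_worker is destructive on whatever host it runs on; previewing it with KUBEADM="echo kubeadm" first is a reasonable precaution.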

Check the status of the cluster nodes:

# All nodes now show the Ready status
[root@k8scloude1 ~]# kubectl get nodes 
NAME         STATUS   ROLES                  AGE   VERSION
k8scloude1   Ready    control-plane,master   5m    v1.21.0
k8scloude2   Ready    <none>                 63s   v1.21.0
k8scloude3   Ready    <none>                 33s   v1.21.0

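3.5 Installing calico

The old cluster already had the calico plugin installed. After the reinstall, calico has not been deployed again, yet kubectl get nodes already reports every node as Ready. A stale status in etcd is unlikely to be the reason, because kubeadm reset emptied /var/lib/etcd. The more likely cause is that kubeadm reset does not remove the CNI configuration in /etc/cni/net.d (its output says so), so the kubelet still finds the old calico CNI configuration and reports the node network as ready. Pod networking is not actually functional until the calico components run again, so calico has to be reinstalled.

[root@k8scloude1 ~]# kubectl apply -f calico.yaml 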
configmap/calico-config created
customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/blockaffinities.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/caliconodestatuses.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamblocks.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamconfigs.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamhandles.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipreservations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/kubecontrollersconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networksets.crd.projectcalico.org created
clusterrole.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrolebinding.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrole.rbac.authorization.k8s.io/calico-node created
clusterrolebinding.rbac.authorization.k8s.io/calico-node created
daemonset.apps/calico-node created
serviceaccount/calico-node created
deployment.apps/calico-kube-controllers created
serviceaccount/calico-kube-controllers created
Warning: policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
poddisruptionbudget.policy/calico-kube-controllers created
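
kubectl apply returns as soon as the objects are created, not when the calico pods are running. One way to wait for them is kubectl rollout status, as in the sketch below. The function name install_calico and the KUBECTL override variable are assumptions for illustration, and the 300s timeout is an arbitrary choice. The resource names calico-node and calico-kube-controllers come from the apply output in this section.

```shell
#!/usr/bin/env bash
# Hypothetical helper: apply the calico manifest and wait until it is rolled out.
# KUBECTL can be overridden (e.g. KUBECTL="echo kubectl") to preview the commands.
KUBECTL="${KUBECTL:-kubectl}"

install_calico() {
  local manifest="${1:-calico.yaml}"
  $KUBECTL apply -f "$manifest" || return 1
  # Block until the calico-node DaemonSet reports all pods ready (or the timeout hits).
  $KUBECTL -n kube-system rollout status daemonset/calico-node --timeout=300s || return 1
  # Same for the calico-kube-controllers Deployment.
  $KUBECTL -n kube-system rollout status deployment/calico-kube-controllers --timeout=300s
}
```

A non-zero exit status from install_calico then means either the apply failed or one of the rollouts did not finish within the timeout.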

Only now is the cluster fully functional:

[root@k8scloude1 ~]# kubectl get nodes 
NAME         STATUS   ROLES                  AGE     VERSION
k8scloude1   Ready    control-plane,master   9m11s   v1.21.0
k8scloude2   Ready    <none>                 5m14s   v1.21.0
k8scloude3   Ready    <none>                 4m44s   v1.21.0

Note: if kubeadm reset was not run on the k8s master node and only the worker nodes were reset, calico does not need to be reinstalled.

[root@k8scloude1 ~]# kubectl get pods -n kube-system -o wide
NAME                                       READY   STATUS    RESTARTS   AGE     IP                NODE         NOMINATED NODE   READINESS GATES
calico-kube-controllers-b9fbfff44-4jzkj   1/1     Running   0          3m16s   10.244.251.193    k8scloude3   <none>           <none>
calico-node-bdlgm                         1/1     Running   0          3m16s   192.168.110.130   k8scloude1   <none>           <none>
calico-node-hxbk                          1/1     Running   0          3m16s   192.168.110.128   k8scloude3   <none>           <none>
calico-node-nsbfs                         1/1     Running   0          3m16s   192.168.110.129   k8scloude2   <none>           <none>
coredns-d6fc579-7wm95                   1/1     Running   0          11m     10.244.158.65     k8scloude1   <none>           <none>
coredns-d6fc579-87q8j                   1/1     Running   0          11m     10.244.158.66     k8scloude1   <none>           <none>
etcd-k8scloude1                           1/1     Running   0          12m     192.168.110.130   k8scloude1   <none>           <none>
kube-apiserver-k8scloude1                 1/1     Running   0          12m     192.168.110.130   k8scloude1   <none>           <none>
kube-controller-manager-k8scloude1        1/1     Running   0          12m     192.168.110.130   k8scloude1   <none>           <none>
kube-proxy-xh                           1/1     Running   0          7m48s   192.168.110.128   k8scloude3   <none>           <none>
kube-proxy-lpjz                           1/1     Running   0          8m18s   192.168.110.129   k8scloude2   <none>           <none>
kube-proxy-zxlk                           1/1     Running   0          11m     192.168.110.130   k8scloude1   <none>           <none>
kube-scheduler-k8scloude1                 1/1     Running   0          12m     192.168.110.130   k8scloude1   <none>           <none>

With that, the k8s cluster reinstall is complete!