Table of contents
- 1. System environment
- 2. Preface
- 3. Reinstalling the Kubernetes cluster
- 3.1 Environment overview
- 3.2 Remove all nodes from the cluster
- 3.3 Initialize with kubeadm
- 3.4 Join the worker nodes to the cluster
- 3.5 Install Calico
1. System environment
| Server OS version | Docker version | CPU architecture |
| --- | --- | --- |
| CentOS Linux release 7.4.1708 (Core) | Docker version 20.10.12 | x86_64 |
2. Preface
After a Kubernetes cluster has been deployed and used for a while, there may be a need to reinstall it from scratch. This article walks through that scenario by tearing down and reinstalling an existing cluster.
The prerequisite is a Kubernetes cluster that is already up and running. For how to install and deploy a Kubernetes (k8s) cluster in the first place, see the post "Centos7 安装部署Kubernetes(k8s)集群".
3. Reinstalling the Kubernetes cluster
3.1 Environment overview
Cluster architecture: k8scloude1 is the master node; k8scloude2 and k8scloude3 are worker nodes.
| Server | OS version | CPU architecture | Processes | Role |
| --- | --- | --- | --- | --- |
| k8scloude1/192.168.110.130 | CentOS Linux release 7.4.1708 (Core) | x86_64 | docker, kube-apiserver, etcd, kube-scheduler, kube-controller-manager, kubelet, kube-proxy, coredns, calico | k8s master node |
| k8scloude2/192.168.110.129 | CentOS Linux release 7.4.1708 (Core) | x86_64 | docker, kubelet, kube-proxy, calico | k8s worker node |
| k8scloude3/192.168.110.128 | CentOS Linux release 7.4.1708 (Core) | x86_64 | docker, kubelet, kube-proxy, calico | k8s worker node |
3.2 Remove all nodes from the cluster
`kubectl drain` safely evicts all pods from a node. The `--ignore-daemonsets` flag is usually required: drain first cordons the node (marking it SchedulingDisabled), but DaemonSet pods ignore that mark, so any DaemonSet-managed pod that was deleted would immediately be recreated on the same node, looping forever. Ignoring DaemonSets avoids this.
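For reference, a general form of the command used in the steps below is roughly the following sketch; the `--delete-emptydir-data` flag is only needed when pods on the node use emptyDir volumes (flag name as of kubectl 1.20+, an assumption for other versions):

```bash
# Cordon <node-name> and evict its pods; DaemonSet pods are ignored as explained above.
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
```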
[root@k8scloude1 ~]# kubectl drain k8scloude3 --ignore-daemonsets
node/k8scloude3 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-wmzr, kube-system/kube-proxy-84gcx
evicting pod kube-system/calico-kube-controllers-b9fbfff44-rl2mh
pod/calico-kube-controllers-b9fbfff44-rl2mh evicted
node/k8scloude3 evicted
k8scloude3 is now marked SchedulingDisabled:
[root@k8scloude1 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8scloude1 Ready control-plane,master 64m v1.21.0
k8scloude2 Ready <none> 56m v1.21.0
k8scloude3 Ready,SchedulingDisabled <none> 56m v1.21.0
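Draining only cordons and empties the node. If a drain ever needs to be undone instead of proceeding to deletion, the node can be made schedulable again with the standard uncordon command:

```bash
# Revert the SchedulingDisabled mark set by `kubectl drain` / `kubectl cordon`.
kubectl uncordon k8scloude3
```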
Delete node k8scloude3:
[root@k8scloude1 ~]# kubectl delete nodes k8scloude3
node "k8scloude3" deleted
[root@k8scloude1 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8scloude1 Ready control-plane,master 65m v1.21.0
k8scloude2 Ready <none> 57m v1.21.0
Do the same for the remaining nodes:
[root@k8scloude1 ~]# kubectl drain k8scloude2 --ignore-daemonsets
node/k8scloude2 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-bbst, kube-system/kube-proxy-8wf8t
evicting pod kube-system/coredns-d6fc579-kgmfl
evicting pod kube-system/calico-kube-controllers-b9fbfff44-nq79f
evicting pod kube-system/coredns-d6fc579-dln6p
pod/coredns-d6fc579-dln6p evicted
pod/coredns-d6fc579-kgmfl evicted
pod/calico-kube-controllers-b9fbfff44-nq79f evicted
node/k8scloude2 evicted
[root@k8scloude1 ~]# kubectl drain k8scloude1 --ignore-daemonsets
node/k8scloude1 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/calico-node-rvx, kube-system/kube-proxy-zblkg
evicting pod kube-system/coredns-d6fc579-tgcl4
evicting pod kube-system/calico-kube-controllers-b9fbfff44-t9k45
evicting pod kube-system/coredns-d6fc579-l9g7b
pod/calico-kube-controllers-b9fbfff44-t9k45 evicted
pod/coredns-d6fc579-tgcl4 evicted
pod/coredns-d6fc579-l9g7b evicted
node/k8scloude1 evicted
[root@k8scloude1 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8scloude1 Ready,SchedulingDisabled control-plane,master 66m v1.21.0
k8scloude2 Ready,SchedulingDisabled <none> 58m v1.21.0
[root@k8scloude1 ~]# kubectl delete nodes k8scloude2
node "k8scloude2" deleted
[root@k8scloude1 ~]# kubectl delete nodes k8scloude1
node "k8scloude1" deleted
At this point, all nodes have been removed from the cluster:
[root@k8scloude1 ~]# kubectl get nodes
No resources found
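The per-node drain/delete steps above can also be scripted. A minimal sketch, assuming kubectl is already pointed at the cluster being torn down:

```bash
# Drain and delete every node reported by the API server.
for node in $(kubectl get nodes -o jsonpath='{.items[*].metadata.name}'); do
  kubectl drain "$node" --ignore-daemonsets
  kubectl delete node "$node"
done
```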
3.3 Initialize with kubeadm
Running kubeadm init again at this point fails. The error messages show that the required ports are already in use and the configuration files from the previous installation still exist:
[root@k8scloude1 ~]# kubeadm init --image-repository registry.aliyuncs.com/google_containers --kubernetes-version=v1.21.0 --pod-network-cidr=10.244.0.0/16
[init] Using Kubernetes version: v1.21.0
[preflight] Running pre-flight checks
[WARNING Firewalld]: firewalld is active, please ensure ports [6443 10250] are open or your cluster may not function correctly
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR Port-6443]: Port 6443 is in use
[ERROR Port-10259]: Port 10259 is in use
[ERROR Port-10257]: Port 10257 is in use
[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
[ERROR Port-10250]: Port 10250 is in use
[ERROR Port-2379]: Port 2379 is in use
[ERROR Port-2380]: Port 2380 is in use
[ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
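To confirm which processes are still holding those ports before resetting, standard tools such as ss can be used, for example:

```bash
# Show the listeners on the control-plane ports reported as "in use".
ss -lntp | grep -E ':(6443|10250|10257|10259|2379|2380)\b'
```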
To re-initialize the cluster, the previous state must be wiped first with kubeadm reset:
[root@k8scloude1 ~]# kubeadm reset
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W 16:17:15.936292 53177 reset.go:99] [reset] Unable to fetch the kubeadm-config ConfigMap from cluster: failed to get node registration: failed to get corresponding node: nodes "k8scloude1" not found
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W 16:17:17.651795 53177 removeetcdmember.go:79] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/var/lib/etcd /var/lib/kubelet /var/lib/dockershim /var/run/kubernetes /var/lib/cni]
The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d
The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.
If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.
The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
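As the output itself says, kubeadm reset leaves some cleanup to the administrator. A sketch of that extra cleanup, to be run only if you really want to wipe all networking state (note that flushing iptables removes every rule on the host, not just the Kubernetes ones):

```bash
# Extra cleanup that `kubeadm reset` does not perform (see its output above).
rm -rf /etc/cni/net.d                                 # CNI configuration
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X   # flush ALL iptables rules
ipvsadm --clear                                       # only if kube-proxy ran in IPVS mode
rm -f $HOME/.kube/config                              # optional; the walkthrough below simply overwrites it with cp -i
```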
Run kubeadm init again:
[root@k8scloude1 ~]# kubeadm init --image-repository registry.aliyuncs.com/google_containers --kubernetes-version=v1.21.0 --pod-network-cidr=10.244.0.0/16
[init] Using Kubernetes version: v1.21.0
[preflight] Running pre-flight checks
[WARNING Firewalld]: firewalld is active, please ensure ports [6443 10250] are open or your cluster may not function correctly
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8scloude1 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.110.130]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8scloude1 localhost] and IPs [192.168.110.130 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8scloude1 localhost] and IPs [192.168.110.130 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[apiclient] All control plane components are healthy after.004984 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.21" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node k8scloude1 as control-plane by adding the labels: [node-role.kubernetes.io/master(deprecated) node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node k8scloude1 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: 45wtx2.gfb3j9obk0fz663z
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 192.168.110.130:6443 --token 45wtx2.gfb3j9obk0fz663z \
    --discovery-token-ca-cert-hash sha256:d390e28ef900f9a17483bb2d230b9e5be76920d128eb020d472c21d594aa278d
Create the directory and kubeconfig file as instructed by the output:
[root@k8scloude1 ~]# mkdir -p $HOME/.kube
[root@k8scloude1 ~]# cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
cp: overwrite "/root/.kube/config"? y
[root@k8scloude1 ~]# chown $(id -u):$(id -g) $HOME/.kube/config
3.4 Join the worker nodes to the cluster
Next, join the other two worker nodes to the cluster.
Attempt to join k8scloude2 (this fails because of leftover state from the previous installation):
[root@k8scloude2 ~]# kubeadm join 192.168.110.130:6443 --token 45wtx2.gfb3j9obk0fz663z --discovery-token-ca-cert-hash sha256:d390e28ef900f9a17483bb2d230b9e5be76920d128eb020d472c21d594aa278d
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
[ERROR Port-10250]: Port 10250 is in use
[ERROR FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
Rejoining a worker node to the cluster likewise requires wiping the previous state:
[root@k8scloude2 ~]# kubeadm reset
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W 16:22:12.705575 59352 removeetcdmember.go:79] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] No etcd config found. Assuming external etcd
[reset] Please, manually reset etcd to prevent further issues
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/var/lib/kubelet /var/lib/dockershim /var/run/kubernetes /var/lib/cni]
The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d
The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.
If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.
The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
Join k8scloude2 to the cluster again; this time the join succeeds:
[root@k8scloude2 ~]# kubeadm join 192.168.110.130:6443 --token 45wtx2.gfb3j9obk0fz663z --discovery-token-ca-cert-hash sha256:d390e28ef900f9a17483bb2d230b9e5be76920d128eb020d472c21d594aa278d
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
Perform the same steps on k8scloude3:
[root@k8scloude3 ~]# kubeadm reset
[root@k8scloude3 ~]# kubeadm join 192.168.110.130:6443 \
    --token 45wtx2.gfb3j9obk0fz663z \
    --discovery-token-ca-cert-hash sha256:d390e28ef900f9a17483bb2d230b9e5be76920d128eb020d472c21d594aa278d
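If the join command printed by kubeadm init is no longer at hand, a fresh one can be generated on the control-plane node with a standard kubeadm subcommand:

```bash
# Run on the master; prints a ready-to-use `kubeadm join ...` command with a new token.
kubeadm token create --print-join-command
```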
Check the status of the cluster nodes:
# All nodes now report Ready
[root@k8scloude1 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8scloude1 Ready control-plane,master 5m v1.21.0
k8scloude2 Ready <none> 63s v1.21.0
k8scloude3 Ready <none> 33s v1.21.0
3.5 Install Calico
Calico was installed in the original cluster, but after the reinstall it is not present anymore. The nodes still show Ready in kubectl get nodes only because that status was written into etcd earlier and has not been refreshed, so Calico needs to be installed again.
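The calico.yaml applied below is the manifest kept from the original installation. If it is missing, it can typically be re-downloaded first; the URL below is an assumption — use the manifest matching the Calico version deployed originally:

```bash
# Fetch the Calico manifest (adjust the URL/version to match the original install).
curl -LO https://docs.projectcalico.org/manifests/calico.yaml
```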
[root@k8scloude1 ~]# kubectl apply -f calico.yaml
configmap/calico-config created
customresourcedefinition.apiextensions.k8s.io/bgpconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/bgppeers.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/blockaffinities.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/caliconodestatuses.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/clusterinformations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/felixconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/globalnetworksets.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/hostendpoints.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamblocks.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamconfigs.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipamhandles.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ippools.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/ipreservations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/kubecontrollersconfigurations.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networkpolicies.crd.projectcalico.org created
customresourcedefinition.apiextensions.k8s.io/networksets.crd.projectcalico.org created
clusterrole.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrolebinding.rbac.authorization.k8s.io/calico-kube-controllers created
clusterrole.rbac.authorization.k8s.io/calico-node created
clusterrolebinding.rbac.authorization.k8s.io/calico-node created
daemonset.apps/calico-node created
serviceaccount/calico-node created
deployment.apps/calico-kube-controllers created
serviceaccount/calico-kube-controllers created
Warning: policy/v1beta1 PodDisruptionBudget is deprecated in v1.21+, unavailable in v1.25+; use policy/v1 PodDisruptionBudget
poddisruptionbudget.policy/calico-kube-controllers created
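It can take a minute for the Calico pods to become Ready. A quick way to wait for them, using standard kubectl rollout checks against the resources created above:

```bash
# Wait for the calico-node DaemonSet and the kube-controllers Deployment to roll out.
kubectl -n kube-system rollout status daemonset/calico-node
kubectl -n kube-system rollout status deployment/calico-kube-controllers
```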
Only now is the cluster fully functional:
[root@k8scloude1 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8scloude1 Ready control-plane,master 9m11s v1.21.0
k8scloude2 Ready <none> 5m14s v1.21.0
k8scloude3 Ready <none> 4m44s v1.21.0
Note: if the master node was never reset with kubeadm reset and only the worker nodes were reset and rejoined, Calico does not need to be reinstalled.
[root@k8scloude1 ~]# kubectl get pods -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-kube-controllers-b9fbfff44-4jzkj 1/1 Running 0 3m16s 10.244.251.193 k8scloude3 <none> <none>
calico-node-bdlgm 1/1 Running 0 3m16s 192.168.110.130 k8scloude1 <none> <none>
calico-node-hxbk 1/1 Running 0 3m16s 192.168.110.128 k8scloude3 <none> <none>
calico-node-nsbfs 1/1 Running 0 3m16s 192.168.110.129 k8scloude2 <none> <none>
coredns-d6fc579-7wm95 1/1 Running 0 11m 10.244.158.65 k8scloude1 <none> <none>
coredns-d6fc579-87q8j 1/1 Running 0 11m 10.244.158.66 k8scloude1 <none> <none>
etcd-k8scloude1 1/1 Running 0 12m 192.168.110.130 k8scloude1 <none> <none>
kube-apiserver-k8scloude1 1/1 Running 0 12m 192.168.110.130 k8scloude1 <none> <none>
kube-controller-manager-k8scloude1 1/1 Running 0 12m 192.168.110.130 k8scloude1 <none> <none>
kube-proxy-xh 1/1 Running 0 7m48s 192.168.110.128 k8scloude3 <none> <none>
kube-proxy-lpjz 1/1 Running 0 8m18s 192.168.110.129 k8scloude2 <none> <none>
kube-proxy-zxlk 1/1 Running 0 11m 192.168.110.130 k8scloude1 <none> <none>
kube-scheduler-k8scloude1 1/1 Running 0 12m 192.168.110.130 k8scloude1 <none> <none>
With that, the Kubernetes cluster has been reinstalled!
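As an optional final check (a sketch, not part of the original walkthrough; the nginx image and the deployment name are assumptions), a throwaway deployment confirms that scheduling and pod networking work end to end:

```bash
# Create a test deployment, wait for it to roll out, verify pod IPs, then clean up.
kubectl create deployment nginx-test --image=nginx
kubectl rollout status deployment/nginx-test
kubectl get pods -o wide          # pods should be Running with 10.244.x.x addresses on the workers
kubectl delete deployment nginx-test
```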