Installing K8S v1.29.2 on CentOS 7 in mainland China (CRI: containerd)
By walking through a K8S installation you will pick up the core concepts: the control plane, CRI, CNI, Deployment, Service, sandbox, and so on. This article covers not only the installation steps but also plenty of hands-on troubleshooting, with explanations of what is happening behind the scenes.
Prerequisites
Ports used by each component
- kubelet: 10250, 10248
- kube-controller-manager: 127.0.0.1:10257
- kube-scheduler: 10259
- kube-proxy: 10256, 127.0.0.1:10249
- kube-apiserver: 6443
- etcd: 2379, 2380, 2381
Before installing, confirm none of these ports are already in use; see the quick check below.
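A minimal sketch for that check, assuming ss from iproute2 is available on the host; any line of output means the port is already occupied:
# any output here means the port is already taken by another process
ss -tlnp | grep -E ':(6443|10248|10249|10250|10256|10257|10259|2379|2380|2381) '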
Environment
- CentOS 7.6
Architecture
- 2 servers: 1 control-plane node, 1 worker (data-plane) node
1. Install kubeadm
See the official documentation: Installing kubeadm.
1.1 Install the container runtime: containerd
Install a container runtime on every node.
There are several container runtimes to choose from: containerd, CRI-O, and Docker Engine (via cri-dockerd). Here we pick containerd.
See the document Getting started with containerd.
containerd has a shorter call chain and fewer components, is more stable, follows the OCI standard, and uses fewer node resources, so containerd is the recommended choice.
Choose Docker as the runtime instead if you need any of the following:
- docker in docker
- running docker build/push/save/load on the K8S nodes
- calling the docker API
- docker compose or docker swarm
Prerequisite: enable IPv4 forwarding and let iptables see bridged traffic.
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
# Set the required sysctl parameters; they persist across reboots
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
EOF
# Apply the sysctl parameters without rebooting
sudo sysctl --system
Confirm that the br_netfilter and overlay modules are loaded:
lsmod | grep br_netfilter
lsmod | grep overlay
Confirm that the net.bridge.bridge-nf-call-iptables, net.bridge.bridge-nf-call-ip6tables, and net.ipv4.ip_forward sysctl variables are set to 1:
sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward
1.1.1 Install containerd
The relationship between containerd and runc
containerd is the high-level container runtime manager, while runc is the low-level runtime underneath it. containerd provides container lifecycle management, image management, storage and network configuration, and more. runc is a lightweight runtime that implements the Open Container Initiative (OCI) specification and talks directly to the operating system to execute containers. containerd's CLI tool ctr can be used to interact with containerd (pull images, create containers and tasks, and so on); behind those operations, containerd invokes runc to actually start and run the containers. This layering lets containerd offer higher-level features while relying on runc's lightweight, standards-compliant execution. In short: containerd supplies the rich management functionality and user-facing interfaces, runc handles the low-level container execution, and together they form a complete container runtime environment.
Go to https://github.com/containerd/containerd/releases and download the containerd release tarball.
## extract into /usr/local
tar Czxvf /usr/local/ containerd-1.7.14-linux-amd64.tar.gz
Manage containerd with systemd by placing the unit file under /usr/lib/systemd/system/:
wget -O /usr/lib/systemd/system/containerd.service https://raw.githubusercontent.com/containerd/containerd/main/containerd.service
systemctl daemon-reload
systemctl enable --now containerd
Once the Unix domain socket shows up, the installation succeeded:
ll /var/run/containerd/containerd.sock
srw-rw---- 1 root root 0 Mar 13 11:21 /var/run/containerd/containerd.sock
With containerd as the container runtime, images are pulled with the ctr command; see "Common ctr usage with containerd" at the end of this article for pulling images and listing containers.
Side note: the relationship between containerd and containerd-shim-runc-v2
[root@hadoop-30 certs.d]# ps -ef | grep containerd
root 274213 1 0 3月16 ? 00:00:09 /usr/local/bin/containerd-shim-runc-v2 -namespace moby -id 500ee902a17e8190a237da13ae0f0bc97d65093d8871125a49b20c6be1058ac1 -address /run/containerd/containerd.sock
root 1453036 1 0 3月17 ? 00:03:15 /usr/local/bin/containerd --config /etc/containerd/config.toml
containerd
containerd is an open-source container runtime. It is one of Docker's core components but can also be used independently of Docker. It manages the entire container lifecycle: creating, running, pausing, stopping and deleting containers, plus image, storage and network management. It provides a complete runtime environment while staying lightweight and modular enough to be embedded in larger systems such as Kubernetes.
In the process list above, containerd runs as a daemon started via /usr/local/bin/containerd; the --config /etc/containerd/config.toml flag points at its configuration file.
containerd-shim-runc-v2
containerd-shim-runc-v2 is part of containerd. It implements containerd's shim v2 runtime API (not to be confused with the Kubernetes CRI) and is used to create and run containers. The shim is a thin intermediary that lets containerd drive different low-level runtimes, such as runc or any other OCI-compatible runtime.
The shim's main purpose is to supervise container execution in the background without requiring the containerd daemon to stay up: even if containerd stops or crashes, the containers keep running. The shim also collects container output (such as logs) and forwards it back to containerd.
In the process list there is one running shim instance per container, for example:
/usr/local/bin/containerd-shim-runc-v2 -namespace moby -id 500ee902... -address /run/containerd/containerd.sock
Here -namespace moby means this shim instance runs under Docker's namespace, -id is followed by the container's unique identifier, and -address is the containerd daemon socket the shim uses to communicate with containerd.
In summary, containerd is the core runtime daemon, and containerd-shim-runc-v2 is the lightweight intermediary it uses to interact with each container's low-level runtime.
1.1.2 Install runc
runc is a "CLI tool for spawning and running containers according to the OCI specification".
wget https://github.com/opencontainers/runc/releases/download/v1.1.12/runc.amd64
install -m 755 runc.amd64 /usr/local/sbin/runc
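A quick sanity check that the binary landed where containerd expects it (just a version query):
/usr/local/sbin/runc --version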
1.1.3 Install the CNI plugins
# wget https://github.com/containernetworking/plugins/releases/download/v1.4.1/cni-plugins-linux-amd64-v1.4.1.tgz
# mkdir -p /opt/cni/bin
[root@clouderamanager-15 containerd]# tar Cxzvf /opt/cni/bin cni-plugins-linux-amd64-v1.4.1.tgz
#ll /opt/cni/bin/
total 128528
-rwxr-xr-x 1 1001 127 4119661 Mar 12 18:56 bandwidth
-rwxr-xr-x 1 1001 127 4662227 Mar 12 18:56 bridge
-rwxr-xr-x 1 1001 127 11065251 Mar 12 18:56 dhcp
-rwxr-xr-x 1 1001 127 4306546 Mar 12 18:56 dummy
-rwxr-xr-x 1 1001 127 4751593 Mar 12 18:56 firewall
-rwxr-xr-x 1 root root 2856252 Feb 21 2020 flannel
-rwxr-xr-x 1 1001 127 4198427 Mar 12 18:56 host-device
-rwxr-xr-x 1 1001 127 3560496 Mar 12 18:56 host-local
-rwxr-xr-x 1 1001 127 4324636 Mar 12 18:56 ipvlan
-rw-r--r-- 1 1001 127 11357 Mar 12 18:56 LICENSE
-rwxr-xr-x 1 1001 127 3651038 Mar 12 18:56 loopback
-rwxr-xr-x 1 1001 127 4355073 Mar 12 18:56 macvlan
-rwxr-xr-x 1 root root 37545270 Feb 21 2020 multus
-rwxr-xr-x 1 1001 127 4095898 Mar 12 18:56 portmap
-rwxr-xr-x 1 1001 127 4476535 Mar 12 18:56 ptp
-rw-r--r-- 1 1001 127 2343 Mar 12 18:56 README.md
-rwxr-xr-x 1 root root 2641877 Feb 21 2020 sample
-rwxr-xr-x 1 1001 127 3861176 Mar 12 18:56 sbr
-rwxr-xr-x 1 1001 127 3120090 Mar 12 18:56 static
-rwxr-xr-x 1 1001 127 4381887 Mar 12 18:56 tap
-rwxr-xr-x 1 root root 7506830 Aug 18 2021 tke-route-eni
-rwxr-xr-x 1 1001 127 3743844 Mar 12 18:56 tuning
-rwxr-xr-x 1 1001 127 4319235 Mar 12 18:56 vlan
-rwxr-xr-x 1 1001 127 4008392 Mar 12 18:56 vrf
1.2 Install kubeadm, kubelet and kubectl
See the official documentation.
You need to install the following packages on every machine:
- kubeadm: the command to bootstrap the cluster.
- kubelet: runs on every node in the cluster and starts Pods and containers.
- kubectl: the command-line tool for talking to the cluster.
cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://pkgs.k8s.io/core:/stable:/v1.29/rpm/
enabled=1
gpgcheck=1
gpgkey=https://pkgs.k8s.io/core:/stable:/v1.29/rpm/repodata/repomd.xml.key
exclude=kubelet kubeadm kubectl cri-tools kubernetes-cni
EOF
yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
systemctl enable --now kubelet
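To confirm the three components are installed and on matching versions (plain version queries, nothing more):
kubeadm version
kubelet --version
kubectl version --client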
1.3 Configure the cgroup driver
Reference: official documentation.
There are two drivers, cgroupfs and systemd. When systemd is the init system, the cgroupfs driver is not recommended, because systemd expects to be the only cgroup manager on the host.
When a Linux distribution uses systemd as its init system, the init process creates and uses a root control group (cgroup) and acts as the cgroup manager.
Ways to check whether the distribution (CentOS or any other systemd-based distribution) uses systemd as its init system:
- Check the process with PID 1. The init process always has PID 1: ps -p 1. If PID 1 is systemd, systemd is the init system.
- Use systemctl. systemctl is systemd's main tool; if running systemctl lists services and unit states, systemd is almost certainly in use.
- Check what /sbin/init links to. /sbin/init is usually a symlink to the init system: ls -l /sbin/init. If it points to systemd, the system uses systemd.
- Check systemd's cgroup hierarchy: ls -l /sys/fs/cgroup/systemd. If the directory exists and contains service-related files and directories, systemd is acting as the cgroup manager.
- Query the service manager with hostnamectl; the output usually reveals whether the system is running systemd.
Any one of these checks is enough to confirm that CentOS is using systemd as its init system.
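Since kubeadm v1.22 the kubelet's cgroupDriver defaults to systemd when left unset (the init log later in this article confirms it: "the value of KubeletConfiguration.cgroupDriver is empty; setting it to systemd"), so in practice only containerd still needs to be switched. A minimal sketch, assuming the config is the one produced by containerd config default at /etc/containerd/config.toml:
# generate a default config (skip this if you already maintain one)
containerd config default | sudo tee /etc/containerd/config.toml
# switch the runc runtime options to the systemd cgroup driver
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
sudo systemctl restart containerd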
2. Creating a cluster with kubeadm
See the official documentation: Creating a cluster with kubeadm.
Example usage:
Create a two-machine cluster with one control-plane node
(which controls the cluster), and one worker node
(where your workloads, like Pods and Deployments run).
┌──────────────────────────────────────────────────────────┐
│ On the first machine: │
├──────────────────────────────────────────────────────────┤
│ control-plane# kubeadm init │
└──────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────┐
│ On the second machine: │
├──────────────────────────────────────────────────────────┤
│ worker# kubeadm join │
└──────────────────────────────────────────────────────────┘
2.1 Initialize the control-plane node
Initialize the control-plane node with kubeadm init.
What happens behind kubeadm init?
kubeadm init initializes the control plane of a new Kubernetes cluster. It runs through a series of steps to bring the cluster up; the main ones are:
- Select the container runtime
I0315 14:21:43.347964 22137 initconfiguration.go:122] detected and using CRI socket: unix:///var/run/containerd/containerd.sock
CRI (Container Runtime Interface) is the plugin interface Kubernetes uses to talk to container runtimes; common runtimes include Docker, containerd and CRI-O.
When initializing the cluster, kubeadm has to decide which runtime to use. It does so as follows:
- Auto-detection: kubeadm probes the default paths for known CRI sockets, typically /var/run/dockershim.sock (Docker), /var/run/containerd/containerd.sock (containerd), and /var/run/crio/crio.sock (CRI-O).
- Config file or command-line flag: to pin a specific runtime, set the nodeRegistration.criSocket field in the kubeadm configuration file, or pass the --cri-socket flag to specify the socket path explicitly.
In the output above, kubeadm detected and used the CRI socket at unix:///var/run/containerd/containerd.sock, i.e. it chose containerd as the runtime: it found containerd.sock at the default path, and no other runtime was specified via config file or flag. If multiple runtimes are installed and you want kubeadm to use a non-default one, add --cri-socket to the init command, for example:
kubeadm init --cri-socket /var/run/crio/crio.sock ...
which tells kubeadm to use CRI-O. Without the flag, kubeadm uses the first runtime it auto-detects.
- Check the cgroup driver
I0315 14:21:43.348406 22137 kubelet.go:196] the value of KubeletConfiguration.cgroupDriver is empty; setting it to "systemd"
- Preflight checks:
- Check that the command is run as root.
- Confirm the machine meets the minimum requirements (CPU, memory, etc.).
- Check that the required dependencies are installed: a container runtime, kubelet, kubeadm.
- Check the network configuration to make sure nodes can communicate.
- Check that the required ports are open, e.g. the Kubernetes API server's default port 6443.
- Confirm the hostname, MAC address and product UUID are unique.
validating Kubernetes and kubeadm version
validating if the firewall is enabled and active
validating availability of port 6443
validating availability of port 10259
validating availability of port 10257
validating the existence of file /etc/kubernetes/manifests/kube-apiserver.yaml
validating the existence of file /etc/kubernetes/manifests/kube-controller-manager.yaml
validating the existence of file /etc/kubernetes/manifests/kube-scheduler.yaml
validating the existence of file /etc/kubernetes/manifests/etcd.yaml
validating if the connectivity type is via proxy or direct
validating http connectivity to first IP address in the CIDR
validating http connectivity to first IP address in the CIDR
validating the container runtime
validating whether swap is enabled or not
validating the presence of executable crictl
validating the presence of executable conntrack
validating the presence of executable ip
validating the presence of executable iptables
validating the presence of executable mount
validating the presence of executable nsenter
validating the presence of executable ebtables
validating the presence of executable ethtool
validating the presence of executable socat
validating the presence of executable tc
validating the presence of executable touch
running all checks
checking whether the given node name is valid and reachable using net.LookupHost
validating kubelet version
validating if the "kubelet" service is enabled and active
validating availability of port 10250
validating the contents of file /proc/sys/net/bridge/bridge-nf-call-iptables
validating the contents of file /proc/sys/net/ipv4/ip_forward
validating availability of port 2379
validating availability of port 2380
validating the existence and emptiness of directory /var/lib/etcd
- Pull the images
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
I0315 14:21:43.458945 22137 checks.go:828] using image pull policy: IfNotPresent
I0315 14:21:43.483244 22137 checks.go:846] image exists: registry.aliyuncs.com/google_containers/kube-apiserver:v1.29.2
I0315 14:21:43.507214 22137 checks.go:846] image exists: registry.aliyuncs.com/google_containers/kube-controller-manager:v1.29.2
I0315 14:21:43.530729 22137 checks.go:846] image exists: registry.aliyuncs.com/google_containers/kube-scheduler:v1.29.2
I0315 14:21:43.555095 22137 checks.go:846] image exists: registry.aliyuncs.com/google_containers/kube-proxy:v1.29.2
I0315 14:21:43.578368 22137 checks.go:846] image exists: registry.aliyuncs.com/google_containers/coredns:v1.11.1
W0315 14:21:43.601513 22137 checks.go:835] detected that the sandbox image "registry.k8s.io/pause:3.8" of the container runtime is inconsistent with that used by kubeadm. It is recommended that using "registry.aliyuncs.com/google_containers/pause:3.9" as the CRI sandbox image.
I0315 14:21:43.624418 22137 checks.go:846] image exists: registry.aliyuncs.com/google_containers/pause:3.9
I0315 14:21:43.648515 22137 checks.go:846] image exists: registry.aliyuncs.com/google_containers/etcd:3.5.10-0
- Generate certificates:
- Generate the TLS certificates used by the various components and for encrypted communication: API server, etcd, kubelet, and so on.
- Create a CA (certificate authority) and use it to sign all the other certificates.
[certs] Using certificateDir folder "/etc/kubernetes/pki"
I0315 14:21:43.648594 22137 certs.go:112] creating a new certificate authority for ca
[certs] Generating "ca" certificate and key
I0315 14:21:43.843075 22137 certs.go:519] validating certificate period for ca certificate
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [clouderamanager-15.com kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.0.0.15]
[certs] Generating "apiserver-kubelet-client" certificate and key
I0315 14:21:44.706100 22137 certs.go:112] creating a new certificate authority for front-proxy-ca
[certs] Generating "front-proxy-ca" certificate and key
I0315 14:21:45.004243 22137 certs.go:519] validating certificate period for front-proxy-ca certificate
[certs] Generating "front-proxy-client" certificate and key
I0315 14:21:45.245872 22137 certs.go:112] creating a new certificate authority for etcd-ca
[certs] Generating "etcd/ca" certificate and key
I0315 14:21:45.457730 22137 certs.go:519] validating certificate period for etcd/ca certificate
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [clouderamanager-15.com localhost] and IPs [10.0.0.15 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [clouderamanager-15.com localhost] and IPs [10.0.0.15 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
I0315 14:21:46.994972 22137 certs.go:78] creating new public/private key files for signing service account users
[certs] Generating "sa" key and public key
- Generate kubeconfig files:
- So that the kubelet, controller-manager, scheduler and other components can talk to the API server, kubeadm generates the corresponding kubeconfig files.
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
I0315 14:21:47.152235 22137 kubeconfig.go:112] creating kubeconfig file for admin.conf
[kubeconfig] Writing "admin.conf" kubeconfig file
I0315 14:21:47.413017 22137 kubeconfig.go:112] creating kubeconfig file for super-admin.conf
[kubeconfig] Writing "super-admin.conf" kubeconfig file
I0315 14:21:47.830913 22137 kubeconfig.go:112] creating kubeconfig file for kubelet.conf
[kubeconfig] Writing "kubelet.conf" kubeconfig file
I0315 14:21:48.061428 22137 kubeconfig.go:112] creating kubeconfig file for controller-manager.conf
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
I0315 14:21:48.542560 22137 kubeconfig.go:112] creating kubeconfig file for scheduler.conf
[kubeconfig] Writing "scheduler.conf" kubeconfig file
- Generate static Pod manifests:
- Generate static Pod manifests for the control-plane components: API server, controller-manager, scheduler, and etcd (unless an external etcd cluster is used).
- The manifests are written to /etc/kubernetes/manifests; the kubelet watches this directory and starts these Pods automatically.
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
I0315 14:21:48.710218 22137 local.go:65] [etcd] wrote Static Pod manifest for a local etcd member to "/etc/kubernetes/manifests/etcd.yaml"
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
I0315 14:21:48.710250 22137 manifests.go:102] [control-plane] getting StaticPodSpecs
I0315 14:21:48.710451 22137 certs.go:519] validating certificate period for CA certificate
I0315 14:21:48.710540 22137 manifests.go:128] [control-plane] adding volume "ca-certs" for component "kube-apiserver"
I0315 14:21:48.710555 22137 manifests.go:128] [control-plane] adding volume "etc-pki" for component "kube-apiserver"
I0315 14:21:48.710565 22137 manifests.go:128] [control-plane] adding volume "k8s-certs" for component "kube-apiserver"
I0315 14:21:48.711485 22137 manifests.go:157] [control-plane] wrote static Pod manifest for component "kube-apiserver" to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
I0315 14:21:48.711506 22137 manifests.go:102] [control-plane] getting StaticPodSpecs
I0315 14:21:48.711741 22137 manifests.go:128] [control-plane] adding volume "ca-certs" for component "kube-controller-manager"
I0315 14:21:48.711761 22137 manifests.go:128] [control-plane] adding volume "etc-pki" for component "kube-controller-manager"
I0315 14:21:48.711772 22137 manifests.go:128] [control-plane] adding volume "flexvolume-dir" for component "kube-controller-manager"
I0315 14:21:48.711782 22137 manifests.go:128] [control-plane] adding volume "k8s-certs" for component "kube-controller-manager"
I0315 14:21:48.711793 22137 manifests.go:128] [control-plane] adding volume "kubeconfig" for component "kube-controller-manager"
I0315 14:21:48.712608 22137 manifests.go:157] [control-plane] wrote static Pod manifest for component "kube-controller-manager" to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[control-plane] Creating static Pod manifest for "kube-scheduler"
I0315 14:21:48.712629 22137 manifests.go:102] [control-plane] getting StaticPodSpecs
I0315 14:21:48.712845 22137 manifests.go:128] [control-plane] adding volume "kubeconfig" for component "kube-scheduler"
I0315 14:21:48.713399 22137 manifests.go:157] [control-plane] wrote static Pod manifest for component "kube-scheduler" to "/etc/kubernetes/manifests/kube-scheduler.yaml"
I0315 14:21:48.713417 22137 kubelet.go:68] Stopping the kubelet
- Set up the control-plane components:
- Start the Kubernetes control-plane components: kube-apiserver, kube-controller-manager, kube-scheduler.
- They can run as static Pods or as system services.
- Create the kubelet configuration:
- Generate the kubelet configuration file, normally at /var/lib/kubelet/config.yaml.
- This file contains the parameters the kubelet uses to connect to the API server.
- Mark the control-plane node:
- Apply labels and taints so that ordinary workloads are not scheduled onto the control-plane node.
- Install core add-ons:
- Install the essential add-ons CoreDNS and kube-proxy.
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
- Generate admin.conf:
- Generate the admin.conf kubeconfig file so an administrator can use kubectl against the cluster.
- Print the result:
- Print a message explaining how to join more nodes to the cluster, with the exact kubeadm join syntax.
- Next-step hints:
- Provide follow-up hints, such as how to use the generated kubeconfig file to run kubectl commands.
When running kubeadm init you can customize the process with flags and a configuration file; you can also pre-pull the required images ahead of time, as sketched below.
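A small sketch for pre-pulling the control-plane images from the Aliyun mirror before running init (the preflight output later also hints at 'kubeadm config images pull'); the repository and version here simply match the ones used in this article:
# list the images kubeadm will need
kubeadm config images list --image-repository registry.aliyuncs.com/google_containers --kubernetes-version v1.29.2
# pull them ahead of time
kubeadm config images pull --image-repository registry.aliyuncs.com/google_containers --kubernetes-version v1.29.2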
Start the installation
# kubeadm init
[init] Using Kubernetes version: v1.29.2
[preflight] Running pre-flight checks
[WARNING HTTPProxy]: Connection to "https://10.0.0.15" uses proxy "http://10.0.0.15:27070". If that is not intended, adjust your proxy settings
[WARNING HTTPProxyCIDR]: connection to "10.96.0.0/12" uses proxy "http://10.0.0.15:27070". This may lead to malfunctional cluster setup. Make sure that Pod and Services IP ranges specified correctly as exceptions in proxy configuration
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
You will almost certainly hit image pull failures here: the default K8S registry registry.k8s.io is not reachable from inside mainland China.
output: E0313 16:10:16.955851 26777 remote_image.go:180] "PullImage from image service failed" err="rpc error: code = DeadlineExceeded desc = failed to pull and unpack image \"registry.k8s.io/kube-apiserver:v1.29.2\": failed to resolve reference \"registry.k8s.io/kube-apiserver:v1.29.2\": failed to do request: Head \"https://us-west2-docker.pkg.dev/v2/k8s-artifacts-prod/images/kube-apiserver/manifests/v1.29.2\": dial tcp 142.251.170.82:443: i/o timeout" image="registry.k8s.io/kube-apiserver:v1.29.2"
Point the image repository at Aliyun's registry.aliyuncs.com/google_containers instead:
kubeadm init --image-repository registry.aliyuncs.com/google_containers --kubernetes-version=v1.29.2 --v=5
There were still errors, though. The logs (journalctl -exfu kubelet) showed that the kubelet depends on registry.k8s.io/pause:3.8, and that image is not affected by the kubeadm init --image-repository flag.
3月 15 11:10:59 clouderamanager-15.com kubelet[16149]: E0315 11:10:59.973384 16149 kuberuntime_sandbox.go:72] "Failed to create sandbox for pod" err="rpc error: code = Unknown desc = failed to get sandbox image \"registry.k8s.io/pause:3.8\": f
3月 15 11:10:59 clouderamanager-15.com kubelet[16149]: E0315 11:10:59.973425 16149 kuberuntime_manager.go:1172] "CreatePodSandbox for pod failed" err="rpc error: code = Unknown desc = failed to get sandbox image \"registry.k8s.io/pause:3.8\
So import it manually:
docker save registry.k8s.io/pause:3.8 -o pause.tar
ctr -n k8s.io images import pause.tar # note: containerd looks for the image in the k8s.io namespace; do not change the namespace name, or the init will fail
systemctl daemon-reload
systemctl restart kubelet
# ctr -n k8s.io images import pause.tar
unpacking registry.k8s.io/pause:3.9 (sha256:e4bb1cdb96c8a65d8c69352db24d4c1051baae44bbeb3f2ecd33153b7c2ca9ee)...done
Food for thought: the kubelet itself uses the pause image for things like garbage collection, specified through --pod-infra-container-image. This is a runtime argument of the kubelet process: /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.9
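Instead of importing the image by hand, the sandbox-image mismatch warning from the preflight output can also be fixed on the containerd side. A minimal sketch, assuming the config generated by containerd config default at /etc/containerd/config.toml with its default sandbox_image entry:
# point the CRI sandbox image at the mirror recommended by kubeadm
sudo sed -i 's#sandbox_image = ".*"#sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.9"#' /etc/containerd/config.toml
sudo systemctl restart containerd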
Reset the environment and reinstall
[root@clouderamanager-15 containerd]# kubeadm reset
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
W0315 14:21:08.255268 21480 reset.go:124] [reset] Unable to fetch the kubeadm-config ConfigMap from cluster: failed to get config map: configmaps "kubeadm-config" is forbidden: User "kubernetes-admin" cannot get resource "configmaps" in API group "" in the namespace "kube-system"
W0315 14:21:08.255400 21480 preflight.go:56] [reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W0315 14:21:09.030378 21480 removeetcdmember.go:106] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] Deleted contents of the etcd data directory: /var/lib/etcd
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of directories: [/etc/kubernetes/manifests /var/lib/kubelet /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/super-admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d
The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.
If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.
The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
[root@clouderamanager-15 containerd]# rm -rf $HOME/.kube/config
[root@clouderamanager-15 containerd]# rm -rf /var/lib/etcd
Start the installation again
[root@clouderamanager-15 containerd]# kubeadm init --image-repository registry.aliyuncs.com/google_containers --kubernetes-version=v1.29.2 --apiserver-advertise-address=0.0.0.0 --service-cidr=10.96.0.0/12 --pod-network-cidr=10.244.0.0/16 --v=5
I0315 14:21:43.347964 22137 initconfiguration.go:122] detected and using CRI socket: unix:///var/run/containerd/containerd.sock
I0315 14:21:43.348211 22137 interface.go:432] Looking for default routes with IPv4 addresses
I0315 14:21:43.348221 22137 interface.go:437] Default route transits interface "eth0"
I0315 14:21:43.348314 22137 interface.go:209] Interface eth0 is up
I0315 14:21:43.348349 22137 interface.go:257] Interface "eth0" has 3 addresses :[10.0.0.15/24 2402:4e00:1016:d700:0:968d:60f8:9786/64 fe80::5054:ff:feb2:f159/64].
I0315 14:21:43.348363 22137 interface.go:224] Checking addr 10.0.0.15/24.
I0315 14:21:43.348371 22137 interface.go:231] IP found 10.0.0.15
I0315 14:21:43.348383 22137 interface.go:263] Found valid IPv4 address 10.0.0.15 for interface "eth0".
I0315 14:21:43.348390 22137 interface.go:443] Found active IP 10.0.0.15
I0315 14:21:43.348406 22137 kubelet.go:196] the value of KubeletConfiguration.cgroupDriver is empty; setting it to "systemd"
[init] Using Kubernetes version: v1.29.2
[preflight] Running pre-flight checks
I0315 14:21:43.351789 22137 checks.go:563] validating Kubernetes and kubeadm version
I0315 14:21:43.351821 22137 checks.go:168] validating if the firewall is enabled and active
I0315 14:21:43.359788 22137 checks.go:203] validating availability of port 6443
I0315 14:21:43.359910 22137 checks.go:203] validating availability of port 10259
I0315 14:21:43.359941 22137 checks.go:203] validating availability of port 10257
I0315 14:21:43.359993 22137 checks.go:280] validating the existence of file /etc/kubernetes/manifests/kube-apiserver.yaml
I0315 14:21:43.360018 22137 checks.go:280] validating the existence of file /etc/kubernetes/manifests/kube-controller-manager.yaml
I0315 14:21:43.360034 22137 checks.go:280] validating the existence of file /etc/kubernetes/manifests/kube-scheduler.yaml
I0315 14:21:43.360046 22137 checks.go:280] validating the existence of file /etc/kubernetes/manifests/etcd.yaml
I0315 14:21:43.360079 22137 checks.go:430] validating if the connectivity type is via proxy or direct
I0315 14:21:43.360117 22137 checks.go:469] validating http connectivity to first IP address in the CIDR
I0315 14:21:43.360142 22137 checks.go:469] validating http connectivity to first IP address in the CIDR
I0315 14:21:43.360164 22137 checks.go:104] validating the container runtime
I0315 14:21:43.383947 22137 checks.go:639] validating whether swap is enabled or not
I0315 14:21:43.384026 22137 checks.go:370] validating the presence of executable crictl
I0315 14:21:43.384063 22137 checks.go:370] validating the presence of executable conntrack
I0315 14:21:43.384091 22137 checks.go:370] validating the presence of executable ip
I0315 14:21:43.384158 22137 checks.go:370] validating the presence of executable iptables
I0315 14:21:43.384185 22137 checks.go:370] validating the presence of executable mount
I0315 14:21:43.384210 22137 checks.go:370] validating the presence of executable nsenter
I0315 14:21:43.384235 22137 checks.go:370] validating the presence of executable ebtables
I0315 14:21:43.384257 22137 checks.go:370] validating the presence of executable ethtool
I0315 14:21:43.384282 22137 checks.go:370] validating the presence of executable socat
I0315 14:21:43.384335 22137 checks.go:370] validating the presence of executable tc
I0315 14:21:43.384364 22137 checks.go:370] validating the presence of executable touch
I0315 14:21:43.384391 22137 checks.go:516] running all checks
I0315 14:21:43.390968 22137 checks.go:401] checking whether the given node name is valid and reachable using net.LookupHost
I0315 14:21:43.391678 22137 checks.go:605] validating kubelet version
I0315 14:21:43.448373 22137 checks.go:130] validating if the "kubelet" service is enabled and active
I0315 14:21:43.458602 22137 checks.go:203] validating availability of port 10250
I0315 14:21:43.458680 22137 checks.go:329] validating the contents of file /proc/sys/net/bridge/bridge-nf-call-iptables
I0315 14:21:43.458741 22137 checks.go:329] validating the contents of file /proc/sys/net/ipv4/ip_forward
I0315 14:21:43.458769 22137 checks.go:203] validating availability of port 2379
I0315 14:21:43.458803 22137 checks.go:203] validating availability of port 2380
I0315 14:21:43.458833 22137 checks.go:243] validating the existence and emptiness of directory /var/lib/etcd
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
I0315 14:21:43.458945 22137 checks.go:828] using image pull policy: IfNotPresent
I0315 14:21:43.483244 22137 checks.go:846] image exists: registry.aliyuncs.com/google_containers/kube-apiserver:v1.29.2
I0315 14:21:43.507214 22137 checks.go:846] image exists: registry.aliyuncs.com/google_containers/kube-controller-manager:v1.29.2
I0315 14:21:43.530729 22137 checks.go:846] image exists: registry.aliyuncs.com/google_containers/kube-scheduler:v1.29.2
I0315 14:21:43.555095 22137 checks.go:846] image exists: registry.aliyuncs.com/google_containers/kube-proxy:v1.29.2
I0315 14:21:43.578368 22137 checks.go:846] image exists: registry.aliyuncs.com/google_containers/coredns:v1.11.1
W0315 14:21:43.601513 22137 checks.go:835] detected that the sandbox image "registry.k8s.io/pause:3.8" of the container runtime is inconsistent with that used by kubeadm. It is recommended that using "registry.aliyuncs.com/google_containers/pause:3.9" as the CRI sandbox image.
I0315 14:21:43.624418 22137 checks.go:846] image exists: registry.aliyuncs.com/google_containers/pause:3.9
I0315 14:21:43.648515 22137 checks.go:846] image exists: registry.aliyuncs.com/google_containers/etcd:3.5.10-0
[certs] Using certificateDir folder "/etc/kubernetes/pki"
I0315 14:21:43.648594 22137 certs.go:112] creating a new certificate authority for ca
[certs] Generating "ca" certificate and key
I0315 14:21:43.843075 22137 certs.go:519] validating certificate period for ca certificate
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [clouderamanager-15.com kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.0.0.15]
[certs] Generating "apiserver-kubelet-client" certificate and key
I0315 14:21:44.706100 22137 certs.go:112] creating a new certificate authority for front-proxy-ca
[certs] Generating "front-proxy-ca" certificate and key
I0315 14:21:45.004243 22137 certs.go:519] validating certificate period for front-proxy-ca certificate
[certs] Generating "front-proxy-client" certificate and key
I0315 14:21:45.245872 22137 certs.go:112] creating a new certificate authority for etcd-ca
[certs] Generating "etcd/ca" certificate and key
I0315 14:21:45.457730 22137 certs.go:519] validating certificate period for etcd/ca certificate
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [clouderamanager-15.com localhost] and IPs [10.0.0.15 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [clouderamanager-15.com localhost] and IPs [10.0.0.15 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
I0315 14:21:46.994972 22137 certs.go:78] creating new public/private key files for signing service account users
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
I0315 14:21:47.152235 22137 kubeconfig.go:112] creating kubeconfig file for admin.conf
[kubeconfig] Writing "admin.conf" kubeconfig file
I0315 14:21:47.413017 22137 kubeconfig.go:112] creating kubeconfig file for super-admin.conf
[kubeconfig] Writing "super-admin.conf" kubeconfig file
I0315 14:21:47.830913 22137 kubeconfig.go:112] creating kubeconfig file for kubelet.conf
[kubeconfig] Writing "kubelet.conf" kubeconfig file
I0315 14:21:48.061428 22137 kubeconfig.go:112] creating kubeconfig file for controller-manager.conf
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
I0315 14:21:48.542560 22137 kubeconfig.go:112] creating kubeconfig file for scheduler.conf
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
I0315 14:21:48.710218 22137 local.go:65] [etcd] wrote Static Pod manifest for a local etcd member to "/etc/kubernetes/manifests/etcd.yaml"
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
I0315 14:21:48.710250 22137 manifests.go:102] [control-plane] getting StaticPodSpecs
I0315 14:21:48.710451 22137 certs.go:519] validating certificate period for CA certificate
I0315 14:21:48.710540 22137 manifests.go:128] [control-plane] adding volume "ca-certs" for component "kube-apiserver"
I0315 14:21:48.710555 22137 manifests.go:128] [control-plane] adding volume "etc-pki" for component "kube-apiserver"
I0315 14:21:48.710565 22137 manifests.go:128] [control-plane] adding volume "k8s-certs" for component "kube-apiserver"
I0315 14:21:48.711485 22137 manifests.go:157] [control-plane] wrote static Pod manifest for component "kube-apiserver" to "/etc/kubernetes/manifests/kube-apiserver.yaml"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
I0315 14:21:48.711506 22137 manifests.go:102] [control-plane] getting StaticPodSpecs
I0315 14:21:48.711741 22137 manifests.go:128] [control-plane] adding volume "ca-certs" for component "kube-controller-manager"
I0315 14:21:48.711761 22137 manifests.go:128] [control-plane] adding volume "etc-pki" for component "kube-controller-manager"
I0315 14:21:48.711772 22137 manifests.go:128] [control-plane] adding volume "flexvolume-dir" for component "kube-controller-manager"
I0315 14:21:48.711782 22137 manifests.go:128] [control-plane] adding volume "k8s-certs" for component "kube-controller-manager"
I0315 14:21:48.711793 22137 manifests.go:128] [control-plane] adding volume "kubeconfig" for component "kube-controller-manager"
I0315 14:21:48.712608 22137 manifests.go:157] [control-plane] wrote static Pod manifest for component "kube-controller-manager" to "/etc/kubernetes/manifests/kube-controller-manager.yaml"
[control-plane] Creating static Pod manifest for "kube-scheduler"
I0315 14:21:48.712629 22137 manifests.go:102] [control-plane] getting StaticPodSpecs
I0315 14:21:48.712845 22137 manifests.go:128] [control-plane] adding volume "kubeconfig" for component "kube-scheduler"
I0315 14:21:48.713399 22137 manifests.go:157] [control-plane] wrote static Pod manifest for component "kube-scheduler" to "/etc/kubernetes/manifests/kube-scheduler.yaml"
I0315 14:21:48.713417 22137 kubelet.go:68] Stopping the kubelet
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
I0315 14:21:48.845893 22137 waitcontrolplane.go:83] [wait-control-plane] Waiting for the API server to be healthy
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 8.004355 seconds
I0315 14:21:56.854730 22137 kubeconfig.go:606] ensuring that the ClusterRoleBinding for the kubeadm:cluster-admins Group exists
I0315 14:21:56.856377 22137 kubeconfig.go:682] creating the ClusterRoleBinding for the kubeadm:cluster-admins Group by using super-admin.conf
I0315 14:21:56.870281 22137 uploadconfig.go:112] [upload-config] Uploading the kubeadm ClusterConfiguration to a ConfigMap
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
I0315 14:21:56.885744 22137 uploadconfig.go:126] [upload-config] Uploading the kubelet component config to a ConfigMap
[kubelet] Creating a ConfigMap "kubelet-config" in namespace kube-system with the configuration for the kubelets in the cluster
I0315 14:21:56.903116 22137 uploadconfig.go:131] [upload-config] Preserving the CRISocket information for the control-plane node
I0315 14:21:56.903145 22137 patchnode.go:31] [patchnode] Uploading the CRI Socket information "unix:///var/run/containerd/containerd.sock" to the Node API object "clouderamanager-15.com" as an annotation
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node clouderamanager-15.com as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node clouderamanager-15.com as control-plane by adding the taints [node-role.kubernetes.io/control-plane:NoSchedule]
[bootstrap-token] Using token: 0grr01.zbqxdtmuc5qd9d05
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] Configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] Configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] Configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
I0315 14:21:57.971263 22137 clusterinfo.go:47] [bootstrap-token] loading admin kubeconfig
I0315 14:21:57.971832 22137 clusterinfo.go:58] [bootstrap-token] copying the cluster from admin.conf to the bootstrap kubeconfig
I0315 14:21:57.972121 22137 clusterinfo.go:70] [bootstrap-token] creating/updating ConfigMap in kube-public namespace
I0315 14:21:57.977561 22137 clusterinfo.go:84] creating the RBAC rules for exposing the cluster-info ConfigMap in the kube-public namespace
I0315 14:21:57.994965 22137 kubeletfinalize.go:91] [kubelet-finalize] Assuming that kubelet client certificate rotation is enabled: found "/var/lib/kubelet/pki/kubelet-client-current.pem"
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
I0315 14:21:57.995934 22137 kubeletfinalize.go:135] [kubelet-finalize] Restarting the kubelet to enable client certificate rotation
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 10.0.0.15:6443 --token 0grr01.zbqxdtmuc5qd9d05 \
--discovery-token-ca-cert-hash sha256:7942fdfd7e7e47318bc1b31f7ad8c1a05162b2292e706ad4c6c4b128abaa8e0b
Good, the initialization succeeded. Prepare the kubeconfig used to connect to the cluster:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Warning:
The kubeconfig file admin.conf generated by kubeadm init contains a certificate with Subject: O = kubeadm:cluster-admins, CN = kubernetes-admin. The kubeadm:cluster-admins group is bound to the built-in cluster-admin ClusterRole. Do not share the admin.conf file with anyone. kubeadm init also generates another kubeconfig file, super-admin.conf, containing a certificate with Subject: O = system:masters, CN = kubernetes-super-admin. system:masters is a break-glass, super-user group that bypasses the authorization layer (e.g. RBAC). Do not share super-admin.conf with anyone; it is recommended to move it to a safe location.
For how to generate kubeconfig files for additional users with kubeadm kubeconfig user, see "Generating kubeconfig files for additional users".
Check that everything is running. The control plane is installed and looking sharp. 😄
# kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system coredns-857d9ff4c9-rhljv 1/1 Running 0 122m
kube-system coredns-857d9ff4c9-wdhmh 1/1 Running 0 122m
kube-system etcd-clouderamanager-15.com 1/1 Running 1 123m
kube-system kube-apiserver-clouderamanager-15.com 1/1 Running 1 123m
kube-system kube-controller-manager-clouderamanager-15.com 1/1 Running 1 123m
kube-system kube-proxy-chw7d 1/1 Running 0 122m
kube-system kube-scheduler-clouderamanager-15.com 1/1 Running 1 123m
# kubectl get nodes
NAME STATUS ROLES AGE VERSION
clouderamanager-15.com Ready control-plane 133m v1.29.2
2.2 Install a CNI network plugin
See the official documentation.
Deploy a Pod network add-on, i.e. a Container Network Interface (CNI) plugin, so that your Pods can communicate with each other.
In a Kubernetes cluster, the underlay network provides the physical transport path, the CNI plugin configures and manages inter-container networking according to Kubernetes' requirements, and an overlay network is one technique a CNI plugin may use to let containers communicate across the limits of the physical network. Which CNI plugin to choose, and whether to use an overlay, usually depends on performance, scalability, the need to span clouds or data centers, and network-policy requirements.
We pick an overlay plugin, which needs no support from a cloud provider. Flannel is an overlay network provider that can be used with Kubernetes.
Flannel's official introduction:
Flannel is a simple and easy way to configure a layer-3 network fabric designed for Kubernetes. It runs a small, single binary agent called flanneld on each host, which allocates a subnet lease to each host out of a larger, preconfigured address space. Flannel uses either the Kubernetes API or etcd directly to store the network configuration, the allocated subnets, and any auxiliary data (such as the host's public IP). Packets are forwarded using one of several backend mechanisms, including VXLAN and various cloud integrations. Platforms like Kubernetes assume that every container (pod) in the cluster has a unique, routable IP; the advantage of this model is that it removes the port-mapping complexity that comes with sharing a single host IP. Flannel is responsible for providing a layer-3 IPv4 network between multiple nodes in the cluster. It does not control how containers attach to the host, only how traffic is transported between hosts. Flannel does, however, provide a CNI plugin for Kubernetes and guidance for integrating with Docker.
Flannel focuses on networking. For network policy, use another project such as Calico.
wget https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
kubectl apply -f kube-flannel.yml
By default the Network field in kube-flannel.yml matches kubeadm init --pod-network-cidr=10.244.0.0/16; if yours differs, edit it first (a sketch of how follows the snippet below).
net-conf.json: |
{
"Network": "10.244.0.0/16",
"Backend": {
"Type": "vxlan"
}
}
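If you initialized the cluster with a different --pod-network-cidr, change the Network value to match before applying the manifest. A minimal sketch; 10.200.0.0/16 here is purely a hypothetical example CIDR:
sed -i 's#"Network": "10.244.0.0/16"#"Network": "10.200.0.0/16"#' kube-flannel.yml
kubectl apply -f kube-flannel.yml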
Great, flannel deployed successfully.
[root@clouderamanager-15 k8s]# kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-flannel kube-flannel-ds-xmlpp 1/1 Running 0 109s
The control plane is up; next, add a worker node.
3. Add a K8S worker node
Prerequisites
Same as for the control plane: complete everything in step 1, "Install kubeadm".
[root@hadoop-30 bin]# kubeadm join 10.0.0.15:6443 --token 0grr01.zbqxdtmuc5qd9d05 --discovery-token-ca-cert-hash sha256:7942fdfd7e7e47318bc1b31f7ad8c1a05162b2292e706ad4c6c4b128abaa8e0b --v=5
I0315 22:19:01.131078 3367241 join.go:413] [preflight] found NodeName empty; using OS hostname as NodeName
I0315 22:19:01.131226 3367241 initconfiguration.go:122] detected and using CRI socket: unix:///var/run/containerd/containerd.sock
[preflight] Running pre-flight checks
I0315 22:19:01.131316 3367241 preflight.go:93] [preflight] Running general checks
I0315 22:19:01.131367 3367241 checks.go:280] validating the existence of file /etc/kubernetes/kubelet.conf
I0315 22:19:01.131385 3367241 checks.go:280] validating the existence of file /etc/kubernetes/bootstrap-kubelet.conf
I0315 22:19:01.131400 3367241 checks.go:104] validating the container runtime
I0315 22:19:01.154743 3367241 checks.go:639] validating whether swap is enabled or not
I0315 22:19:01.154817 3367241 checks.go:370] validating the presence of executable crictl
I0315 22:19:01.154847 3367241 checks.go:370] validating the presence of executable conntrack
I0315 22:19:01.154865 3367241 checks.go:370] validating the presence of executable ip
I0315 22:19:01.154881 3367241 checks.go:370] validating the presence of executable iptables
I0315 22:19:01.154899 3367241 checks.go:370] validating the presence of executable mount
I0315 22:19:01.154920 3367241 checks.go:370] validating the presence of executable nsenter
I0315 22:19:01.154939 3367241 checks.go:370] validating the presence of executable ebtables
I0315 22:19:01.154962 3367241 checks.go:370] validating the presence of executable ethtool
I0315 22:19:01.154980 3367241 checks.go:370] validating the presence of executable socat
I0315 22:19:01.155003 3367241 checks.go:370] validating the presence of executable tc
I0315 22:19:01.155020 3367241 checks.go:370] validating the presence of executable touch
I0315 22:19:01.155043 3367241 checks.go:516] running all checks
I0315 22:19:01.161426 3367241 checks.go:401] checking whether the given node name is valid and reachable using net.LookupHost
I0315 22:19:01.161631 3367241 checks.go:605] validating kubelet version
I0315 22:19:01.209791 3367241 checks.go:130] validating if the "kubelet" service is enabled and active
I0315 22:19:01.220464 3367241 checks.go:203] validating availability of port 10250
I0315 22:19:01.220657 3367241 checks.go:280] validating the existence of file /etc/kubernetes/pki/ca.crt
I0315 22:19:01.220674 3367241 checks.go:430] validating if the connectivity type is via proxy or direct
I0315 22:19:01.220710 3367241 checks.go:329] validating the contents of file /proc/sys/net/bridge/bridge-nf-call-iptables
I0315 22:19:01.220763 3367241 checks.go:329] validating the contents of file /proc/sys/net/ipv4/ip_forward
I0315 22:19:01.220785 3367241 join.go:532] [preflight] Discovering cluster-info
I0315 22:19:01.220816 3367241 token.go:80] [discovery] Created cluster-info discovery client, requesting info from "10.0.0.15:6443"
I0315 22:19:01.229835 3367241 token.go:118] [discovery] Requesting info from "10.0.0.15:6443" again to validate TLS against the pinned public key
I0315 22:19:01.237538 3367241 token.go:135] [discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "10.0.0.15:6443"
I0315 22:19:01.237581 3367241 discovery.go:52] [discovery] Using provided TLSBootstrapToken as authentication credentials for the join process
I0315 22:19:01.237601 3367241 join.go:546] [preflight] Fetching init configuration
I0315 22:19:01.237611 3367241 join.go:592] [preflight] Retrieving KubeConfig objects
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
I0315 22:19:01.245935 3367241 kubeproxy.go:55] attempting to download the KubeProxyConfiguration from ConfigMap "kube-proxy"
I0315 22:19:01.249510 3367241 kubelet.go:74] attempting to download the KubeletConfiguration from ConfigMap "kubelet-config"
I0315 22:19:01.254209 3367241 initconfiguration.go:114] skip CRI socket detection, fill with the default CRI socket unix:///var/run/containerd/containerd.sock
I0315 22:19:01.254532 3367241 interface.go:432] Looking for default routes with IPv4 addresses
I0315 22:19:01.254562 3367241 interface.go:437] Default route transits interface "eth0"
I0315 22:19:01.254959 3367241 interface.go:209] Interface eth0 is up
I0315 22:19:01.255059 3367241 interface.go:257] Interface "eth0" has 2 addresses :[10.0.0.30/24 fe80::5054:ff:fe66:3a5/64].
I0315 22:19:01.255095 3367241 interface.go:224] Checking addr 10.0.0.30/24.
I0315 22:19:01.255107 3367241 interface.go:231] IP found 10.0.0.30
I0315 22:19:01.255116 3367241 interface.go:263] Found valid IPv4 address 10.0.0.30 for interface "eth0".
I0315 22:19:01.255126 3367241 interface.go:443] Found active IP 10.0.0.30
I0315 22:19:01.258165 3367241 preflight.go:104] [preflight] Running configuration dependant checks
I0315 22:19:01.258183 3367241 controlplaneprepare.go:225] [download-certs] Skipping certs download
I0315 22:19:01.258197 3367241 kubelet.go:121] [kubelet-start] writing bootstrap kubelet config file at /etc/kubernetes/bootstrap-kubelet.conf
I0315 22:19:01.258735 3367241 kubelet.go:136] [kubelet-start] writing CA certificate at /etc/kubernetes/pki/ca.crt
I0315 22:19:01.259297 3367241 kubelet.go:157] [kubelet-start] Checking for an existing Node in the cluster with name "hadoop-30.com" and status "Ready"
I0315 22:19:01.261605 3367241 kubelet.go:172] [kubelet-start] Stopping the kubelet
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
I0315 22:19:02.377520 3367241 cert_rotation.go:137] Starting client certificate rotation controller
I0315 22:19:02.378405 3367241 kubelet.go:220] [kubelet-start] preserving the crisocket information for the node
I0315 22:19:02.378425 3367241 patchnode.go:31] [patchnode] Uploading the CRI Socket information "unix:///var/run/containerd/containerd.sock" to the Node API object "hadoop-30.com" as an annotation
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
Good, the node joined the cluster successfully. 😄
If the token has expired, the join fails with "The cluster-info ConfigMap does not yet contain a JWS signature for token ID"; generate a new one with kubeadm token create (a one-line alternative is sketched after the next command). To get the CA certificate hash:
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
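Alternatively, a single command on the control-plane node prints a fresh token together with the complete join command, which avoids computing the hash by hand:
kubeadm token create --print-join-command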
4. Verification
Create a Deployment
Create an nginx Deployment running nginx 1.22.
[root@hadoop-30 certs.d]# kubectl create deploy nginx-web --image=nginx:1.22
deployment.apps/nginx-web created
[root@hadoop-30 certs.d]# kubectl get deploy nginx-web -o yaml
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "1"
generation: 1
labels:
app: nginx-web
name: nginx-web
namespace: default
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: nginx-web
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
creationTimestamp: null
labels:
app: nginx-web
spec:
containers:
- image: nginx:1.22
imagePullPolicy: IfNotPresent
name: nginx
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 30
[root@hadoop-30 certs.d]# kubectl get deploy nginx-web -o wide
NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
nginx-web 1/1 1 1 25s nginx nginx:1.22 app=nginx-web
[root@hadoop-30 certs.d]# kubectl get pod -l app=nginx-web -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-web-65d5f4d459-ptt46 1/1 Running 0 99s 10.244.1.47 hadoop-30.com <none> <none>
[root@hadoop-30 certs.d]# curl -I 10.244.1.47
HTTP/1.1 200 OK
Server: nginx/1.22.1
Date: Mon, 18 Mar 2024 01:37:10 GMT
Content-Type: text/html
Content-Length: 615
Last-Modified: Wed, 19 Oct 2022 08:02:20 GMT
Connection: keep-alive
ETag: "634faf0c-267"
Accept-Ranges: bytes
From the result above, hitting the Pod IP directly returns the Nginx version header Server: nginx/1.22.1, and the Pod IP 10.244.1.47 belongs to the container network specified in section 2.1 when installing the control plane (kubeadm init --pod-network-cidr=10.244.0.0/16) and in section 2.2 when installing the CNI plugin.
Next, create a Service for this Deployment.
Create a Service
[root@hadoop-30 containerd]# kubectl expose deploy nginx-web --port=80 --target-port=80
service/nginx-web exposed
[root@hadoop-30 containerd]# kubectl get services -l app=nginx-web -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
nginx-web ClusterIP 10.105.151.200 <none> 80/TCP 17m app=nginx-web
[root@hadoop-30 containerd]# curl 10.105.151.200 -I
HTTP/1.1 200 OK
Server: nginx/1.22.1
Date: Mon, 18 Mar 2024 02:40:24 GMT
Content-Type: text/html
Content-Length: 615
Last-Modified: Wed, 19 Oct 2022 08:02:20 GMT
Connection: keep-alive
ETag: "634faf0c-267"
Accept-Ranges: bytes
Note that the Service IP 10.105.151.200 falls inside the 10.96.0.0/12 range specified with kubeadm init --service-cidr=10.96.0.0/12.
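As an extra check you can resolve the Service by its DNS name from inside the cluster, which exercises CoreDNS as well as kube-proxy. A small sketch using a throwaway busybox Pod (busybox:1.36 is just an example image tag):
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- wget -qO- http://nginx-web.default.svc.cluster.local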
Congratulations, K8S is finally installed. 😄
FAQ
/proc/sys/net/bridge/bridge-nf-call-iptables does not exist
When installing with kubeadm init --image-repository registry.aliyuncs.com/google_containers --kubernetes-version=v1.29.2 --v=5, the following error appeared:
[preflight] Some fatal errors occurred:
[ERROR FileContent--proc-sys-net-bridge-bridge-nf-call-iptables]: /proc/sys/net/bridge/bridge-nf-call-iptables does not exist
Cause: the br_netfilter module is not loaded. This file controls whether the Linux kernel hands bridged traffic to iptables; it must be enabled when using Linux bridge networking with a network plugin such as Flannel or Calico.
Fix: see section 1.1.
Common ctr usage with containerd
List running containers
# ctr -n k8s.io containers list
CONTAINER IMAGE RUNTIME
03ac29637768e33c33661069599e1ee90f6c3efe130bcd805b64a1bbb100a4e9 registry.k8s.io/pause:3.8 io.containerd.runc.v2
1f76eb01a2fb00fcae2b4ce7d2440c3d2f77b6cd3274d1e7288fcf59acd52f49 registry.aliyuncs.com/google_containers/kube-apiserver:v1.29.2 io.containerd.runc.v2
39312b8fd01e75414b90b6c78597979ac3787d9ac9438944e6e72745eccfa922 registry.aliyuncs.com/google_containers/kube-proxy:v1.29.2 io.containerd.runc.v2
b6886c1ff5b2dfb2be273e16116f0e84a852bf5584eb061ab517079e31e51e1d registry.aliyuncs.com/google_containers/kube-scheduler:v1.29.2 io.containerd.runc.v2
ef777548fb9e8cd8df9b061554d9f04b203fc01037566631cde2e802394b3a76 registry.aliyuncs.com/google_containers/kube-controller-manager:v1.29.2 io.containerd.runc.v2
f811d268ea3392bc78855b9c78f8cfa81561939961d5d1ed05868d9fe1077b63 registry.aliyuncs.com/google_containers/etcd:3.5.10-0 io.containerd.runc.v2
List images
# ctr -n k8s.io images list
REF TYPE DIGEST SIZE PLATFORMS LABELS
registry.aliyuncs.com/google_containers/coredns:v1.11.1 application/vnd.docker.distribution.manifest.list.v2+json sha256:a6b67bdb2a6750b591e6b07fac29653fc82ee964e5fc53baf4c1ad3f944b655a 17.3 MiB linux/amd64,linux/arm/v7,linux/arm64,linux/ppc64le,linux/riscv64,linux/s390x io.cri-containerd.image=managed
registry.aliyuncs.com/google_containers/etcd:3.5.10-0 application/vnd.docker.distribution.manifest.list.v2+json sha256:3b6a879b9db7fc31ae50662cc154cc52c986f9937f1fbb1281432b03fea50ad5 54.0 MiB linux/amd64,linux/arm/v7,linux/arm64,linux/ppc64le,linux/s390x,windows/amd64 io.cri-containerd.image=managed
registry.aliyuncs.com/google_containers/kube-apiserver:v1.29.2 application/vnd.docker.distribution.manifest.list.v2+json sha256:c734b64f0a87a902ee168a5ed3e2ad6beef4d47dabaa1067fbfe0cdcc635ea00 33.4 MiB linux/amd64,linux/arm64,linux/ppc64le,linux/s390x io.cri-containerd.image=managed
registry.aliyuncs.com/google_containers/kube-controller-manager:v1.29.2 application/vnd.docker.distribution.manifest.list.v2+json sha256:63ff6c3973153def10fd0aceea4f81197bfb18dfcdd318c810e98e370e839124 31.9 MiB linux/amd64,linux/arm64,linux/ppc64le,linux/s390x io.cri-containerd.image=managed
registry.aliyuncs.com/google_containers/kube-proxy:v1.29.2 application/vnd.docker.distribution.manifest.list.v2+json sha256:e32aa8045573d6a256a1101f88a16969c20edfeb4d13eaf2264437214f6102c8 27.1 MiB linux/amd64,linux/arm64,linux/ppc64le,linux/s390x io.cri-containerd.image=managed
registry.aliyuncs.com/google_containers/kube-scheduler:v1.29.2 application/vnd.docker.distribution.manifest.list.v2+json sha256:57451191c0bdc31619cc59351bea77d325eeddf647fd95c4764594e61c8935d7 17.7 MiB linux/amd64,linux/arm64,linux/ppc64le,linux/s390x io.cri-containerd.image=managed
registry.aliyuncs.com/google_containers/pause:3.9 application/vnd.docker.distribution.manifest.list.v2+json sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097 314.0 KiB linux/amd64,linux/arm/v7,linux/arm64,linux/ppc64le,linux/s390x,windows/amd64 io.cri-containerd.image=managed
Check the node's container runtime
# kubectl describe node hadoop-30.com | grep Runtime
Container Runtime Version: containerd://1.7.14
On an older cluster, the runtime is still Docker:
# kubectl describe node 10.1.0.16 | grep Runtime
Container Runtime Version: docker://18.6.3-ce-tke.4
List the images stored on a node
With kubectl get node clouderamanager-15.com -o yaml we can see that the --image-repository registry.aliyuncs.com/google_containers specified during kubeadm init took effect.
# kubectl get node clouderamanager-15.com -o yaml
apiVersion: v1
kind: Node
...
images:
- names:
- registry.aliyuncs.com/google_containers/etcd@sha256:3b6a879b9db7fc31ae50662cc154cc52c986f9937f1fbb1281432b03fea50ad5
- registry.aliyuncs.com/google_containers/etcd:3.5.10-0
sizeBytes: 56648498
- names:
- registry.aliyuncs.com/google_containers/kube-apiserver@sha256:c734b64f0a87a902ee168a5ed3e2ad6beef4d47dabaa1067fbfe0cdcc635ea00
- registry.aliyuncs.com/google_containers/kube-apiserver:v1.29.2
sizeBytes: 35071035
- names:
- registry.aliyuncs.com/google_containers/kube-controller-manager@sha256:63ff6c3973153def10fd0aceea4f81197bfb18dfcdd318c810e98e370e839124
- registry.aliyuncs.com/google_containers/kube-controller-manager:v1.29.2
sizeBytes: 33430393
- names:
- docker.io/flannel/flannel@sha256:452061a392663283672e905be10762e142d7ad6126ddee7b772e14405ee79a6a
- docker.io/flannel/flannel:v0.24.3
sizeBytes: 30382976
- names:
- registry.aliyuncs.com/google_containers/kube-proxy@sha256:e32aa8045573d6a256a1101f88a16969c20edfeb4d13eaf2264437214f6102c8
- registry.aliyuncs.com/google_containers/kube-proxy:v1.29.2
sizeBytes: 28365243
- names:
- registry.aliyuncs.com/google_containers/kube-scheduler@sha256:57451191c0bdc31619cc59351bea77d325eeddf647fd95c4764594e61c8935d7
- registry.aliyuncs.com/google_containers/kube-scheduler:v1.29.2
sizeBytes: 18522106
- names:
- registry.aliyuncs.com/google_containers/coredns@sha256:a6b67bdb2a6750b591e6b07fac29653fc82ee964e5fc53baf4c1ad3f944b655a
- registry.aliyuncs.com/google_containers/coredns:v1.11.1
sizeBytes: 18182351
- names:
- docker.io/flannel/flannel-cni-plugin@sha256:743c25e5e477527d8e54faa3e5259fbbee3463a335de1690879fc74305edc79b
- docker.io/flannel/flannel-cni-plugin:v1.4.0-flannel1
sizeBytes: 4498296
- names:
- registry.k8s.io/pause:3.8
sizeBytes: 714610
- names:
- registry.aliyuncs.com/google_containers/pause@sha256:7031c1b283388d2c2e09b57badb803c05ebed362dc88d84b480cc47f72a21097
- registry.aliyuncs.com/google_containers/pause:3.9
- registry.k8s.io/pause:3.9
sizeBytes: 321520
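If you only need the image names rather than the full node object, a jsonpath query is a quick alternative; a minimal sketch (node name taken from the output above):
# print each cached image's names, one image per line
kubectl get node clouderamanager-15.com -o jsonpath='{range .status.images[*]}{.names[*]}{"\n"}{end}'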
Set a registry mirror (image accelerator) address for containerd
- First, confirm the path of the containerd configuration file, usually /etc/containerd/config.toml. There are two ways to check:
  - ps aux | grep containerd: check whether the startup arguments include --config
  - systemctl cat containerd: inspect the daemon's unit file, e.g. /usr/lib/systemd/system/containerd.service. If ExecStart=/usr/local/bin/containerd carries no argument, you can change it to ExecStart=/usr/local/bin/containerd --config /etc/containerd/config.toml
- Add the registry mirror address
If /etc/containerd/config.toml is empty, generate a default configuration first:
containerd config default > /etc/containerd/config.toml
Then locate [plugins."io.containerd.grpc.v1.cri".registry.mirrors] in /etc/containerd/config.toml and add the mirror so that the section looks like this:
[plugins."io.containerd.grpc.v1.cri".registry]
[plugins."io.containerd.grpc.v1.cri".registry.mirrors]
[plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
endpoint = ["https://mirror.ccs.tencentyun.com"]
- Reload the systemd configuration and restart the service: after editing, save the file and exit the editor, then reload the systemd daemon configuration and restart containerd to apply the change:
sudo systemctl daemon-reload
sudo systemctl restart containerd
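After the restart, you can check that the mirror shows up in the configuration containerd actually loaded; a minimal sketch, assuming crictl is installed on the node:
# crictl info dumps the CRI status, including the registry section of the loaded config
crictl info | grep -A 5 mirrors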
The mirrors-based method above will be dropped in containerd 2.x. It is recommended to follow the official documentation, Registry Configuration - Introduction, and use the newer certs.d approach instead:
# tree /etc/containerd/certs.d
/etc/containerd/certs.d
├── _default
│ └── hosts.toml
└── docker.io
└── hosts.toml
# cat /etc/containerd/certs.d/_default/hosts.toml
[host."https://mirror.ccs.tencentyun.com"]
capabilities = ["pull", "resolve"]
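For containerd to pick up this certs.d layout, the CRI plugin's registry config_path has to point at that directory; a minimal sketch of the corresponding /etc/containerd/config.toml lines (directory assumed to match the tree above), after which containerd needs another restart:
[plugins."io.containerd.grpc.v1.cri".registry]
  config_path = "/etc/containerd/certs.d"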
plugin type="multus" name="multus-cni" failed (add): failed to find plugin "multus" in path [/opt/cni/bin]
After creating a Deployment, the Pod fails to start (FailedCreatePodSandBox); kubelet reports that it could not set up the network for the Pod sandbox.
# kubectl describe pod nginx-7854ff8877-sj5hr
Name: nginx-7854ff8877-sj5hr
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 3m6s default-scheduler Successfully assigned default/nginx-7854ff8877-sj5hr to hadoop-30.com
Warning FailedCreatePodSandBox 3m5s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "426c04194d718a2c3931e0130e3f153da56621ccefcd38deaec08d04acae37e0": plugin type="multus" name="multus-cni" failed (add): failed to find plugin "multus" in path [/opt/cni/bin]
Normal SandboxChanged 5s (x15 over 3m5s) kubelet Pod sandbox changed, it will be killed and re-created.
We chose flannel when installing the CNI plugin earlier, but the /etc/cni/net.d directory on this node also contains multus.conf (Multus, like flannel, is one of the CNI plugins listed under Pod network add-ons). Its 00- prefix sorts first, so Multus was picked by default when the Pod was created, yet its installation is incomplete: there is no multus binary under /opt/cni/bin.
Fix: since flannel is the CNI plugin we chose, remove the stale configuration left over on this node.
[root@clouderamanager-15 net.d]# mv 00-multus.conf 00-multus.conf.bak
[root@clouderamanager-15 net.d]# ll
总用量 12
-rw-r--r-- 1 root root 290 2月 21 2020 00-multus.conf.bak
-rw-r--r-- 1 root root 292 3月 15 17:25 10-flannel.conflist
drwxr-xr-x 2 root root 4096 8月 4 2021 multus
Recreate the Deployment and the Pod runs normally.
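A minimal sketch of recreating the test Deployment (the name nginx is assumed from the Pod name above):
kubectl delete deployment nginx
kubectl create deployment nginx --image=nginx
# the new Pod should reach Running once flannel sets up the sandbox network
kubectl get pod -l app=nginx -o wide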
rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService"
[preflight] Some fatal errors occurred:
[ERROR CRI]: container runtime is not running: output: time="2024-03-15T22:17:45+08:00" level=fatal msg="validate service connection: validate CRI v1 runtime API for endpoint \"unix:///var/run/containerd/containerd.sock\": rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService"
, error: exit status 1
Note: if you installed containerd from a package (for example, an RPM or .deb), you may find that the CRI integration plugin is disabled by default.
You need CRI support enabled to use containerd with Kubernetes. Make sure that cri does not appear in the disabled_plugins list in /etc/containerd/config.toml; if you change this file, remember to restart containerd.
If you experience container crash loops after the initial cluster installation or after installing a CNI, the containerd configuration shipped with the package may contain incompatible configuration parameters. Consider resetting the containerd configuration with containerd config default > /etc/containerd/config.toml, as described in getting-started.md, and then set the configuration parameters mentioned above accordingly.
The note is spot on: this config file was indeed generated by an RPM package installed earlier.
# rpm -qf /etc/containerd/config.toml
containerd.io-1.6.22-3.1.el7.x86_64
Comment it out:
# vim /etc/containerd/config.toml
#disabled_plugins = ["cri"]
Restart containerd, after which kubeadm join succeeds:
systemctl restart containerd
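To confirm that the CRI v1 RuntimeService is now being served on the socket, you can query it directly; a sketch assuming crictl is installed:
crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock version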
couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s"
[root@hadoop-30 bin]# kubectl get nodes
E0315 22:20:33.130931 3367971 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
As described in the control-plane installation section earlier, set up the kubeconfig that kubectl uses to authenticate to the cluster; the error above means kubectl found no kubeconfig and fell back to http://localhost:8080.
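For reference, the standard kubeadm steps look roughly like this (paths are the kubeadm defaults):
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# or, for a one-off command as root:
# export KUBECONFIG=/etc/kubernetes/admin.conf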
kube-flannel fails to pull its image
Pulling the image through a proxy solves the problem. Alternatively, see Set a registry mirror (image accelerator) address for containerd at the end of this article.
https_proxy=10.0.0.15:27070 ctr -n k8s.io images pull docker.io/flannel/flannel:v0.24.3
docker.io/flannel/flannel:v0.24.3: resolved |++++++++++++++++++++++++++++++++++++++|
index-sha256:452061a392663283672e905be10762e142d7ad6126ddee7b772e14405ee79a6a: done |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:f817a54d1ddca2f01936dc008234a5adef30e46c6052f4b85209b2607fce2e73: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:3e829fdb2b63feeef4c0dc83dadcb218566362f0957a85733f4e4e8c0f113b70: done |++++++++++++++++++++++++++++++++++++++|
config-sha256:f6f0ee58f49709c24555568e8fa03fca9e601c8d082c714975d2f4b759e2c920: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:dcccee43ad5d95c556da9df1c1d859fd9864643786d8c2c323ca9886c51b07b9: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:1f63b8a402ef975156bb8427ac82e8634ecd0a8412da7f77da81d0a640289d8f: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:cd5a26895faf7ebeb3d4a220bb80f8de21cfc0956b05e0bd4991a435a61bafb6: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:8b804df88a8c5cbdff8f091db5c7abd6f651fe37496f5d5722756b15104e0412: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:419def488131a8e59050625e59f8a16f620a2535bfa81a4b1d022e7fd1113f61: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:c3f73bd6bbcc60f8ffdf012665003911986cdcf8c24628c0a9a73b00471c4597: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:422e014044d57e604b9a61d79ab55787176c4833c4ded54a907010071904bf50: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:17d158cc0f8f79d47d052fe523b3ea1df24a37ec4f200b5ad3c588b4c99739ea: done |++++++++++++++++++++++++++++++++++++++|
layer-sha256:4f4fb700ef54461cfa02571ae0db9a0dc1e0cdb5577484a6d75e68dc38e8acc1: done |++++++++++++++++++++++++++++++++++++++|
elapsed: 1.8 s total: 29.0 M (16.1 MiB/s)
unpacking linux/amd64 sha256:452061a392663283672e905be10762e142d7ad6126ddee7b772e14405ee79a6a...
done: 887.579732ms
All Pods are running normally
[root@hadoop-30 bin]# kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-flannel kube-flannel-ds-mp2hz 1/1 Running 0 8m37s
kube-flannel kube-flannel-ds-xmlpp 1/1 Running 0 5h29m
kube-system coredns-857d9ff4c9-rhljv 1/1 Running 0 8h
kube-system coredns-857d9ff4c9-wdhmh 1/1 Running 0 8h
kube-system etcd-clouderamanager-15.com 1/1 Running 1 8h
kube-system kube-apiserver-clouderamanager-15.com 1/1 Running 1 8h
kube-system kube-controller-manager-clouderamanager-15.com 1/1 Running 1 8h
kube-system kube-proxy-chw7d 1/1 Running 0 8h
kube-system kube-proxy-j2dmm 1/1 Running 0 35m
kube-system kube-scheduler-clouderamanager-15.com 1/1 Running 1 8h
K8S node NotReady, reporting cni plugin not initialized
When a Kubernetes node shows NotReady and the error message reads "container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized", it means the CNI (Container Network Interface) plugin on that node has not been initialized or configured correctly.
It turns out the CNI plugin (kube-flannel) failed to install on the new node:
kubectl get pod -A -o wide
kube-flannel kube-flannel-ds-bs45h 0/1 Init:0/2 0 12m 10.0.0.2 hadoop-2.com <none> <none>
The cause: the pause image cannot be pulled from within mainland China.
# kubectl describe pod kube-flannel-ds-bs45h -n kube-flannel
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 14m default-scheduler Successfully assigned kube-flannel/kube-flannel-ds-bs45h to hadoop-2.com
Warning FailedCreatePodSandBox 13m kubelet Failed to create pod sandbox: rpc error: code = DeadlineExceeded desc = failed to get sandbox image "registry.k8s.io/pause:3.8": failed to pull image "registry.k8s.io/pause:3.8": failed to pull and unpack image "registry.k8s.io/pause:3.8": failed to resolve reference "registry.k8s.io/pause:3.8": failed to do request: Head "https://us-west2-docker.pkg.dev/v2/k8s-artifacts-prod/images/pause/manifests/3.8": dial tcp 142.251.8.82:443: i/o timeout
Import it the way described earlier in this article, and the problem is solved:
ctr -n k8s.io images import pause.tar
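An alternative that avoids copying a tarball around is to point containerd's sandbox image at the Aliyun mirror; a minimal sketch of the relevant /etc/containerd/config.toml setting (the 3.9 tag is assumed, matching the image pulled earlier), followed by a containerd restart:
[plugins."io.containerd.grpc.v1.cri"]
  sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.9"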