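Using NFS as the Default StorageClass in K8S
While installing Kubesphere today I found that the cluster had no StorageClass, so the PVCs it depends on could not be created. Time to fix that.
CBS performs well as a StorageClass, but it costs too much: every PVC is backed by its own CBS disk, sized to whatever the claim requests, which is poor value.
NFS is a better fit. Tencent Cloud CFS, for example, offers about 100MB/s of read/write throughput on the standard tier, which is plenty.
K8S version: v1.29.2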
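1. Create the NFS
Since the K8S cluster runs on Tencent Cloud, I used Tencent Cloud CFS as the NFS.
Create a file system in the same availability zone and VPC as the CVM instances.
Once it is created, you can see a capacity limit of 160TB and a throughput limit of 100MB/s. Its IP address is 10.0.0.68, which is used in the examples below.
Next, install the NFS provisioner so that K8S can read and write CFS, and make it the cluster's default storage class.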
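2. Install the NFS provisioner
The installation follows nfs-subdir-external-provisioner, which is maintained under kubernetes-sigs.
Prerequisites
Clone the repository; its configuration files are used in the later steps.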
$ git clone https://github.com/kubernetes-sigs/nfs-subdir-external-provisioner.git
$ cd nfs-subdir-external-provisioner
2.1 Configure RBAC
# Set the subject of the RBAC objects to the current namespace where the provisioner is being deployed
$ NS=$(kubectl config get-contexts|grep -e "^\*" |awk '{print $5}')
$ NAMESPACE=${NS:-default}
$ sed -i'' "s/namespace:.*/namespace: $NAMESPACE/g" ./deploy/rbac.yaml ./deploy/deployment.yaml
$ kubectl create -f deploy/rbac.yaml
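To see what the namespace detection and the sed rewrite above actually do, here is a small sketch that runs the same pipeline on sample data instead of a live cluster. The context names and the manifest fragment are made up for illustration, and the sed writes to stdout rather than editing in place.

```shell
#!/usr/bin/env bash
# Read `kubectl config get-contexts` output on stdin and print the namespace
# of the current context (the row marked "*"), or "default" if none is set.
# Columns: CURRENT NAME CLUSTER AUTHINFO NAMESPACE
current_namespace() {
  local ns
  ns=$(grep -e "^\*" | awk '{print $5}')
  printf '%s\n' "${ns:-default}"
}

# Hypothetical sample output; on a real machine pipe in the kubectl command.
sample_contexts='CURRENT   NAME       CLUSTER    AUTHINFO    NAMESPACE
*         demo-ctx   demo-k8s   demo-user   storage
          other-ctx  other-k8s  other-user'

NAMESPACE=$(printf '%s\n' "$sample_contexts" | current_namespace)

# Apply the same sed expression to a manifest fragment and show the result.
rewritten=$(printf 'metadata:\n  name: nfs-client-provisioner\n  namespace: default\n' \
  | sed "s/namespace:.*/namespace: $NAMESPACE/g")
printf '%s\n' "$rewritten"
```

If the current context has no namespace column, the fifth field is empty and the function falls back to default, which is why the manifests can usually be applied unchanged on a fresh cluster.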
deploy/rbac.yaml looks like this; it defines the permissions the NFS provisioner needs.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: default
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: nfs-client-provisioner-runner
rules:
  - apiGroups: [""]
    resources: ["nodes"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "update", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: run-nfs-client-provisioner
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    # replace with namespace where provisioner is deployed
    namespace: default
roleRef:
  kind: ClusterRole
  name: nfs-client-provisioner-runner
  apiGroup: rbac.authorization.k8s.io
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: default
rules:
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: default
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    # replace with namespace where provisioner is deployed
    namespace: default
roleRef:
  kind: Role
  name: leader-locking-nfs-client-provisioner
  apiGroup: rbac.authorization.k8s.io
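2.2 Configure the nfs provisioner
Edit the provisioner's deployment manifest (deploy/deployment.yaml). Three things need to change:
- NFS_SERVER: 10.0.0.68 (the address of my NFS; the same value goes into volumes.nfs.server)
- NFS_PATH: / (this NFS is dedicated to serving as the K8S StorageClass, so I use the root directory directly; the same value goes into volumes.nfs.path)
- image: the default image registry is unreachable from mainland China; pull the image through a proxy and push it to your own registry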
kind: Deployment
apiVersion: apps/v1
metadata:
  name: nfs-client-provisioner
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nfs-client-provisioner
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: nfs-client-provisioner
    spec:
      serviceAccountName: nfs-client-provisioner
      containers:
        - name: nfs-client-provisioner
          image: registry.k8s.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2
          volumeMounts:
            - name: nfs-client-root
              mountPath: /persistentvolumes
          env:
            - name: PROVISIONER_NAME
              value: k8s-sigs.io/nfs-subdir-external-provisioner
            - name: NFS_SERVER
              value: 10.0.0.68
            - name: NFS_PATH
              value: /
      volumes:
        - name: nfs-client-root
          nfs:
            server: 10.0.0.68
            path: /
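For the image change, one possible workflow is to mirror the image into a registry the nodes can reach and then point the deployment at the mirror. The sketch below only prints the commands so they can be reviewed first. The registry registry.example.com/mirror is a placeholder I made up, and the docker and sed commands assume a Linux host with the docker CLI that can reach registry.k8s.io (for example through a proxy).

```shell
#!/usr/bin/env bash
SRC_IMAGE="registry.k8s.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2"
# Hypothetical private registry and project; replace with your own.
MIRROR_REPO="registry.example.com/mirror"

# Build the mirrored reference: keep only the last path component
# (image name plus tag) and prefix it with the mirror repository.
mirror_ref() {
  local src=$1 repo=$2
  printf '%s/%s\n' "$repo" "${src##*/}"
}

DST_IMAGE=$(mirror_ref "$SRC_IMAGE" "$MIRROR_REPO")

# Print the commands instead of running them.
cat <<EOF
docker pull $SRC_IMAGE
docker tag $SRC_IMAGE $DST_IMAGE
docker push $DST_IMAGE
sed -i "s#image: .*#image: $DST_IMAGE#" deploy/deployment.yaml
EOF
```

The printed sed line rewrites only the text from "image: " to the end of the line, so the YAML indentation in deploy/deployment.yaml stays intact.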
Create the deployment:
kubectl apply -f deploy/deployment.yaml
Check that the deployment came up properly:
# kubectl get deployment nfs-client-provisioner
NAME READY UP-TO-DATE AVAILABLE AGE
nfs-client-provisioner 1/1 1 1 63m
2.3 Deploy the StorageClass
Edit deploy/class.yaml and add the is-default-class annotation so that nfs-client becomes the default StorageClass:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-client
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: k8s-sigs.io/nfs-subdir-external-provisioner # or choose another name, must match deployment's env PROVISIONER_NAME'
parameters:
  archiveOnDelete: "false"
kubectl apply -f deploy/class.yaml
Verify: nfs-client now carries the (default) marker.
# kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
nfs-client (default) k8s-sigs.io/nfs-subdir-external-provisioner Delete Immediate false 66m
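A cluster should have exactly one default StorageClass, so it can be worth checking that mechanically rather than by eye. This is a minimal sketch that parses the text output of kubectl get sc; the function names are my own, and it runs here against the output copied from above rather than a live cluster.

```shell
#!/usr/bin/env bash
# Print the names of all StorageClasses that `kubectl get sc` marks as default.
default_storage_classes() {
  awk 'NR > 1 && $2 == "(default)" { print $1 }'
}

# Succeed only when exactly one default class is present on stdin.
has_single_default() {
  local count
  count=$(default_storage_classes | grep -c .)
  [ "$count" -eq 1 ]
}

# Output copied from the check above; on a live cluster, pipe in `kubectl get sc`.
sample_sc='NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
nfs-client (default) k8s-sigs.io/nfs-subdir-external-provisioner Delete Immediate false 66m'

if printf '%s\n' "$sample_sc" | has_single_default; then
  echo "default StorageClass: $(printf '%s\n' "$sample_sc" | default_storage_classes)"
else
  echo "expected exactly one default StorageClass" >&2
fi
```

On a real cluster the same check would be kubectl get sc | has_single_default.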
2.4 Test
$ kubectl create -f deploy/test-claim.yaml -f deploy/test-pod.yaml
Check the status; everything looks fine.
# kubectl get pvc,pod
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
persistentvolumeclaim/test-claim Bound pvc-245dcc5a-d58b-441a-bc21-0ea61b8b7c9f 1Mi RWX nfs-client <unset> 68m
NAME READY STATUS RESTARTS AGE
pod/nfs-client-provisioner-648798c484-ljrc6 1/1 Running 0 36m
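As far as I can tell, the repository's test-claim.yaml names the nfs-client class explicitly, so it does not by itself prove that the default setting works. A claim that omits storageClassName does. The sketch below writes such a manifest to a temporary file and prints the follow-up commands; the claim name default-sc-check is a made-up example.

```shell
#!/usr/bin/env bash
# Write a PVC manifest that deliberately leaves out the storage class field,
# so the cluster has to fall back to its default StorageClass.
manifest=$(mktemp)
cat > "$manifest" <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: default-sc-check   # hypothetical name
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Mi
EOF

echo "manifest written to $manifest"
echo "next: kubectl apply -f $manifest && kubectl get pvc default-sc-check"
```

If the default is set up correctly, the STORAGECLASS column for this claim should show nfs-client once it is bound.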
Look at the directories created on the NFS:
# mount 10.0.0.68:/ /mnt
# ll /mnt
total 0
drwxrwxrwx 2 root root 20 Mar 19 10:56 default-test-claim-pvc-245dcc5a-d58b-441a-bc21-0ea61b8b7c9f
drwxrwxrwx 3 root root 26 Mar 19 11:07 kubesphere-monitoring-system-prometheus-k8s-db-prometheus-k8s-0-pvc-8214ae61-6092-4334-9f1f-20210c704550
# ll /mnt/default-test-claim-pvc-245dcc5a-d58b-441a-bc21-0ea61b8b7c9f/
total 0
-rw-r--r-- 1 root root 0 Mar 19 10:56 SUCCESS
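The directory names in the listing follow the provisioner's namespace-pvcName-pvName pattern, which makes it easy to find the data behind a given claim. A small sketch of that mapping, with a helper name of my own choosing:

```shell
#!/usr/bin/env bash
# Compose the directory name the provisioner is expected to create on the
# share for a claim: <namespace>-<pvcName>-<pvName>.
pv_dir_name() {
  local namespace=$1 pvc_name=$2 pv_name=$3
  printf '%s-%s-%s\n' "$namespace" "$pvc_name" "$pv_name"
}

# Values taken from the test claim above.
pv_dir_name default test-claim pvc-245dcc5a-d58b-441a-bc21-0ea61b8b7c9f
```

On a live cluster the PV name for a claim can be read with kubectl get pvc test-claim -o jsonpath='{.spec.volumeName}'.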
With that, NFS is successfully set up as the default StorageClass for K8S.
FAQ
0/2 nodes are available: pod has unbound immediate PersistentVolumeClaims.
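The test Pod failed to schedule because its PVC was never bound. After some digging, the root cause turned out to be that the nfs provisioner deployment itself had failed to start because of an image problem.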
kubectl describe pod test-pod
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 34s default-scheduler 0/2 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling.
mount.nfs: mounting :/ifs/kubernetes failed, reason given by server: No such file or directory
The nfs provisioner deployment failed because NFS_PATH was set to /ifs/kubernetes, but no such directory exists on the NFS. Changing it to / fixed the problem.
# kubectl describe pod nfs-client-provisioner-7f9b667c6b-cr7cj
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 7m14s default-scheduler Successfully assigned default/nfs-client-provisioner-7f9b667c6b-cr7cj to hadoop-30.com
Warning FailedMount 26s (x11 over 7m10s) kubelet MountVolume.SetUp failed for volume "nfs-client-root" : mount failed: exit status 32
Mounting command: mount
Mounting arguments: -t nfs 10.0.0.68:/ifs/kubernetes /var/lib/kubelet/pods/b7d26239-23c4-4698-9d55-749f5538bc85/volumes/kubernetes.io~nfs/nfs-client-root
Output: mount.nfs: mounting 10.0.0.68:/ifs/kubernetes failed, reason given by server: No such file or directory
failed to pull and unpack image "registry.k8s.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2"
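The image registry is unreachable from mainland China, so you need to change the image tag to a reachable mirror or push the image to your own registry.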
# kubectl describe pod nfs-client-provisioner-7f9b667c6b-cr7cj
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 67s default-scheduler Successfully assigned default/nfs-client-provisioner-774bb87d75-llc69 to hadoop-30.com
Warning Failed 35s kubelet Failed to pull image "registry.k8s.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2": rpc error: code = DeadlineExceeded desc = failed to pull and unpack image "registry.k8s.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2": failed to resolve reference "registry.k8s.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2": failed to do request: Head "https://us-west2-docker.pkg.dev/v2/k8s-artifacts-prod/images/sig-storage/nfs-subdir-external-provisioner/manifests/v4.0.2": dial tcp 64.233.188.82:443: i/o timeout
Warning Failed 35s kubelet Error: ErrImagePull
Normal BackOff 35s kubelet Back-off pulling image "registry.k8s.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2"
Warning Failed 35s kubelet Error: ImagePullBackOff
Normal Pulling 22s (x2 over 67s) kubelet Pulling image "registry.k8s.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2"
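The three failures above can be told apart from the event text alone. As a rough triage aid, here is a sketch that reads kubectl describe output on stdin and prints which of the cases it looks like. The function name and the labels it prints are my own, and the string matching only covers the messages seen in this post.

```shell
#!/usr/bin/env bash
# Classify `kubectl describe pod` output into one of the FAQ cases above.
triage_events() {
  local events
  events=$(cat)
  if printf '%s\n' "$events" | grep -q 'No such file or directory'; then
    echo "nfs-path"      # NFS_PATH does not exist on the server
  elif printf '%s\n' "$events" | grep -Eq 'ErrImagePull|ImagePullBackOff'; then
    echo "image-pull"    # provisioner image cannot be pulled
  elif printf '%s\n' "$events" | grep -q 'unbound immediate PersistentVolumeClaims'; then
    echo "pvc-unbound"   # check the provisioner pod next
  else
    echo "unknown"
  fi
}

# Example with a message copied from the events above.
printf '%s\n' 'Warning Failed 35s kubelet Error: ErrImagePull' | triage_events
```

Typical usage would be kubectl describe pod test-pod | triage_events; a pvc-unbound result means the next thing to describe is the nfs-client-provisioner pod.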