I'm using NFS as storage. The PVC and PV are both in the Bound state, and I have tested that pods can write files to the NFS share, but deploying MySQL fails with:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 25m default-scheduler 0/4 nodes are available: 4 pod has unbound immediate PersistentVolumeClaims. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling.
Normal Scheduled 25m default-scheduler Successfully assigned lzipant/mysql-0 to arch124
Normal Pulling 25m kubelet Pulling image "mysql:5.7"
Normal Pulled 25m kubelet Successfully pulled image "mysql:5.7" in 3.099960834s
Normal Created 24m (x5 over 25m) kubelet Created container init-mysql
Normal Pulled 24m (x4 over 25m) kubelet Container image "mysql:5.7" already present on machine
Normal Started 24m (x5 over 25m) kubelet Started container init-mysql
Warning BackOff 43s (x117 over 25m) kubelet Back-off restarting failed container
The YAML files are as follows:
configMap.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
  name: mysql
  namespace: lzipant
  labels:
    app: mysql
data:
  master.cnf: |
    [mysqld]
    log-bin
  slave.cnf: |
    [mysqld]
    super-read-only
service.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
  name: mysql
  namespace: lzipant
  labels:
    app: mysql
data:
  master.cnf: |
    [mysqld]
    log-bin
  slave.cnf: |
    [mysqld]
    super-read-only
statefulSet.yaml:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
  namespace: lzipant
spec:
  selector:
    matchLabels:
      # Applies to every pod whose labels include app=mysql
      app: mysql
  serviceName: mysql
  replicas: 3
  # Pod definition
  template:
    metadata:
      labels:
        app: mysql
    spec:
      # Init containers do the initialization work for the mysql container in the pod
      initContainers:
      # init-mysql decides whether the pod is the master or a slave, then generates the config files
      - name: init-mysql
        image: mysql:5.7
        command:
        - bash
        - "-c"
        - |
          set -ex
          # Derive the server-id from the pod ordinal
          [[ `hostname` =~ -([0-9]+)$ ]] || exit 1
          ordinal=${BASH_REMATCH[1]}
          echo [mysqld] > /mnt/conf.d/server-id.cnf
          # Write the server-id
          echo server-id=$((100 + $ordinal)) >> /mnt/conf.d/server-id.cnf
          # Ordinal 0 becomes the master, every other ordinal a slave
          # The cnf copied here is included by mysql.cnf together with server-id.cnf
          # The pod with ordinal 0 serves writes as the master; the other pods serve reads as slaves
          if [[ $ordinal -eq 0 ]]; then
            cp /mnt/config-map/master.cnf /mnt/conf.d/
          else
            cp /mnt/config-map/slave.cnf /mnt/conf.d/
          fi
        volumeMounts:
        # Mount the temporary conf volume at /mnt/conf.d in the pod
        - name: conf
          mountPath: /mnt/conf.d
        # Mount the configuration from the ConfigMap at /mnt/config-map in the pod
        - name: config-map
          mountPath: /mnt/config-map
      # This init container assumes at pod startup that data already exists on a peer and copies it over, so the new pod has data to serve
      - name: clone-mysql
        # xtrabackup is an open-source tool, used here to clone the mysql data
        image: ist0ne/xtrabackup:latest
        command:
        - bash
        - "-c"
        - |
          set -ex
          # Skip the clone if data already exists.
          [[ -d /var/lib/mysql/mysql ]] && exit 0
          # Skip the clone on master (ordinal index 0).
          [[ `hostname` =~ -([0-9]+)$ ]] || exit 1
          ordinal=${BASH_REMATCH[1]}
          [[ $ordinal -eq 0 ]] && exit 0
          # Clone data from previous peer.
          ncat --recv-only mysql-$(($ordinal-1)).mysql 3307 | xbstream -x -C /var/lib/mysql
          # Prepare the backup.
          xtrabackup --prepare --target-dir=/var/lib/mysql
        volumeMounts:
        - name: mysql-data
          mountPath: /var/lib/mysql
          subPath: mysql
        - name: conf
          mountPath: /etc/mysql/conf.d
      containers:
      # The mysql container that actually runs the mysqld service
      - name: mysql
        image: mysql:5.7
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: "abcdef"
        ports:
        - name: mysql
          containerPort: 3306
        volumeMounts:
        # Mount the mysql directory of the data volume at /var/lib/mysql in the container
        - name: mysql-data
          mountPath: /var/lib/mysql
          subPath: mysql
        - name: conf
          mountPath: /etc/mysql/conf.d
        resources:
          requests:
            cpu: 500m
            memory: 1Gi
        # Liveness probe; the container is restarted if it fails
        livenessProbe:
          exec:
            command: ["mysqladmin", "ping"]
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 5
        # Readiness probe checks that the container is working; on failure the pod is removed from the endpoints of the associated service
        readinessProbe:
          exec:
            # Check we can execute queries over TCP (skip-networking is off).
            command: ["mysql", "-h", "127.0.0.1", "-e", "SELECT 1"]
          initialDelaySeconds: 5
          periodSeconds: 2
          timeoutSeconds: 1
      # After the init containers finish, an xtrabackup container is also started as a sidecar of the mysqld container
      - name: xtrabackup
        image: ist0ne/xtrabackup:latest
        ports:
        - name: xtrabackup
          containerPort: 3307
        command:
        - bash
        - "-c"
        - |
          set -ex
          cd /var/lib/mysql
          # On startup it checks for files left by a previous clone: xtrabackup_slave_info means the data was cloned from another slave, xtrabackup_binlog_info alone means it was cloned from the master
          # Determine binlog position of cloned data, if any.
          if [[ -f xtrabackup_slave_info && "x$(<xtrabackup_slave_info)" != "x" ]]; then
            # XtraBackup already generated a partial "CHANGE MASTER TO" query
            # because we're cloning from an existing slave. (Need to remove the trailing semicolon!)
            cat xtrabackup_slave_info | sed -E 's/;$//g' > change_master_to.sql.in
            # Ignore xtrabackup_binlog_info in this case (it's useless).
            rm -f xtrabackup_slave_info xtrabackup_binlog_info
          elif [[ -f xtrabackup_binlog_info ]]; then
            # We're cloning directly from master. Parse binlog position.
            [[ `cat xtrabackup_binlog_info` =~ ^(.*?)[[:space:]]+(.*?)$ ]] || exit 1
            rm -f xtrabackup_binlog_info xtrabackup_slave_info
            echo "CHANGE MASTER TO MASTER_LOG_FILE='${BASH_REMATCH[1]}',\
                  MASTER_LOG_POS=${BASH_REMATCH[2]}" > change_master_to.sql.in
          fi
          # Check if we need to complete a clone by starting replication.
          if [[ -f change_master_to.sql.in ]]; then
            echo "Waiting for mysqld to be ready (accepting connections)"
            until mysql -h 127.0.0.1 -e "SELECT 1"; do sleep 1; done
            echo "Initializing replication from clone position"
            mysql -h 127.0.0.1 \
                  -e "$(<change_master_to.sql.in), \
                          MASTER_HOST='mysql-0.mysql', \
                          MASTER_USER='root', \
                          MASTER_PASSWORD='', \
                          MASTER_CONNECT_RETRY=10; \
                        START SLAVE;" || exit 1
            # In case of container restart, attempt this at-most-once.
            mv change_master_to.sql.in change_master_to.sql.orig
          fi
          # Start a server to send backups when requested by peers.
          exec ncat --listen --keep-open --send-only --max-conns=1 3307 -c \
            "xtrabackup --backup --slave-info --stream=xbstream --host=127.0.0.1 --user=root"
        volumeMounts:
        # Mount the mysql directory of the data volume at /var/lib/mysql in the container
        - name: mysql-data
          mountPath: /var/lib/mysql
          subPath: mysql
        - name: conf
          mountPath: /etc/mysql/conf.d
      volumes:
      - name: conf
        # When the pod is removed from the node, the emptyDir is deleted with it
        # emptyDir is usually used as a cache directory; here it holds the config
        emptyDir: {}
      - name: config-map
        # Data stored in a ConfigMap object can be referenced by a configMap-type volume and then used by the containers in the Pod
        # This references the ConfigMap named mysql defined earlier
        configMap:
          name: mysql
  volumeClaimTemplates:
  # This is the PVC template; no PVC is created separately for mysql, it is created dynamically
  - metadata:
      name: mysql-data
      namespace: lzipant
    spec:
      accessModes: ["ReadWriteOnce"]
      # If no default storageClass is configured, storageClassName must be specified
      storageClassName: managed-nfs-storage
      resources:
        requests:
          storage: 5Gi
storageClass.yaml:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-nfs-storage
  namespace: lzipant
provisioner: fuseim.pri/ifs # must match the deployment's env PROVISIONER_NAME
reclaimPolicy: Retain
1
lingly02 2022-08-01 16:49:19 +08:00
This usually means the PV was not created automatically. Check with `kubectl describe pvc` and `kubectl get pv`.
2
HarrisonLee OP
@lingly02 It looks like it was created successfully, though?
```shell
[root@arch121 mysql]# kubectl get pvc -n lzipant
NAME                 STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS          AGE
mysql-data-mysql-0   Bound    pvc-ba6d4db6-4de3-4801-b700-f033c25c89af   5Gi        RWX            managed-nfs-storage   95s
[root@arch121 mysql]# kubectl get pv -n lzipant
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                        STORAGECLASS          REASON   AGE
pvc-ba6d4db6-4de3-4801-b700-f033c25c89af   5Gi        RWX            Delete           Bound    lzipant/mysql-data-mysql-0   managed-nfs-storage            100s
[root@arch121 mysql]# kubectl describe pvc mysql-data-mysql-0 -n lzipant
Name:          mysql-data-mysql-0
Namespace:     lzipant
StorageClass:  managed-nfs-storage
Status:        Bound
Volume:        pvc-ba6d4db6-4de3-4801-b700-f033c25c89af
Labels:        app=mysql
Annotations:   pv.kubernetes.io/bind-completed: yes
               pv.kubernetes.io/bound-by-controller: yes
               volume.beta.kubernetes.io/storage-class: managed-nfs-storage
               volume.beta.kubernetes.io/storage-provisioner: fuseim.pri/ifs
               volume.kubernetes.io/storage-provisioner: fuseim.pri/ifs
Finalizers:    [kubernetes.io/pvc-protection]
Capacity:      5Gi
Access Modes:  RWX
VolumeMode:    Filesystem
Used By:       mysql-0
Events:
  Type    Reason                 Age   From                                                                                               Message
  ----    ------                 ----  ----                                                                                               -------
  Normal  ExternalProvisioning   2m2s  persistentvolume-controller                                                                        waiting for a volume to be created, either by external provisioner "fuseim.pri/ifs" or manually created by system administrator
  Normal  Provisioning           2m    fuseim.pri/ifs_nfs-client-provisioner-5868c55665-mbmdq_f2a0aea4-8bfc-4b4a-8eac-78ca5cfcf134  External provisioner is provisioning volume for claim "lzipant/mysql-data-mysql-0"
  Normal  ProvisioningSucceeded  2m    fuseim.pri/ifs_nfs-client-provisioner-5868c55665-mbmdq_f2a0aea4-8bfc-4b4a-8eac-78ca5cfcf134  Successfully provisioned volume pvc-ba6d4db6-4de3-4801-b700-f033c25c89af
[root@arch121 mysql]# kubectl describe pv pvc-ba6d4db6-4de3-4801-b700-f033c25c89af -n lzipant
Name:            pvc-ba6d4db6-4de3-4801-b700-f033c25c89af
Labels:          <none>
Annotations:     pv.kubernetes.io/provisioned-by: fuseim.pri/ifs
Finalizers:      [kubernetes.io/pv-protection]
StorageClass:    managed-nfs-storage
Status:          Bound
Claim:           lzipant/mysql-data-mysql-0
Reclaim Policy:  Delete
Access Modes:    RWX
VolumeMode:      Filesystem
Capacity:        5Gi
Node Affinity:   <none>
Message:
Source:
    Type:      NFS (an NFS mount that lasts the lifetime of a pod)
    Server:    arch121.com
    Path:      /mnt/nfs/lzipant-mysql-data-mysql-0-pvc-ba6d4db6-4de3-4801-b700-f033c25c89af
    ReadOnly:  false
Events:          <none>
```
3
novolunt 2022-08-01 17:30:57 +08:00 1
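It's well known that running production MySQL directly on Kubernetes is not recommended, which is why the Vitess project came about: https://vitess.io/
I'd still suggest virtual machines as the safer choice.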
4
novolunt 2022-08-01 17:33:05 +08:00
Databases (MySQL/Mongo) have one characteristic: they need contiguous block reads and writes, so on a k8s PV or an encrypted disk they often fail to start properly.
5
gengchun 2022-08-01 17:37:56 +08:00
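So you still think NFS is the problem? Then you should post MySQL's startup error. Mounting an NFS share as the disk for MySQL is already odd.
Where did your initialization bash script come from? Even a real production setup has no need to initialize the master and slave databases this way. The default startup script in the mariadb image has its share of problems too, but this script leaves out many things that starting MySQL has to take care of; it only handles master/slave. And what is /mnt/conf.d? Can this initialization script actually start up? Are you sure the problem isn't the initialization script? Also, why add a readinessProbe to MySQL? If you want automatic master/slave failover, that should be handled by middleware such as MaxScale.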
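On the /mnt/conf.d question: in the posted StatefulSet it is where init-mysql mounts the `conf` emptyDir, and the mysql container mounts the same volume at /etc/mysql/conf.d. The init-mysql logic can be tried outside the cluster with a minimal sketch like the one below. The function name `render_conf` and the temporary directories are hypothetical stand-ins for the real mounts, and the `10#` prefix is an addition that guards against ordinals with a leading zero being read as octal.

```shell
#!/usr/bin/env bash
# Standalone re-creation of the init-mysql step. Directories are passed as
# arguments instead of the fixed /mnt/config-map and /mnt/conf.d mounts.
render_conf() {
  local host=$1 config_map_dir=$2 conf_dir=$3
  # StatefulSet pods are named <name>-<ordinal>; take the trailing number.
  [[ $host =~ -([0-9]+)$ ]] || return 1
  # Force base 10 so an ordinal such as "08" is not parsed as octal.
  local ordinal=$((10#${BASH_REMATCH[1]}))
  printf '[mysqld]\nserver-id=%d\n' "$((100 + ordinal))" > "$conf_dir/server-id.cnf"
  # Ordinal 0 gets the master config, every other ordinal the slave config.
  if [[ $ordinal -eq 0 ]]; then
    cp "$config_map_dir/master.cnf" "$conf_dir/"
  else
    cp "$config_map_dir/slave.cnf" "$conf_dir/"
  fi
}

# Demo with throwaway directories (hypothetical stand-ins for the volumes).
demo=$(mktemp -d)
mkdir -p "$demo/config-map" "$demo/conf-0" "$demo/conf-2"
printf '[mysqld]\nlog-bin\n' > "$demo/config-map/master.cnf"
printf '[mysqld]\nsuper-read-only\n' > "$demo/config-map/slave.cnf"
render_conf mysql-0 "$demo/config-map" "$demo/conf-0"
render_conf mysql-2 "$demo/config-map" "$demo/conf-2"
ls "$demo/conf-0" "$demo/conf-2"
```

If this works locally but the init container still crash-loops, the likelier suspects are the mounts themselves (for example a missing key in the ConfigMap), which only the container log can confirm.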
6
HarrisonLee OP
7
gengchun 2022-08-01 17:47:54 +08:00
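@novolunt PVs have nothing to do with contiguous reads and writes. A PV/PVC is only a declaration. There is no such rule.
k8s is a poor fit here only because doing this deliberately gives up k8s features. Compared with a VM setup it is more complicated and adds operational complexity without any clear benefit, so it ends up feeling redundant. That does not mean k8s can't do it.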
8
gengchun 2022-08-01 18:00:41 +08:00
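@HarrisonLee I don't think this script can start up from scratch. Have a look at how the initialization script in the official mariadb image is implemented. This has nothing to do with k8s; you need to look at how the Docker image is built.
For local development or testing there is no need to set up master/slave at all. A team-shared stage environment doesn't need this kind of master/slave automation either; configuring replication by hand once after startup is enough. If you are just learning, drop the master/slave part and deploy the official mariadb image following the simple example on the k8s website. You don't even need NFS.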
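Configuring replication by hand once comes down to running a single `CHANGE MASTER TO ... START SLAVE` statement on the slave, the same statement the xtrabackup sidecar in the original post assembles. Below is a minimal sketch that builds it from a `<binlog file> <position>` line, such as the contents of xtrabackup_binlog_info or values read off `SHOW MASTER STATUS`. The function name `change_master_sql` and the binlog file name and position are hypothetical. Note that the posted StatefulSet sets MYSQL_ROOT_PASSWORD to a non-empty value while the sidecar passes MASTER_PASSWORD='', so the sketch takes the password as a parameter.

```shell
#!/usr/bin/env bash
# Build the replication statement from a "<binlog file> <position>" line.
change_master_sql() {
  local binlog_line=$1 master_host=$2 user=$3 password=$4
  local file pos
  # Split on whitespace; any extra fields (e.g. a GTID column) are ignored.
  read -r file pos _ <<< "$binlog_line"
  [[ -n $file && $pos =~ ^[0-9]+$ ]] || return 1
  cat <<SQL
CHANGE MASTER TO
  MASTER_HOST='$master_host',
  MASTER_USER='$user',
  MASTER_PASSWORD='$password',
  MASTER_LOG_FILE='$file',
  MASTER_LOG_POS=$pos,
  MASTER_CONNECT_RETRY=10;
START SLAVE;
SQL
}

# Hypothetical binlog coordinates, for illustration only.
change_master_sql "mysql-0-bin.000003 154" mysql-0.mysql root abcdef
```

The printed statement can then be fed to the `mysql` client on the slave once mysqld accepts connections.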
9
novolunt 2022-08-01 18:15:50 +08:00
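@gengchun Right, it's not a PV problem. What I meant is that the CSI layer underneath is opaque: you can't see its underlying hardware or technical details, so it isn't suitable for storing DB data.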
10
anubu 2022-08-01 18:29:22 +08:00
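Check the container logs; it's most likely a filesystem permission problem. A database mounted on NFS or SMB easily runs into filesystem problems. If you really have to do this, use a PV backed by block storage, such as iSCSI.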
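The Events in the original post show the `init-mysql` container being created and started five times before the BackOff warning, while the pod was scheduled successfully right after the one FailedScheduling event, so that init container's log is the first place to look. A minimal sketch of the commands follows. The helper name `diag_commands` is hypothetical, and it prints the commands instead of running them so they can be reviewed first.

```shell
#!/usr/bin/env bash
# Print the commands for inspecting a crash-looping init container.
diag_commands() {
  local ns=$1 pod=$2
  shift 2
  echo "kubectl describe pod $pod -n $ns"
  local c
  for c in "$@"; do
    # Log of the current attempt, then of the previous (crashed) attempt.
    echo "kubectl logs $pod -n $ns -c $c"
    echo "kubectl logs $pod -n $ns -c $c --previous"
  done
}

diag_commands lzipant mysql-0 init-mysql
```

To execute them, pipe the output into `bash`. Because the init script runs with `set -ex`, the last traced command in the log is usually the one that failed.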
11
Pythondr 2022-08-02 10:38:26 +08:00
With NFS as the database's storage backend, availability is basically zero. Read/write performance is just too poor.