A Collection of 19 Everyday K8S Troubleshooting Cases (k8srancher)
off999 2025-04-01 21:15
Problem 1: Access to a K8S cluster service fails?
curl: (60) Peer's Certificate issuer is not recognized.
More details here: http://curl.haxx.se/docs/sslcerts.html
curl performs SSL certificate verification by default, using a "bundle"
of Certificate Authority (CA) public keys (CA certs). If the default
bundle file isn't adequate, you can specify an alternate file
using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
the bundle, the certificate verification probably failed due to a
problem with the certificate (it might be expired, or the name might
not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
the -k (or --insecure) option.
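Cause: The certificate is not recognized, for example because it is a custom (self-signed) certificate or has expired.
Solution: Renew or replace the certificate.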
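Before replacing the certificate, it can help to print its validity window and to retry the request with an explicit CA bundle, as the curl message suggests. A minimal sketch, assuming the `openssl` and `curl` CLIs are installed; the function names are illustrative and any file names passed in are up to the reader:

```shell
# Both commands can be overridden (e.g. OPENSSL=echo) for a dry run.
OPENSSL="${OPENSSL:-openssl}"
CURL="${CURL:-curl}"

# cert_dates CERT_FILE: print the notBefore/notAfter dates of a PEM certificate.
cert_dates() {
  "$OPENSSL" x509 -noout -dates -in "$1"
}

# curl_with_ca CA_FILE URL: repeat the request, trusting the given CA bundle.
curl_with_ca() {
  "$CURL" --cacert "$1" "$2"
}
```

If `cert_dates` shows a `notAfter` date in the past, the certificate has expired; if `curl_with_ca` succeeds, the issuer was simply missing from the default bundle.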
Problem 2: Access to a K8S cluster service fails?
curl: (7) Failed connect to 10.103.22.158:3000; Connection refused
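Cause: The port mapping is wrong. The service itself is running normally, but it cannot be reached through the mapped port.
Solution: Delete the svc and map the port again.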
kubectl delete svc nginx-deployment
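The delete-and-remap step can be sketched as a small helper. This assumes the Service was created from a Deployment of the same name with `kubectl expose`; the function name and the port values used when calling it are hypothetical:

```shell
# KUBECTL can be overridden (e.g. KUBECTL=echo) to preview the commands.
KUBECTL="${KUBECTL:-kubectl}"

# reexpose_service NAME PORT TARGET_PORT
# Deletes the existing Service (if any), then exposes the Deployment of the
# same name again with the given service port and container port.
reexpose_service() {
  "$KUBECTL" delete svc "$1" --ignore-not-found
  "$KUBECTL" expose deployment "$1" --port="$2" --target-port="$3"
}
```

For example, `reexpose_service nginx-deployment 3000 80` would map service port 3000 to container port 80, assuming the container really listens on 80.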
Problem 3: Exposing a K8S cluster service fails?
Error from server (AlreadyExists): services "nginx-deployment" already exists
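Cause: A service with this name has already been exposed for the container.
Solution: Delete the svc and map the port again.
Problem 4: A service provided by the K8S cluster cannot be reached from outside?
Cause: The service's type is ClusterIP, so it is not exposed outside the cluster.
Solution: Change the service's type to NodePort. The service can then be reached through any node of the K8S cluster.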
kubectl edit svc nginx-deployment
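As a non-interactive alternative to `kubectl edit`, the type can be changed with `kubectl patch`. A minimal sketch; the helper names are illustrative, and it assumes the Service has a single port:

```shell
# KUBECTL can be overridden (e.g. KUBECTL=echo) to preview the commands.
KUBECTL="${KUBECTL:-kubectl}"

# set_service_type NAME TYPE: patch spec.type of the Service.
set_service_type() {
  "$KUBECTL" patch svc "$1" -p "{\"spec\":{\"type\":\"$2\"}}"
}

# node_port_of NAME: print the nodePort allocated to the first port.
node_port_of() {
  "$KUBECTL" get svc "$1" -o jsonpath='{.spec.ports[0].nodePort}'
}
```

After `set_service_type nginx-deployment NodePort`, the port printed by `node_port_of nginx-deployment` is the one to use against any node's IP address.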
Problem 5: Pod status is ErrImagePull?
readiness-httpget-pod 0/1 ErrImagePull 0 10s
Cause: The image cannot be pulled.
Warning Failed 59m (x4 over 61m) kubelet, k8s-node01 Error: ErrImagePull
Solution: Switch to an image that can be pulled.
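To confirm which image reference is failing before swapping it, the pod's images and events can be listed. A minimal sketch with illustrative helper names:

```shell
# KUBECTL can be overridden (e.g. KUBECTL=echo) to preview the commands.
KUBECTL="${KUBECTL:-kubectl}"

# pod_images POD: list the images referenced by the pod's containers.
pod_images() {
  "$KUBECTL" get pod "$1" -o jsonpath='{.spec.containers[*].image}'
}

# pod_events POD: show the events recorded for the pod; pull errors appear here.
pod_events() {
  "$KUBECTL" get events --field-selector "involvedObject.name=$1"
}
```

A typo in the registry host, repository, or tag usually shows up directly in the output of `pod_images`.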
Problem 6: After creating init containers, the pod status is abnormal?
NAME READY STATUS RESTARTS AGE
myapp-pod 0/1 Init:0/2 0 20s
Cause: The logs show the pod stuck in initialization. The pod details then show why creation fails: the init containers have not finished running.
Error from server (BadRequest): container "myapp-container" in pod "myapp-pod" is waiting to start: PodInitializing
waiting for myservice
Server: 10.96.0.10
Address: 10.96.0.10:53
** server can't find myservice.default.svc.cluster.local: NXDOMAIN
*** Can't find myservice.svc.cluster.local: No answer
*** Can't find myservice.cluster.local: No answer
*** Can't find myservice.default.svc.cluster.local: No answer
*** Can't find myservice.svc.cluster.local: No answer
*** Can't find myservice.cluster.local: No answer
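Solution: Create the corresponding Service. Its name is then registered in the cluster's CoreDNS server, so CoreDNS can resolve the domain name that the pod's init container looks up while it runs.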
kubectl apply -f myservice.yaml
NAME READY STATUS RESTARTS AGE
myapp-pod 0/1 Init:1/2 0 27m
myapp-pod 0/1 PodInitializing 0 28m
myapp-pod 1/1 Running 0 28m
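The original `myservice.yaml` is not shown. A minimal sketch of what such a manifest could contain follows; only the Service name `myservice` comes from the log above, while the port numbers and the helper name are assumptions:

```shell
# write_myservice_manifest FILE: write a minimal Service manifest whose name
# matches the DNS name the init container waits for.
write_myservice_manifest() {
  cat > "$1" <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: myservice
spec:
  ports:
    - protocol: TCP
      port: 80          # assumed service port
      targetPort: 9376  # assumed container port
EOF
}
```

Running `write_myservice_manifest myservice.yaml` and then `kubectl apply -f myservice.yaml` creates the DNS record for `myservice.default.svc.cluster.local`, which is what the init container polls for.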
Problem 7: A pod with a probe ends up in CrashLoopBackOff?
readiness-httpget-pod 0/1 CrashLoopBackOff 1 13s
readiness-httpget-pod 0/1 Completed 2 20s
readiness-httpget-pod 0/1 CrashLoopBackOff 2 31s
readiness-httpget-pod 0/1 Completed 3 42s
readiness-httpget-pod 0/1 CrashLoopBackOff 3 53s
Cause: An image problem keeps the container from restarting successfully.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulling 56m kubelet, k8s-node01 Pulling image "hub.atguigu.com/library/mylandmarktech/myapp:v1"
Normal Pulled 56m kubelet, k8s-node01 Successfully pulled image "hub.atguigu.com/library/mylandmarktech/myapp:v1"
Normal Created 56m (x3 over 56m) kubelet, k8s-node01 Created container readiness-httpget-container
Normal Started 56m (x3 over 56m) kubelet, k8s-node01 Started container readiness-httpget-container
Normal Pulled 56m (x2 over 56m) kubelet, k8s-node01 Container image "hub.atguigu.com/library/mylandmarktech/myapp:v1" already present on machine
Warning Unhealthy 56m kubelet, k8s-node01 Readiness probe failed: Get http://10.244.2.22:80/index1.html: dial tcp 10.244.2.22:80: connect: connection refused
Warning BackOff 56m (x4 over 56m) kubelet, k8s-node01 Back-off restarting failed container
Normal Scheduled 50s default-scheduler Successfully assigned default/readiness-httpget-pod to k8s-node01
Solution: Use a different image.
Problem 8: Pod creation fails?
readiness-httpget-pod 0/1 Pending 0 0s
readiness-httpget-pod 0/1 Pending 0 0s
readiness-httpget-pod 0/1 ContainerCreating 0 0s
readiness-httpget-pod 0/1 Error 0 2s
readiness-httpget-pod 0/1 Error 1 3s
readiness-httpget-pod 0/1 CrashLoopBackOff 1 4s
readiness-httpget-pod 0/1 Error 2 15s
readiness-httpget-pod 0/1 CrashLoopBackOff 2 26s
readiness-httpget-pod 0/1 Error 3 37s
readiness-httpget-pod 0/1 CrashLoopBackOff 3 52s
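readiness-httpget-pod 0/1 Error 4 82s
Cause: An image problem prevents the container from starting. The logs show the application calling MongoClient.connect with an undefined URL.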
[root@k8s-master01 ~]# kubectl logs readiness-httpget-pod
url.js:106
throw new errors.TypeError('ERR_INVALID_ARG_TYPE', 'url', 'string', url);
^
TypeError [ERR_INVALID_ARG_TYPE]: The "url" argument must be of type string. Received type undefined
at Url.parse (url.js:106:11)
at Object.urlParse [as parse] (url.js:100:13)
at module.exports (/myapp/node_modules/mongodb/lib/url_parser.js:17:23)
at connect (/myapp/node_modules/mongodb/lib/mongo_client.js:159:16)
at Function.MongoClient.connect (/myapp/node_modules/mongodb/lib/mongo_client.js:110:3)
at Object.<anonymous> (/myapp/app.js:12:13)
at Module._compile (module.js:641:30)
at Object.Module._extensions..js (module.js:652:10)
at Module.load (module.js:560:32)
at tryModuleLoad (module.js:503:12)
at Function.Module._load (module.js:495:3)
at Function.Module.runMain (module.js:682:10)
at startup (bootstrap_node.js:191:16)
at bootstrap_node.js:613:3
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 58m (x5 over 59m) kubelet, k8s-node01 Container image "hub.atguigu.com/library/myapp:v1" already present on machine
Normal Created 58m (x5 over 59m) kubelet, k8s-node01 Created container readiness-httpget-container
Normal Started 58m (x5 over 59m) kubelet, k8s-node01 Started container readiness-httpget-container
Warning BackOff 57m (x10 over 59m) kubelet, k8s-node01 Back-off restarting failed container
Normal Scheduled 3m35s default-scheduler Successfully assigned default/readiness-httpget-pod to k8s-node01
Solution: Use a different image.
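When a container is crash-looping, the logs of the previous (already exited) instance are often more useful than those of the current one. A minimal sketch; the helper name and the default of 50 lines are illustrative choices:

```shell
# KUBECTL can be overridden (e.g. KUBECTL=echo) to preview the command.
KUBECTL="${KUBECTL:-kubectl}"

# crash_logs POD [LINES]: print the tail of the previous container instance's
# log, which usually holds the stack trace that caused the restart.
crash_logs() {
  "$KUBECTL" logs "$1" --previous --tail="${2:-50}"
}
```

For the pod above, `crash_logs readiness-httpget-pod` would be expected to surface the same `ERR_INVALID_ARG_TYPE` trace.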
Problem 9: Pod never enters the Ready state?
readiness-httpget-pod 0/1 Running 0 116s
Cause: The command executed against the pod fails because the resource it requests cannot be retrieved.
Error from server (NotFound): pods "pod" not found
2021/06/11 07:10:14 [error] 30#30: *1 open() "/usr/share/nginx/html/index1.html" failed (2: No such file or directory), client: 10.244.2.1, server: localhost, request: "GET /index1.html HTTP/1.1", host: "10.244.2.25:80"
10.244.2.1 - - [11/Jun/2021:07:10:14 +0000] "GET /index1.html HTTP/1.1" 404 153 "-" "kube-probe/1.15" "-"
10.244.2.1 - - [11/Jun/2021:07:10:17 +0000] "GET /index1.html HTTP/1.1" 404 153 "-" "kube-probe/1.15" "-"
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Pulled 64m kubelet, k8s-node01 Container image "hub.atguigu.com/library/nginx" already present on machine
Normal Created 64m kubelet, k8s-node01 Created container readiness-httpget-container
Normal Started 64m kubelet, k8s-node01 Started container readiness-httpget-container
Warning Unhealthy 59m (x101 over 64m) kubelet, k8s-node01 Readiness probe failed: HTTP probe failed with statuscode: 404
Normal Scheduled 8m16s default-scheduler Successfully assigned default/readiness-httpget-pod to k8s-node01
Solution: Enter the container and create the resource that the YAML definition refers to (here, index1.html).
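Creating the missing file can also be done without an interactive shell. A minimal sketch, assuming the container image provides `sh`; the helper name and the file content `ok` are illustrative, while the path comes from the nginx error log above:

```shell
# KUBECTL can be overridden (e.g. KUBECTL=echo) to preview the command.
KUBECTL="${KUBECTL:-kubectl}"

# create_probe_file POD PATH: create the file the readiness probe requests.
create_probe_file() {
  "$KUBECTL" exec "$1" -- sh -c "echo ok > '$2'"
}
```

Calling `create_probe_file readiness-httpget-pod /usr/share/nginx/html/index1.html` should let the next probe succeed. The file lives in the container's writable layer, so it disappears when the container is recreated; a lasting fix is to correct the probe path or the image.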
Problem 10: Pod creation fails?
error: error validating "myregistry-secret.yml": error validating data: ValidationError(Pod.spec.imagePullSecrets[0]): invalid type for io.k8s.api.core.v1.LocalObjectReference: got "string", expected "map"; if you choose to ignore these errors, turn validation off with --validate=false
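Cause: The content of the yml file is wrong: Chinese characters were used, and the imagePullSecrets entry is read as a plain string instead of a map.
Solution: Correct the myregistrykey entry.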
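Problem 11: The status of the plugin pod kube-flannel-ds-amd64-ndsf7 is Init:0/1?
Troubleshooting: kubectl -n kube-system describe pod kube-flannel-ds-amd64-ndsf7  # show the pod's description
Cause: Node k8s-slave1 failed to pull the image.
Solution: Log in to k8s-slave1, restart the docker service, and pull the image manually.
Then, on the k8s-master node, reinstall the plugin.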
kubectl create -f kube-flannel.yml;kubectl get nodes
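Problem 12: A service created in K8S has status ErrImagePull?
Troubleshooting: kubectl describe pod test-nginx
Cause: The image name used for the pull is wrong.
Solution: Delete the failed pod, then pull the image again with the correct name.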
kubectl delete pod test-nginx;kubectl run test-nginx --image=10.0.0.81:5000/nginx:alpine
Problem 13: Cannot enter the specified container?
Error from server (BadRequest): container volume-test-container is not valid for pod volume-test-pod
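Cause: The containers field is duplicated in the yml file, so the pod does not contain that container.
Solution: Remove the redundant containers field from the yml file and recreate the pod.
Problem 14: Creating a PV fails?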
persistentvolume/nfspv1 unchanged
persistentvolume/nfspv01 created
Error from server (Invalid): error when applying patch:
{"metadata":{"annotations":{"kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"v1\",\"kind\":\"PersistentVolume\",\"metadata\":{\"annotations\":{},\"name\":\"nfspv01\"},\"spec\":{\"accessModes\":[\"ReadWriteOnce\"],\"capacity\":{\"storage\":\"5Gi\"},\"nfs\":{\"path\":\"/nfs2\",\"server\":\"192.168.66.100\"},\"persistentVolumeReclaimPolicy\":\"Retain\",\"storageClassName\":\"nfs\"}}\n"}},"spec":{"nfs":{"path":"/nfs2"}}}
to:
Resource: "/v1, Resource=persistentvolumes", GroupVersionKind: "/v1, Kind=PersistentVolume"
Name: "nfspv01", Namespace: ""
Object: &{map["apiVersion":"v1" "kind":"PersistentVolume" "metadata":map["annotations":map["kubectl.kubernetes.io/last-applied-configuration":"{\"apiVersion\":\"v1\",\"kind\":\"PersistentVolume\",\"metadata\":{\"annotations\":{},\"name\":\"nfspv01\"},\"spec\":{\"accessModes\":[\"ReadWriteOnce\"],\"capacity\":{\"storage\":\"5Gi\"},\"nfs\":{\"path\":\"/nfs1\",\"server\":\"192.168.66.100\"},\"persistentVolumeReclaimPolicy\":\"Retain\",\"storageClassName\":\"nfs\"}}\n"] "creationTimestamp":"2021-06-25T01:54:24Z" "finalizers":["kubernetes.io/pv-protection"] "name":"nfspv01" "resourceVersion":"325674" "selfLink":"/api/v1/persistentvolumes/nfspv01" "uid":"89cb1d15-8012-47f0-aee6-6507bb624387"] "spec":map["accessModes":["ReadWriteOnce"] "capacity":map["storage":"5Gi"] "nfs":map["path":"/nfs1" "server":"192.168.66.100"] "persistentVolumeReclaimPolicy":"Retain" "storageClassName":"nfs" "volumeMode":"Filesystem"] "status":map["phase":"Available"]]}
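for: "PV.yml": PersistentVolume "nfspv01" is invalid: spec.persistentvolumesource: Forbidden: is immutable after creation
Cause: The PV name is duplicated. The manifest reuses the name nfspv01 of an existing PV with a different nfs path (/nfs2 instead of /nfs1), and the volume source cannot be changed after creation.
Solution: Change the name field of the PV.
Problem 15: A pod cannot mount a PVC?
Cause: The pod cannot mount the PVC.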
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 60s default-scheduler pod has unbound immediate PersistentVolumeClaims (repeated 2 times)
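The accessModes of the claim do not match those of the available PVs, so the PVC cannot be mounted. Only a PV larger than 1G with accessModes RWO can be mounted, so only one pod is created successfully. The second pod stays Pending and, because pods are created in order, the third pod is never created.
Solution: Change the accessModes in the yml file, or the accessModes of the PV.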
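To compare what the claims request with what the volumes offer, both can be listed side by side with custom columns. A minimal sketch; the helper names and column headings are illustrative:

```shell
# KUBECTL can be overridden (e.g. KUBECTL=echo) to preview the commands.
KUBECTL="${KUBECTL:-kubectl}"

# list_pv: capacity, access modes, and phase of every PersistentVolume.
list_pv() {
  "$KUBECTL" get pv -o custom-columns=NAME:.metadata.name,CAPACITY:.spec.capacity.storage,MODES:.spec.accessModes,STATUS:.status.phase
}

# list_pvc: requested size, access modes, and phase of every claim in the
# current namespace.
list_pvc() {
  "$KUBECTL" get pvc -o custom-columns=NAME:.metadata.name,REQUEST:.spec.resources.requests.storage,MODES:.spec.accessModes,STATUS:.status.phase
}
```

A claim stays Pending when no Available PV has at least the requested capacity, the same storageClassName, and the requested access mode.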
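Problem 16: After a pod uses a PV, the PV's contents cannot be accessed?
Cause: The nfs volume contains no files, or its permissions are wrong.
Solution: Create the files in the nfs volume and grant the required permissions.
Problem 17: Viewing node status fails?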
Error from server (NotFound): the server could not find the requested resource (get services http:heapster:)
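Cause: The heapster service does not exist.
Solution: Install the Prometheus monitoring component.
Problem 18: A pod stays in the Pending state?
Cause: Pods had already been deployed with the same image, leaving no node available for scheduling. The event below reports that none of the 3 nodes matches the pod's node selector.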
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 9s (x13 over 14m) default-scheduler 0/3 nodes are available: 3 node(s) didn't match node selector.
Solution: Delete all the pods, then deploy the pod again.
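Since the event mentions the node selector, it may also be worth comparing the pod's `nodeSelector` with the labels actually present on the nodes. A minimal sketch with illustrative helper names:

```shell
# KUBECTL can be overridden (e.g. KUBECTL=echo) to preview the commands.
KUBECTL="${KUBECTL:-kubectl}"

# pod_node_selector POD: print the nodeSelector the pod asks for.
pod_node_selector() {
  "$KUBECTL" get pod "$1" -o jsonpath='{.spec.nodeSelector}'
}

# show_node_labels: list every node together with its labels.
show_node_labels() {
  "$KUBECTL" get nodes --show-labels
}
```

If no node carries the requested label, either label a node with `kubectl label node` or adjust the selector in the pod spec.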
Problem 19: Installing a component with helm fails?
[root@k8s-master01 hello-world]# helm install
Error: This command needs 1 argument: chart nam
[root@k8s-master01 hello-world]# helm install ./
Error: no Chart.yaml exists in directory "/root/hello-world"
Cause: The file name is wrong. Helm looks for Chart.yaml with a capital C.
Solution: mv chart.yaml Chart.yaml
Problem 20: Upgrading a release with helm fails?
[root@k8s-master01 hello-world]# helm upgrade joyous-wasp ./
UPGRADE FAILED
ROLLING BACK
Error: render error in "hello-world/templates/deployment.yaml": template: hello-world/templates/deployment.yaml:14:35: executing "hello-world/templates/deployment.yaml" at <.values.image.reposi...>: can't evaluate field image in type interface {}
Error: UPGRADE FAILED: render error in "hello-world/templates/deployment.yaml": template: hello-world/templates/deployment.yaml:14:35: executing "hello-world/templates/deployment.yaml" at <.values.image.reposi...>: can't evaluate field image in type interface {}
Cause: A syntax error in the yaml file. The template refers to .values in lowercase, whereas Helm's built-in object is .Values.
Solution: Fix the yaml file.
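If the lowercase `.values` reference in the error message is indeed the culprit, the fix can be sketched as a one-line substitution. The helper name is illustrative, and the edit is worth reviewing before upgrading, since it rewrites every `.values.` occurrence in the file:

```shell
# fix_values_case FILE: rewrite ".values." references to ".Values." in a
# Helm template, replacing the file in place via a temporary copy.
fix_values_case() {
  sed 's/\.values\./.Values./g' "$1" > "$1.tmp" && mv "$1.tmp" "$1"
}
```

After `fix_values_case templates/deployment.yaml`, running `helm lint ./` is a quick way to check that the chart renders before retrying the upgrade.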
Problem 21: etcd fails to start?
[root@k8s-master01 ~]# systemctl enable --now etcd
Created symlink from /etc/systemd/system/etcd3.service to /usr/lib/systemd/system/etcd.service.
Created symlink from /etc/systemd/system/multi-user.target.wants/etcd.service to /usr/lib/systemd/system/etcd.service.
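Job for etcd.service failed because a timeout was exceeded. See "systemctl status etcd.service" and "journalctl -xe" for details.
Cause: The authentication failure may come from the certificates, the configuration, or the ports. The configuration was checked against the requirements of the etcd version, and the certificate generation process was valid. In the end, an occupied port was confirmed as the cause of the authentication failure.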
[root@k8s-master01 ~]# systemctl status etcd
● etcd.service - Etcd.service
Loaded: loaded (/usr/lib/systemd/system/etcd.service; enabled; vendor preset: disabled)
Active: activating (start) since Wed 2021-07-14 09:53:03 CST; 1min 6s ago
Docs: https://coreos.com/etcd/docs/latest/
Main PID: 39692 (etcd)
CGroup: /system.slice/etcd.service
└─39692 /usr/local/bin/etcd --config-file=/etc/etcd/etcd.config.yml
Jul 14 09:54:09 k8s-master01 etcd[39692]: rejected connection from "192.168.0.108:46168" (error "remote error: tls: bad certificate", ServerName "")
Jul 14 09:54:09 k8s-master01 etcd[39692]: rejected connection from "192.168.0.108:46166" (error "remote error: tls: bad certificate", ServerName "")
Jul 14 09:54:09 k8s-master01 etcd[39692]: rejected connection from "192.168.0.108:46170" (error "remote error: tls: bad certificate", ServerName "")
Jul 14 09:54:09 k8s-master01 etcd[39692]: rejected connection from "192.168.0.108:46172" (error "remote error: tls: bad certificate", ServerName "")
Jul 14 09:54:09 k8s-master01 etcd[39692]: rejected connection from "192.168.0.108:46176" (error "remote error: tls: bad certificate", ServerName "")
Jul 14 09:54:09 k8s-master01 etcd[39692]: rejected connection from "192.168.0.108:46174" (error "remote error: tls: bad certificate", ServerName "")
Jul 14 09:54:09 k8s-master01 etcd[39692]: rejected connection from "192.168.0.108:46178" (error "remote error: tls: bad certificate", ServerName "")
Jul 14 09:54:09 k8s-master01 etcd[39692]: rejected connection from "192.168.0.108:46180" (error "remote error: tls: bad certificate", ServerName "")
Jul 14 09:54:10 k8s-master01 etcd[39692]: rejected connection from "192.168.0.108:46182" (error "remote error: tls: bad certificate", ServerName "")
Jul 14 09:54:10 k8s-master01 etcd[39692]: rejected connection from "192.168.0.108:46186" (error "remote error: tls: bad certificate", ServerName "")
Solution: Kill the process occupying port 2379, then restart etcd.
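Finding the process that holds port 2379 can be sketched as a filter over `ss -lntp` output. This assumes the usual column layout of `ss` from iproute2 (state in column 1, local address in column 4, process information at the end of the line); the helper name is illustrative:

```shell
# pids_on_port PORT: read `ss -lntp` output on stdin and print the PID of
# every process listening on the given TCP port, one per line.
pids_on_port() {
  awk -v port="$1" '
    $1 == "LISTEN" && $4 ~ (":" port "$") {
      line = $0
      # A socket can be shared by several processes, so collect every pid=N.
      while (match(line, /pid=[0-9]+/)) {
        print substr(line, RSTART + 4, RLENGTH - 4)
        line = substr(line, RSTART + RLENGTH)
      }
    }'
}
```

Usage would look like `ss -lntp | pids_on_port 2379`, run as root so that process information is visible. After checking what each PID is, stop it with `kill` and run `systemctl restart etcd`.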
Problem 22: An svc acting as a reverse proxy fails on cross-domain access?
Connecting to externalname (183.232.231.172:80)
wget: server returned error: HTTP/1.1 403 Forbidden
Cause: The pod's cross-domain request is rejected by Baidu.
Solution: Adjust the access policy (details omitted).
References:
https://www.cnblogs.com/chalon/p/14415252.html
https://mp.weixin.qq.com/s/2tK-w7MhzxqyoMv9C38tHA