k8s debug

5a2617f6 · kingreatwill · ff0a57e6
11 changed files
--- a/architecture/README.md
+++ b/architecture/README.md
+https://www.oodesign.com/behavioral-patterns/

+https://www.oodesign.com/

 ## Software design patterns


--- a/articles/plugins.md
+++ b/articles/plugins.md
 ## chrome

 ## VS Code
-Regular expressions: any rule
\ No newline at end of file
+### Regular expressions: any rule
+
+
+### Remote Development
+
+Lets developers use VS Code to develop inside a WSL environment (for example, querying the OS returns Linux rather than Windows).
+https://marketplace.visualstudio.com/items?itemName=ms-vscode-remote.vscode-remote-extensionpack
+
+Remote - WSL
+https://code.visualstudio.com/docs/remote/wsl
+
+Press F1, select Remote-WSL: New Window for the default distro or Remote-WSL: New Window using Distro for a specific distro.
+
+Or, from the command line:
+code --remote wsl+Ubuntu /home/jim/projects/c
+
+Here `Ubuntu` is the name of the WSL distro.
+
+`wsl -l` lists the installed WSL distros.
--- a/computer/RAID.md
+++ b/computer/RAID.md
+[RAID explained: principles, levels, and performance trade-offs](https://www.toutiao.com/a6818160002360934924)
\ No newline at end of file
--- a/golang/bugs.md
+++ b/golang/bugs.md
+https://github.com/golang/go/issues/38617
+
+runtime: make the scavenger's pacing logic more defensive
+
+This change adds two bits of logic to the scavenger's pacing. Firstly,
+it checks to make sure we scavenged at least one physical page, if we
+released a non-zero amount of memory. If we try to release less than one
+physical page, most systems will release the whole page, which could
+lead to memory corruption down the road, and this is a signal we're in
+this situation.
+
+Secondly, the scavenger's pacing logic now checks to see if the time a
+scavenging operation takes is measured to be exactly zero or negative.
+The exact zero case can happen if time update granularity is too large
+to effectively capture the time the scavenging operation took, like on
+Windows where we set the OS timer interrupt frequency to 1 ms. The
+negative case should not happen, but we're being defensive (against
+kernel bugs, bugs in the runtime, etc.). If either of these cases
+happen, we fall back to Go 1.13 behavior: assume the scavenge
+operation took around 10µs per physical page. We ignore huge pages in
+this case because we're in unknown territory, so we choose to be
+conservative about pacing (huge pages could only increase the rate of
+scavenging).
+
+Currently, the scavenger is broken on Windows because the granularity of
+time measurement is around 1 ms, which is too coarse to measure how fast
+we're scavenging, so we often end up with a scavenging time of zero,
+followed by NaNs and garbage values in the pacing logic, which usually
+leads to sleeping forever.
+
+Fixes [#38617](https://github.com/golang/go/issues/38617).
\ No newline at end of file
--- a/golang/go-mod.md
+++ b/golang/go-mod.md
@@ -119,6 +119,7 @@ set GONOSUMDB=
 set GOPRIVATE=
 set GOPROXY=https://goproxy.cn,direct
 set GOSUMDB=off
+# export GOPROXY=https://gocenter.io
 ```

 Set the GOPROXY proxy:

--- a/kubernetes/cdk8s.md
+++ b/kubernetes/cdk8s.md
+# CDK8S
+Generate Kubernetes YAML manifests by writing code (cdk8s: Cloud Development Kit for Kubernetes).
+https://github.com/awslabs/cdk8s
+
+https://cdk8s.io/
+
+https://cdk8s.io/getting-started/python
\ No newline at end of file
--- a/kubernetes/img/kube-debug.gif
+++ b/kubernetes/img/kube-debug.gif
--- a/kubernetes/img/kubectl-debug-arch-2.jpg
+++ b/kubernetes/img/kubectl-debug-arch-2.jpg
--- a/kubernetes/k8s-service.md
+++ b/kubernetes/k8s-service.md
@@ -108,3 +108,14 @@ spec:

 https://kuboard.cn/learning/k8s-intermediate/service/np-example.html#%E5%89%8D%E6%8F%90%E6%9D%A1%E4%BB%B6

+
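+## What ResourceQuota and LimitRange do in k8s
+
+ResourceQuota
+ResourceQuota caps the total resource requests and limits that all Pods in a namespace may consume together.
+
+LimitRange
+LimitRange constrains individual Pods and containers in a namespace: it sets the default request and limit for containers that don't specify them, and can enforce per-container or per-Pod minimums and maximums.
+https://blog.csdn.net/qq_33235529/article/details/105194130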
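The division of labour can be illustrated with a simplified, hypothetical Go model. Only CPU is modelled, in millicores. The types and function names are illustrative, not Kubernetes API types. LimitRange acts per container, filling defaults and enforcing a maximum. ResourceQuota acts on the namespace total. In the real API server, the LimitRanger admission plugin fills in defaults before the ResourceQuota admission plugin checks the totals. This is why a namespace with a quota on `requests.cpu` rejects Pods without CPU requests unless a LimitRange supplies defaults.

```go
package main

import "fmt"

// Hypothetical, simplified model: real objects use resource.Quantity and
// cover memory, object counts, Pod-level limits, etc. CPU is in millicores.
type container struct {
	Name       string
	CPURequest int64 // 0 means "not set"
	CPULimit   int64 // 0 means "not set"
}

// limitRange: per-container defaults and ceiling within one namespace.
type limitRange struct {
	DefaultRequest int64
	DefaultLimit   int64
	MaxLimit       int64
}

// resourceQuota: namespace-wide cap on the sum of CPU requests.
type resourceQuota struct {
	HardRequests int64
}

// applyLimitRange fills in missing requests/limits and rejects containers
// whose limit exceeds the per-container maximum.
func applyLimitRange(lr limitRange, cs []container) ([]container, error) {
	out := make([]container, 0, len(cs))
	for _, c := range cs {
		if c.CPURequest == 0 {
			c.CPURequest = lr.DefaultRequest
		}
		if c.CPULimit == 0 {
			c.CPULimit = lr.DefaultLimit
		}
		if c.CPULimit > lr.MaxLimit {
			return nil, fmt.Errorf("container %s: limit %dm exceeds max %dm", c.Name, c.CPULimit, lr.MaxLimit)
		}
		out = append(out, c)
	}
	return out, nil
}

// admitPod checks that the new Pod's requests fit in what is left of the quota.
func admitPod(q resourceQuota, usedRequests int64, pod []container) error {
	var sum int64
	for _, c := range pod {
		sum += c.CPURequest
	}
	if usedRequests+sum > q.HardRequests {
		return fmt.Errorf("quota exceeded: used %dm + requested %dm > hard %dm", usedRequests, sum, q.HardRequests)
	}
	return nil
}

func main() {
	lr := limitRange{DefaultRequest: 100, DefaultLimit: 200, MaxLimit: 1000}
	q := resourceQuota{HardRequests: 1000}
	pod, err := applyLimitRange(lr, []container{
		{Name: "app"}, // no resources set: LimitRange defaults apply
		{Name: "sidecar", CPURequest: 50, CPULimit: 100},
	})
	fmt.Println(pod, err)
	fmt.Println(admitPod(q, 800, pod)) // 800 + 150 <= 1000: admitted
	fmt.Println(admitPod(q, 900, pod)) // 900 + 150 > 1000: rejected
}
```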
+
+
--- a/kubernetes/kubectl-debug.md
+++ b/kubernetes/kubectl-debug.md
 # A super-handy K8s diagnostic tool: kubectl-debug
 https://github.com/aylei/kubectl-debug/releases

+[Simplifying Pod troubleshooting: an introduction to kubectl-debug](https://aleiwu.com/post/kubectl-debug-intro/)
+
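+## Background
+A best practice for containers is to build images that are as small as possible. This practice, however, makes troubleshooting harder: slimmed-down containers usually lack the common troubleshooting tools, and some don't even have a shell (e.g. FROM scratch). In that situation we can only troubleshoot through logs, or by going to the host and using docker-cli or nsenter, which is inefficient. The Kubernetes community recognized this problem long ago: the related [Issue Support for troubleshooting distroless containers](https://github.com/kubernetes/kubernetes/issues/27140) was opened in 2016 and led to a corresponding [Proposal](https://github.com/kubernetes/community/blob/master/contributors/design-proposals/node/troubleshoot-running-pods.md). Unfortunately, because the change touches a lot of the codebase, the implementation had still not been merged into upstream Kubernetes at the time of writing. Then, by chance (a first-round PingCAP interview asked me to implement a kubectl plugin with similar functionality), I developed [kubectl-debug](https://github.com/aylei/kubectl-debug). It helps diagnose a target container by starting a container that has a full set of troubleshooting tools installed.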
+
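+## How it works
+kubectl-debug itself is very simple, so once you understand how it works you can fully master the tool, and even use it for things beyond debugging.
+
+A container is essentially a group of processes with cgroup resource limits and namespace isolation. So if we start a process and have it join the target container's namespaces, that process can "enter the container" (note the quotes). It then "sees" the same root filesystem, virtual network interfaces and process space as the processes in the container. This is exactly how commands like docker exec and kubectl exec work.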
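"Joining a namespace" can be observed directly on Linux. Each process exposes its namespaces as symlinks under `/proc/<pid>/ns/`. Two processes are in the same namespace exactly when the link targets (for example `net:[4026531992]`) are equal. The minimal sketch below assumes a Linux host. Reading another process's links requires the same user or root. The helper names are illustrative.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// nsID returns the namespace identifier (e.g. "net:[4026531992]") of process
// pid for the given namespace type, by reading the /proc/<pid>/ns/<type> link.
func nsID(pid, nsType string) (string, error) {
	return os.Readlink(filepath.Join("/proc", pid, "ns", nsType))
}

// sameNamespace reports whether two processes share a namespace of nsType.
// A process that joined a container's namespaces (docker exec, nsenter, a
// debug container) shows the same IDs as the container's own processes.
func sameNamespace(pidA, pidB, nsType string) (bool, error) {
	a, err := nsID(pidA, nsType)
	if err != nil {
		return false, err
	}
	b, err := nsID(pidB, nsType)
	if err != nil {
		return false, err
	}
	return a == b, nil
}

func main() {
	for _, t := range []string{"mnt", "net", "pid", "ipc", "uts", "user"} {
		id, err := nsID("self", t)
		fmt.Println(t, id, err)
	}
}
```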
+
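+What we want now is not only to "enter the container" but also to bring a toolset along for troubleshooting. The best way to manage a toolset efficiently and portably is to package the tools themselves into a container image. Then we only need to start a container from this "tool image" and have it join the target container's namespaces, which naturally achieves "entering the container with a toolset". In fact, docker-cli alone can do this: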
+```
+export TARGET_ID=666666666
+# Join the target container's network, pid and ipc namespaces
+docker run -it --network=container:$TARGET_ID --pid=container:$TARGET_ID --ipc=container:$TARGET_ID busybox
+```
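+This is the starting point of kubectl-debug: use a tool container to diagnose the application container. The design idea is the same as in patterns like sidecar: each container does one thing.
+
+Concretely, this is what happens behind a single `kubectl debug <target-pod>` command:
+
+![](img/kubectl-debug-arch-2.jpg)
+The steps are: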
+
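+1. The plugin queries the ApiServer: does demo-pod exist, and which node is it on?
+2. The ApiServer returns the node demo-pod runs on
+3. The plugin requests creation of a Debug Agent Pod on the target node
+4. The kubelet creates the Debug Agent Pod
+5. Once the plugin sees the Debug Agent is Ready, it sends a debug request (long-lived connection)
+6. The Debug Agent receives the debug request, creates the Debug container and joins it to the target container's namespaces; once that is done, it connects to the Debug container's tty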
+
+
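+From then on, the client debugs over the two connections from steps 5 and 6. When the session ends, the Debug Agent cleans up the Debug container and the plugin cleans up the Debug Agent, completing one debug session. It looks like this: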
+![](img/kube-debug.gif)
+
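+## Installation
 After deploying applications in K8s, we often need to get into a pod to troubleshoot. Besides checking pod logs and kubectl describe, the traditional approach is to preinstall tools such as procps, net-tools, tcpdump and vim in the application's base image. That, however, violates the minimal-image principle and needlessly increases the Pod's attack surface.

 kubectl-debug is a simple, easy-to-use and powerful kubectl plugin that helps you troubleshoot Pods on Kubernetes. It starts a troubleshooting container and adds it to the target container's pid, network, user and ipc namespaces. We can then use familiar tools such as netstat and tcpdump directly in the new container, while the application container stays minimal with no extra troubleshooting tools preinstalled.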
@@ -148,4 +183,109 @@ spec:
    type: RollingUpdate
 ```

+## Typical use cases
+By default, kubectl debug uses [nicolaka/netshoot](https://github.com/nicolaka/netshoot) as its base image, which ships with a large set of troubleshooting tools, including:
+
+### Using iftop to view container network traffic:
+```
+➜  ~ kubectl debug demo-pod
+
+root @ /
+ [2] 🐳  → iftop -i eth0
+interface: eth0
+IP address is: 10.233.111.78
+MAC address is: 86:c3:ae:9d:46:2b
+```
+
+### Using drill to diagnose DNS resolution:
+```
+root @ /
+ [3] 🐳  → drill -V 5 demo-service
+;; ->>HEADER<<- opcode: QUERY, rcode: NOERROR, id: 0
+;; flags: rd ; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 0
+;; QUESTION SECTION:
+;; demo-service.	IN	A
+
+;; ANSWER SECTION:
+
+;; AUTHORITY SECTION:
+
+;; ADDITIONAL SECTION:
+
+;; Query time: 0 msec
+;; WHEN: Sat Jun  1 05:05:39 2019
+;; MSG SIZE  rcvd: 0
+;; ->>HEADER<<- opcode: QUERY, rcode: NXDOMAIN, id: 62711
+;; flags: qr rd ra ; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0
+;; QUESTION SECTION:
+;; demo-service.	IN	A
+
+;; ANSWER SECTION:
+
+;; AUTHORITY SECTION:
+.	30	IN	SOA	a.root-servers.net. nstld.verisign-grs.com. 2019053101 1800 900 604800 86400
+
+;; ADDITIONAL SECTION:
+
+;; Query time: 58 msec
+;; SERVER: 10.233.0.10
+;; WHEN: Sat Jun  1 05:05:39 2019
+;; MSG SIZE  rcvd: 121
+```
+
+### Capturing packets with tcpdump:
+
+```
+root @ /
+ [4] 🐳  → tcpdump -i eth0 -c 1 -Xvv
+tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
+12:41:49.707470 IP (tos 0x0, ttl 64, id 55201, offset 0, flags [DF], proto TCP (6), length 80)
+    demo-pod.default.svc.cluster.local.35054 > 10-233-111-117.demo-service.default.svc.cluster.local.8080: Flags [P.], cksum 0xf4d7 (incorrect -> 0x9307), seq 1374029960:1374029988, ack 1354056341, win 1424, options [nop,nop,TS val 2871874271 ecr 2871873473], length 28
+  0x0000:  4500 0050 d7a1 4000 4006 6e71 0ae9 6f4e  E..P..@.@.nq..oN
+  0x0010:  0ae9 6f75 88ee 094b 51e6 0888 50b5 4295  ..ou...KQ...P.B.
+  0x0020:  8018 0590 f4d7 0000 0101 080a ab2d 52df  .............-R.
+  0x0030:  ab2d 4fc1 0000 1300 0000 0000 0100 0000  .-O.............
+  0x0040:  000e 0a0a 08a1 86b2 ebe2 ced1 f85c 1001  .............\..
+1 packet captured
+11 packets received by filter
+0 packets dropped by kernel
+```
+
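+### Accessing the target container's root filesystem:
+
+Container runtimes (such as Docker) give each isolated container process its own root filesystem (essentially a chroot), and the kernel exposes every process's root directory as /proc/{pid}/root/ in the /proc filesystem. So when we want to access the target container's root filesystem, we can simply go through this directory: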
+```
+root @ /
+ [5] 🐳  → tail -f /proc/1/root/log_
+Hello, world!
+```
+
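+A common gotcha here is that commands like free and top, which rely on the /proc filesystem, display the host's information. Developers need to get used to this when containerizing. Runtimes have to adapt too: the infamous [Java 8u121 and earlier don't recognize cgroups limits](https://blog.softwaremill.com/docker-support-in-new-java-8-finally-fd595df0ca54) issue falls into this category.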
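The mismatch is easy to see by comparing two files. `/proc/meminfo` reports the host's memory, while the container's actual memory limit lives in the cgroup filesystem. The Go sketch below assumes cgroup v2 mounted at `/sys/fs/cgroup`, where the limit file is `memory.max` and contains either a byte count or `max`. Under cgroup v1 the equivalent file is `memory/memory.limit_in_bytes`. The parsing helpers are illustrative.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// memTotalBytes parses the MemTotal line of /proc/meminfo. Inside a container
// this normally reflects the host, which is why free/top look wrong.
func memTotalBytes(meminfo string) (int64, error) {
	sc := bufio.NewScanner(strings.NewReader(meminfo))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[0] == "MemTotal:" {
			kb, err := strconv.ParseInt(fields[1], 10, 64)
			return kb * 1024, err
		}
	}
	return 0, fmt.Errorf("MemTotal not found")
}

// cgroupV2Limit parses the contents of a cgroup v2 memory.max file:
// a byte count, or "max" meaning no limit.
func cgroupV2Limit(content string) (limit int64, limited bool, err error) {
	s := strings.TrimSpace(content)
	if s == "max" {
		return 0, false, nil
	}
	v, err := strconv.ParseInt(s, 10, 64)
	return v, err == nil, err
}

func main() {
	if mi, err := os.ReadFile("/proc/meminfo"); err == nil {
		total, _ := memTotalBytes(string(mi))
		fmt.Println("MemTotal from /proc (host):", total)
	}
	if mm, err := os.ReadFile("/sys/fs/cgroup/memory.max"); err == nil {
		lim, ok, _ := cgroupV2Limit(string(mm))
		fmt.Println("cgroup memory limit:", lim, "limited:", ok)
	}
}
```

A cgroup-aware runtime should size itself from the second value, not the first. That distinction is what older Java versions got wrong.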
+
+
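+### Diagnosing CrashLoopBackOff
+Troubleshooting CrashLoopBackOff is painful. The Pod keeps restarting, so neither kubectl exec nor kubectl debug can be used reliably, and we can basically only hope that the Pod's logs contain something useful. To make this easier, kubectl-debug borrowed from the oc debug command and added a --fork flag. With --fork, the plugin copies the current Pod spec, makes a few small changes, and creates a new Pod:
+
+- All Labels are removed from the new Pod, so that Services don't route traffic to the forked Pod
+- The new Pod's ReadinessProbe and LivenessProbe are also removed, so that the kubelet doesn't kill the Pod
+- The startup command of the target container (the one being debugged) in the new Pod is rewritten, so that the new Pod doesn't keep crashing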
+
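+Next, we can try to reproduce in the new Pod the problem that made the old Pod crash. To keep the environment consistent, we can first chroot into the target container's root filesystem: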
+
+```
+➜  ~ kubectl debug demo-pod --fork
+
+root @ /
+ [4] 🐳  → chroot /proc/1/root
+
+root @ /
+ [#] 🐳  → ls
+ bin            entrypoint.sh  home           lib64          mnt            root           sbin           sys            tmp            var
+ dev            etc            lib            media          proc           run            srv            usr
+
+root @ /
+ [#] 🐳  → ./entrypoint.sh
+ # Watch the startup script's output and continue troubleshooting based on it
+```

+> Adding `--v=8` (e.g. `kubectl get pod --v=8`) prints verbose output, including the HTTP requests kubectl sends to the API server.
--- a/linux/ssh.md
+++ b/linux/ssh.md
+
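+## SSH connections
+I. Normal connection: `ssh root@10.0.0.20`
+
+II. Passwordless connection (with two machines: the one being connected to is called the server here, the other one the client):
+
+1. On the server, create the `.ssh` directory in the home directory: `mkdir -p ~/.ssh`
+
+2. Restrict its permissions: `chmod 700 ~/.ssh`. Do not use `chmod 777`: with sshd's default `StrictModes yes`, a group- or world-writable `.ssh` makes sshd refuse key-based login.
+
+3. Generate a public/private key pair on the client: `ssh-keygen` (just press Enter at each prompt until the keys are created)
+
+4. Copy the client's public key to the server with `ssh-copy-id root@10.0.0.20`; after entering the correct password once, SSH access no longer asks for a password.
+
+III. Windows has no ssh-copy-id
+On the server:
+```
+mkdir -p ~/.ssh
+chmod 0700 ~/.ssh
+touch ~/.ssh/authorized_keys
+chmod 0600 ~/.ssh/authorized_keys   # read/write for the owner only (recommended)
+nano ~/.ssh/authorized_keys     # Ctrl+O to save, Ctrl+X to exit
+```
+On Windows, find the .pub file generated by ssh-keygen (under `%USERPROFILE%\.ssh\` by default) and paste its contents into ~/.ssh/authorized_keys on the server.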
+