Background
In a previous post I introduced VictoriaMetrics and some caveats around installing it. Today we will actually walk through installing it on Kubernetes. This walkthrough installs the cluster version of VictoriaMetrics on a cloud-hosted Kubernetes cluster and uses the cloud provider's load balancer.
Note: VictoriaMetrics is abbreviated as VM below.
Preparation

- A Kubernetes cluster; mine is v1.20.6.
- A StorageClass provisioned in the cluster; I use one backed by NFS.
- Operator image tag v0.17.2; vmstorage, vmselect, and vminsert image tag v1.63.0. You can pull the images ahead of time and push them to a local registry.

Before you install
VM can be installed in several ways: as a binary, as a Docker image, or from source; choose whichever fits your scenario. On Kubernetes, we can simply use the operator. The main points to watch during installation are below.
A minimal cluster must contain the following nodes:
- One vmstorage node, with the -retentionPeriod and -storageDataPath flags set
- One vminsert node, with -storageNode= pointing at the vmstorage node(s)
- One vmselect node, with -storageNode= pointing at the vmstorage node(s)

Note: for high availability, run at least two nodes of each service.
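As a rough sketch of what the minimal layout above means on bare metal (hostnames, ports, and paths are illustrative placeholders, not values from this install):

```shell
# vmstorage: retention and data path must be set explicitly
./vmstorage -retentionPeriod=4 -storageDataPath=/vm-data

# vminsert and vmselect each point at the vmstorage node(s)
./vminsert -storageNode=vmstorage-host
./vmselect -storageNode=vmstorage-host
```

With the operator, these flags are derived from the VMCluster resource instead of being passed by hand.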
A load balancer, such as vmauth or nginx, must sit in front of vmselect and vminsert. Here we use the cloud provider's load balancer. It must satisfy the following:
- Requests whose path starts with /insert must be routed to port 8480 on the vminsert nodes.
- Requests whose path starts with /select must be routed to port 8481 on the vmselect nodes.

Note: each service's listen address can be set with the -httpListenAddr flag.
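The routing rule the load balancer must implement can be sketched as a tiny function; the component names and ports mirror the rule above, while the function itself is only an illustration, not part of the install:

```python
def route(path):
    """Map a request path to the (component, port) backend, per the LB rule."""
    if path.startswith("/insert"):
        return ("vminsert", 8480)
    if path.startswith("/select"):
        return ("vmselect", 8481)
    return None  # anything else is not handled by the VM backends

assert route("/insert/0/prometheus") == ("vminsert", 8480)
assert route("/select/0/vmui") == ("vmselect", 8481)
```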
It is recommended to set up monitoring for the cluster itself.
If you are installing a test cluster on a single host, the -httpListenAddr flag of vminsert, vmselect, and vmstorage must each be unique, and vmstorage's -storageDataPath, -vminsertAddr, and -vmselectAddr flags must each have unique values.
When the free space at vmstorage's -storageDataPath directory drops below the threshold set by -storage.minFreeDiskSpaceBytes, vmstorage switches to read-only mode; vminsert stops sending data to such nodes and routes it to the other available vmstorage nodes instead.
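That behavior can be illustrated with a toy check (the constant name mirrors the -storage.minFreeDiskSpaceBytes flag; the functions themselves are hypothetical, not VM code):

```python
# Stand-in for the -storage.minFreeDiskSpaceBytes threshold (10 GiB here)
MIN_FREE_DISK_SPACE_BYTES = 10 * 1024**3

def is_read_only(free_bytes):
    # vmstorage flips to read-only once free space drops below the threshold
    return free_bytes < MIN_FREE_DISK_SPACE_BYTES

def writable_nodes(nodes):
    # vminsert keeps sending data only to nodes that are not read-only
    return [name for name, free in nodes.items() if not is_read_only(free)]

nodes = {"vmstorage-0": 5 * 1024**3, "vmstorage-1": 50 * 1024**3}
assert writable_nodes(nodes) == ["vmstorage-1"]
```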
Installation steps

Install VM
1. Create the CRDs
```shell
# Download the install files
export VM_VERSION=`basename $(curl -fs -o /dev/null -w %{redirect_url} https://github.com/VictoriaMetrics/operator/releases/latest)`
wget https://github.com/VictoriaMetrics/operator/releases/download/$VM_VERSION/bundle_crd.zip
unzip bundle_crd.zip
kubectl apply -f release/crds

# Check the CRDs
[root@test opt]# kubectl get crd | grep vm
vmagents.operator.victoriametrics.com                2022-01-05T07:26:01Z
vmalertmanagerconfigs.operator.victoriametrics.com   2022-01-05T07:26:01Z
vmalertmanagers.operator.victoriametrics.com         2022-01-05T07:26:01Z
vmalerts.operator.victoriametrics.com                2022-01-05T07:26:01Z
vmauths.operator.victoriametrics.com                 2022-01-05T07:26:01Z
vmclusters.operator.victoriametrics.com              2022-01-05T07:26:01Z
vmnodescrapes.operator.victoriametrics.com           2022-01-05T07:26:01Z
vmpodscrapes.operator.victoriametrics.com            2022-01-05T07:26:01Z
vmprobes.operator.victoriametrics.com                2022-01-05T07:26:01Z
vmrules.operator.victoriametrics.com                 2022-01-05T07:26:01Z
vmservicescrapes.operator.victoriametrics.com        2022-01-05T07:26:01Z
vmsingles.operator.victoriametrics.com               2022-01-05T07:26:01Z
vmstaticscrapes.operator.victoriametrics.com         2022-01-05T07:26:01Z
vmusers.operator.victoriametrics.com                 2022-01-05T07:26:01Z
```
2. Install the operator
```shell
# Install the operator. Remember to change the operator image address beforehand
kubectl apply -f release/operator/

# Check that the operator is running
[root@test opt]# kubectl get po -n monitoring-system
vm-operator-76dd8f7b84-gsbfs   1/1   Running   0   25h
```
3. Install the VMCluster

Once the operator is up, build the custom resource that matches your needs. Here I install a VMCluster. First, the VMCluster manifest:
```yaml
# cat vmcluster-install.yaml
apiVersion: operator.victoriametrics.com/v1beta1
kind: VMCluster
metadata:
  name: vmcluster-main
  namespace: monitoring-system
spec:
  replicationFactor: 1
  retentionPeriod: "4"
  vminsert:
    image:
      pullPolicy: IfNotPresent
      repository: images.huazai.com/release/vminsert
      tag: v1.63.0
    podMetadata:
      labels:
        victoriametrics: vminsert
    replicaCount: 1
    resources:
      limits:
        cpu: "1"
        memory: 1000Mi
      requests:
        cpu: 500m
        memory: 500Mi
  vmselect:
    cacheMountPath: /select-cache
    image:
      pullPolicy: IfNotPresent
      repository: images.huazai.com/release/vmselect
      tag: v1.63.0
    podMetadata:
      labels:
        victoriametrics: vmselect
    replicaCount: 1
    resources:
      limits:
        cpu: "1"
        memory: 1000Mi
      requests:
        cpu: 500m
        memory: 500Mi
    storage:
      volumeClaimTemplate:
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 2G
          storageClassName: nfs-csi
          volumeMode: Filesystem
  vmstorage:
    image:
      pullPolicy: IfNotPresent
      repository: images.huazai.com/release/vmstorage
      tag: v1.63.0
    podMetadata:
      labels:
        victoriametrics: vmstorage
    replicaCount: 1
    resources:
      limits:
        cpu: "1"
        memory: 1500Mi
      requests:
        cpu: 500m
        memory: 750Mi
    storage:
      volumeClaimTemplate:
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 20G
          storageClassName: nfs-csi
          volumeMode: Filesystem
    storageDataPath: /vm-data
```

```shell
# Install the VMCluster
kubectl apply -f vmcluster-install.yaml

# Check the install result
[root@test opt]# kubectl get po -n monitoring-system
NAME                                      READY   STATUS    RESTARTS   AGE
vm-operator-76dd8f7b84-gsbfs              1/1     Running   0          26h
vminsert-vmcluster-main-69766c8f4-r795w   1/1     Running   0          25h
vmselect-vmcluster-main-0                 1/1     Running   0          25h
vmstorage-vmcluster-main-0                1/1     Running   0          25h
```
4. Create the vminsert and vmselect Services
```shell
# Look at the Services the operator created
[root@test opt]# kubectl get svc -n monitoring-system
NAME                       TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                      AGE
vminsert-vmcluster-main    ClusterIP   10.0.182.73   <none>        8480/TCP                     25h
vmselect-vmcluster-main    ClusterIP   None          <none>        8481/TCP                     25h
vmstorage-vmcluster-main   ClusterIP   None          <none>        8482/TCP,8400/TCP,8401/TCP   25h
```

So that other Kubernetes clusters can also store their data in this VM, and to make querying easier later, create two extra Services of type NodePort: vminsert-lbsvc and vmselect-lbsvc. Then configure the cloud load balancer to listen on ports 8480 and 8481, with the VM cluster's node IPs as backend servers and the NodePorts exposed by vminsert-lbsvc and vmselect-lbsvc as backend ports. Workloads in the same cluster as VM (e.g. OpenTelemetry) can keep writing through the in-cluster Service, vminsert-vmcluster-main.monitoring-system.svc.cluster.local:8480; workloads in other clusters write through lb:8480.

```yaml
# cat vminsert-lb-svc.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: monitoring
    app.kubernetes.io/instance: vmcluster-main
    app.kubernetes.io/name: vminsert
  name: vminsert-vmcluster-main-lbsvc
  namespace: monitoring-system
spec:
  externalTrafficPolicy: Cluster
  ports:
  - name: http
    nodePort: 30135
    port: 8480
    protocol: TCP
    targetPort: 8480
  selector:
    app.kubernetes.io/component: monitoring
    app.kubernetes.io/instance: vmcluster-main
    app.kubernetes.io/name: vminsert
  sessionAffinity: None
  type: NodePort
```

```yaml
# cat vmselect-lb-svc.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: monitoring
    app.kubernetes.io/instance: vmcluster-main
    app.kubernetes.io/name: vmselect
  name: vmselect-vmcluster-main-lbsvc
  namespace: monitoring-system
spec:
  externalTrafficPolicy: Cluster
  ports:
  - name: http
    nodePort: 31140
    port: 8481
    protocol: TCP
    targetPort: 8481
  selector:
    app.kubernetes.io/component: monitoring
    app.kubernetes.io/instance: vmcluster-main
    app.kubernetes.io/name: vmselect
  sessionAffinity: None
  type: NodePort
```

```shell
# Create the Services
kubectl apply -f vmselect-lb-svc.yaml
kubectl apply -f vminsert-lb-svc.yaml

# !! Configure the cloud load balancer yourself

# Finally, check the VM pods and Services
[root@test opt]# kubectl get po,svc -n monitoring-system
NAME                                          READY   STATUS    RESTARTS   AGE
pod/vm-operator-76dd8f7b84-gsbfs              1/1     Running   0          30h
pod/vminsert-vmcluster-main-69766c8f4-r795w   1/1     Running   0          29h
pod/vmselect-vmcluster-main-0                 1/1     Running   0          29h
pod/vmstorage-vmcluster-main-0                1/1     Running   0          29h

NAME                                    TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)                      AGE
service/vminsert-vmcluster-main         ClusterIP   10.0.182.73    <none>        8480/TCP                     29h
service/vminsert-vmcluster-main-lbsvc   NodePort    10.0.255.212   <none>        8480:30135/TCP               7h54m
service/vmselect-vmcluster-main         ClusterIP   None           <none>        8481/TCP                     29h
service/vmselect-vmcluster-main-lbsvc   NodePort    10.0.45.239    <none>        8481:31140/TCP               7h54m
service/vmstorage-vmcluster-main        ClusterIP   None           <none>        8482/TCP,8400/TCP,8401/TCP   29h
```

Install prometheus-node-exporter
Here we install node-exporter to expose Kubernetes node metrics. OpenTelemetry (installed next) will scrape them and write them to vmstorage through vminsert; the data is then queried through vmselect.
```yaml
# kubectl apply -f prometheus-node-exporter-install.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  labels:
    app: prometheus-node-exporter
    release: prometheus-node-exporter
  name: prometheus-node-exporter
  namespace: kube-system
spec:
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: prometheus-node-exporter
      release: prometheus-node-exporter
  template:
    metadata:
      labels:
        app: prometheus-node-exporter
        release: prometheus-node-exporter
    spec:
      containers:
      - args:
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        - --path.rootfs=/host/root
        - --web.listen-address=$(HOST_IP):9100
        env:
        - name: HOST_IP
          value: 0.0.0.0
        image: images.huazai.com/release/node-exporter:v1.1.2
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: 9100
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: node-exporter
        ports:
        - containerPort: 9100
          hostPort: 9100
          name: metrics
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: 9100
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: 200m
            memory: 50Mi
          requests:
            cpu: 100m
            memory: 30Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /host/proc
          name: proc
          readOnly: true
        - mountPath: /host/sys
          name: sys
          readOnly: true
        - mountPath: /host/root
          mountPropagation: HostToContainer
          name: root
          readOnly: true
      dnsPolicy: ClusterFirst
      hostNetwork: true
      hostPID: true
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext:
        fsGroup: 65534
        runAsGroup: 65534
        runAsNonRoot: true
        runAsUser: 65534
      serviceAccount: prometheus-node-exporter
      serviceAccountName: prometheus-node-exporter
      terminationGracePeriodSeconds: 30
      tolerations:
      - effect: NoSchedule
        operator: Exists
      volumes:
      - hostPath:
          path: /proc
          type: ""
        name: proc
      - hostPath:
          path: /sys
          type: ""
        name: sys
      - hostPath:
          path: /
          type: ""
        name: root
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
```

Check node-exporter:
```shell
[root@test ~]# kubectl get po -n kube-system | grep prometheus
prometheus-node-exporter-89wjk   1/1   Running   0   31h
prometheus-node-exporter-hj4gh   1/1   Running   0   31h
prometheus-node-exporter-hxm8t   1/1   Running   0   31h
prometheus-node-exporter-nhqp6   1/1   Running   0   31h
```

Install OpenTelemetry
With prometheus-node-exporter in place, install OpenTelemetry (a proper introduction to it will have to wait for another post).
```yaml
# OpenTelemetry config file. It defines how data is received, processed, and exported:
# 1. receivers: where the data comes from
# 2. processors: how the received data is processed
# 3. exporters: where the processed data goes; here it is written through vminsert into vmstorage
# kubectl apply -f opentelemetry-install-cm.yaml
apiVersion: v1
data:
  relay: |
    exporters:
      prometheusremotewrite:
        # Configured as lb_ip:8480 here, i.e. the vminsert address
        endpoint: http://lb_ip:8480/insert/0/prometheus
        # Add a different label per cluster, e.g. cluster: uat/prd
        external_labels:
          cluster: uat
    extensions:
      health_check: {}
    processors:
      batch: {}
      memory_limiter:
        ballast_size_mib: 819
        check_interval: 5s
        limit_mib: 1638
        spike_limit_mib: 512
    receivers:
      prometheus:
        config:
          scrape_configs:
          - job_name: opentelemetry-collector
            scrape_interval: 10s
            static_configs:
            - targets:
              - localhost:8888
          # ... omitted ...
          - job_name: kube-state-metrics
            kubernetes_sd_configs:
            - namespaces:
                names:
                - kube-system
              role: service
            metric_relabel_configs:
            - regex: ReplicaSet;([\w|-]+)-[0-9|a-z]+
              replacement: $$1
              source_labels:
              - created_by_kind
              - created_by_name
              target_label: created_by_name
            - regex: ReplicaSet
              replacement: Deployment
              source_labels:
              - created_by_kind
              target_label: created_by_kind
            relabel_configs:
            - action: keep
              regex: kube-state-metrics
              source_labels:
              - __meta_kubernetes_service_name
          - job_name: node-exporter
            kubernetes_sd_configs:
            - namespaces:
                names:
                - kube-system
              role: endpoints
            relabel_configs:
            - action: keep
              regex: node-exporter
              source_labels:
              - __meta_kubernetes_service_name
            - source_labels:
              - __meta_kubernetes_pod_node_name
              target_label: node
            - source_labels:
              - __meta_kubernetes_pod_host_ip
              target_label: host_ip
          # ... omitted ...
    service:
      # The receivers, processors, exporters, and extensions defined above must be
      # listed here, otherwise they take no effect
      extensions:
      - health_check
      pipelines:
        metrics:
          exporters:
          - prometheusremotewrite
          processors:
          - memory_limiter
          - batch
          receivers:
          - prometheus
kind: ConfigMap
metadata:
  annotations:
    meta.helm.sh/release-name: opentelemetry-collector-hua
    meta.helm.sh/release-namespace: kube-system
  labels:
    app.kubernetes.io/instance: opentelemetry-collector-hua
    app.kubernetes.io/name: opentelemetry-collector-hua
  name: opentelemetry-collector-hua
  namespace: kube-system
```

```yaml
# Install OpenTelemetry
# kubectl apply -f opentelemetry-install.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app.kubernetes.io/instance: opentelemetry-collector-hua
    app.kubernetes.io/name: opentelemetry-collector-hua
  name: opentelemetry-collector-hua
  namespace: kube-system
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app.kubernetes.io/instance: opentelemetry-collector-hua
      app.kubernetes.io/name: opentelemetry-collector-hua
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        app.kubernetes.io/instance: opentelemetry-collector-hua
        app.kubernetes.io/name: opentelemetry-collector-hua
    spec:
      containers:
      - command:
        - /otelcol
        - --config=/conf/relay.yaml
        - --metrics-addr=0.0.0.0:8888
        - --mem-ballast-size-mib=819
        env:
        - name: MY_POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        image: images.huazai.com/release/opentelemetry-collector:0.27.0
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: 13133
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        name: opentelemetry-collector-hua
        ports:
        - containerPort: 4317
          name: otlp
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /
            port: 13133
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 1
        resources:
          limits:
            cpu: "1"
            memory: 2Gi
          requests:
            cpu: 500m
            memory: 1Gi
        volumeMounts:
        # the ConfigMap created above for the OpenTelemetry collector
        - mountPath: /conf
          name: opentelemetry-collector-configmap-hua
        - mountPath: /etc/otel-collector/secrets/etcd-cert/
          name: etcd-tls
          readOnly: true
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      # create the ServiceAccount yourself
      serviceAccount: opentelemetry-collector-hua
      serviceAccountName: opentelemetry-collector-hua
      terminationGracePeriodSeconds: 30
      volumes:
      - configMap:
          defaultMode: 420
          items:
          - key: relay
            path: relay.yaml
          # the ConfigMap created above for the OpenTelemetry collector
          name: opentelemetry-collector-hua
        name: opentelemetry-collector-configmap-hua
      - name: etcd-tls
        secret:
          defaultMode: 420
          secretName: etcd-tls
```

```shell
# Check that OpenTelemetry is running. If OpenTelemetry is in the same Kubernetes
# cluster as VM, use the in-cluster Service address rather than the load balancer
# (on this cloud, the backends of a layer-4 listener cannot act as both client
# and server at the same time).
[root@kube-control-1 ~]# kubectl get po -n kube-system | grep opentelemetry-collector-hua
opentelemetry-collector-hua-647c6c64c7-j6p4b   1/1   Running   0   8h
```

Verify the installation
Once all components are installed, open http://lb:8481/select/0/vmui in a browser and set the Server URL to http://lb:8481/select/0/prometheus. Then enter a metric name to query its data; you can also enable auto-refresh in the top-left corner.
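For scripted queries, the same select endpoint exposes the standard Prometheus HTTP querying API under the /select/0/prometheus prefix. A small sketch of building such a query URL (the lb hostname is a placeholder, and the helper function is hypothetical):

```python
from urllib.parse import urlencode

def query_url(lb, expr, tenant=0):
    # vmselect serves the Prometheus querying API under /select/<tenant>/prometheus
    base = f"http://{lb}:8481/select/{tenant}/prometheus/api/v1/query"
    return base + "?" + urlencode({"query": expr})

url = query_url("lb.example.com", "node_load1")
assert url == "http://lb.example.com:8481/select/0/prometheus/api/v1/query?query=node_load1"
```

Fetching that URL (e.g. with curl or urllib) returns the usual Prometheus JSON response.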

Summary
The whole installation is fairly straightforward. Once it is done, a single VM cluster can store monitoring data from multiple Kubernetes clusters. VM supports MetricsQL, which is based on PromQL, and it can also serve as a Grafana data source. Compare that with the old routine of manually installing Prometheus in every Kubernetes cluster, configuring its storage, and opening each cluster's Prometheus UI separately whenever you need to query data: quite a bit more hassle. If VM looks good to you too, give it a try!
References

- https://github.com/VictoriaMetrics/VictoriaMetrics/tree/cluster
- https://docs.victoriametrics.com/
- https://opentelemetry.io/docs/
- https://prometheus.io/docs/prometheus/latest/configuration/configuration/