Monitoring namespace
Prometheus is accessible at prom.k-space.ee and the corresponding AlertManager is accessible at am.k-space.ee. Both are deployed by ArgoCD from this Git repo directory using the Prometheus operator.
Note that Prometheus and other monitoring stack components should use the dedicated: monitoring Kubernetes node selector to make sure the components get scheduled on the mon[1-3] nodes, which are hosted in a privileged VLAN where they have access to UPS SNMP targets, Mikrotik router/switch APIs, etc.
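As a minimal sketch of the node selector described above, a Prometheus custom resource could pin its pods like this. The resource name and replica count are illustrative assumptions, not copied from prometheus.yaml, and if the mon[1-3] nodes are also tainted a matching toleration would be needed as well.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example            # hypothetical name
  namespace: monitoring
spec:
  replicas: 1              # assumption, adjust as needed
  # Schedule only on nodes labelled dedicated=monitoring (the mon[1-3] nodes)
  nodeSelector:
    dedicated: monitoring
```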
To add monitoring targets inside the Kubernetes cluster, create PodMonitor or ServiceMonitor custom resources.
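A ServiceMonitor for an in-cluster application might look roughly like the following. The names, namespace, label and port name are hypothetical, and whether Prometheus picks the resource up depends on the selectors configured in prometheus.yaml, which are not shown here.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app        # hypothetical
  namespace: example       # hypothetical application namespace
spec:
  # Select Services carrying this label (assumed label)
  selector:
    matchLabels:
      app: example-app
  endpoints:
    - port: metrics        # name of the Service port exposing /metrics (assumed)
      interval: 30s
```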
For external targets (ab)use the Probe CRD as seen in node-exporter.yaml or ping-exporter.yaml.
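One way such a Probe can be bent to scrape external exporters directly is sketched below. This is an assumption about the technique, not a copy of node-exporter.yaml: the host names are placeholders, and it relies on the relabeling rules under staticConfig being applied after the operator's own rules, so that the scrape address is rewritten from the prober URL back to the target itself.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: example-external-nodes   # hypothetical
  namespace: monitoring
spec:
  interval: 30s
  prober:
    # The CRD requires a prober URL; it is overridden by the relabeling below
    url: localhost
    path: /metrics
  targets:
    staticConfig:
      static:
        - host1.example.com:9100   # placeholder targets
        - host2.example.com:9100
      relabelingConfigs:
        # Keep the original target as the instance label
        - sourceLabels: [__param_target]
          targetLabel: instance
        # Scrape the target directly instead of going through a prober
        - sourceLabels: [__param_target]
          targetLabel: __address__
```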
Alerts are sent to the #kube-prod Slack channel. Alerting rules are picked up automatically by the Prometheus operator from Kubernetes manifests that use the operator's PrometheusRule custom resource.
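For illustration, a PrometheusRule carrying a single alert could look like this. The rule name, expression, duration and severity label are made up for the example, and any labels the Prometheus instance requires for rule selection are not shown.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-rules      # hypothetical
  namespace: monitoring
spec:
  groups:
    - name: example
      rules:
        - alert: ExampleTargetDown
          # Fires when a scrape target has been unreachable for 5 minutes
          expr: up == 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "Target {{ $labels.instance }} is down"
```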
Sample queries:
- SSD/HDD temperatures
- HDD power on hours, 8760 hours per year
- CPU/NB temperatures
- Disk space left
- MinIO S3 egress, inter-node egress, storage used
To reconfigure SNMP targets etc., recreate the snmp-exporter ConfigMap from snmp-configs.yaml:
kubectl delete -n monitoring configmap snmp-exporter
kubectl create -n monitoring configmap snmp-exporter --from-file=snmp.yml=snmp-configs.yaml
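For reference, the commands above should leave a ConfigMap of roughly this shape in the cluster, with the contents of snmp-configs.yaml stored under the snmp.yml key. The body is a placeholder comment here, since the actual SNMP configuration is not reproduced.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: snmp-exporter
  namespace: monitoring
data:
  snmp.yml: |
    # contents of snmp-configs.yaml end up here
```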
To set Slack secrets:
kubectl create -n monitoring secret generic slack-secrets \
  --from-literal=webhook-url=https://hooks.slack.com/services/...
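How alertmanager.yaml consumes this secret is not shown in this README. One plausible wiring, sketched here under that assumption, is an AlertmanagerConfig whose Slack receiver reads the webhook URL from the slack-secrets secret. The resource and receiver names are hypothetical.

```yaml
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: example-slack      # hypothetical
  namespace: monitoring
spec:
  route:
    receiver: slack
  receivers:
    - name: slack
      slackConfigs:
        - channel: "#kube-prod"
          # Webhook URL is read from the secret created above
          apiURL:
            name: slack-secrets
            key: webhook-url
          sendResolved: true
```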
To set Mikrotik secrets:
kubectl create -n monitoring secret generic mikrotik-exporter \
  --from-literal=MIKROTIK_PASSWORD='f7W!H*Pu' \
  --from-literal=PROMETHEUS_BEARER_TOKEN=$(cat /dev/urandom | base64 | head -c 30)
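Since the secret keys are written as environment variable names, a workload can load them all at once with envFrom. The Deployment below is a sketch under that assumption. The image is a placeholder and the manifest is not taken from mikrotik-exporter.yaml.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mikrotik-exporter
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: mikrotik-exporter
  template:
    metadata:
      labels:
        app: mikrotik-exporter
    spec:
      nodeSelector:
        dedicated: monitoring
      containers:
        - name: mikrotik-exporter
          image: example.invalid/mikrotik-exporter:latest   # placeholder image
          # Exposes MIKROTIK_PASSWORD and PROMETHEUS_BEARER_TOKEN as env vars
          envFrom:
            - secretRef:
                name: mikrotik-exporter
```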