kube/monitoring
2024-09-12 22:15:10 +03:00
alertmanager.yaml monitoring: Specify resource limits 2024-08-24 12:36:37 +03:00
blackbox-exporter.yaml Consolidate monitoring stack to Kube master nodes 2024-08-23 08:00:23 +03:00
mikrotik-exporter.yaml monitoring: Fix mikrotik-exporter formatting 2024-09-12 21:48:43 +03:00
node-exporter.yaml monitoring: Add revisionHistoryLimit: 0 2024-08-24 23:58:07 +03:00
ping-exporter.yaml Consolidate monitoring stack to Kube master nodes 2024-08-23 08:00:23 +03:00
prometheus.yaml monitoring: Enable Prometheus admin API 2024-09-04 22:28:01 +03:00
README.md monitoring: Update Mikrotik exporter 2024-09-04 22:33:15 +03:00
snmp-configs.yaml Move Prometheus instance to monitoring namespace 2023-08-19 09:24:48 +03:00
snmp-exporter.yaml monitoring: Fix snmp-exporter 2024-09-12 22:15:10 +03:00
zrepl.yaml Move Ansible directory to separate repo 2024-08-12 21:41:36 +03:00

Monitoring namespace

Prometheus is accessible at prom.k-space.ee and the corresponding Alertmanager at am.k-space.ee. Both are deployed by ArgoCD from this Git repo directory using the Prometheus operator.

Note that Prometheus and the other monitoring stack components should use an appropriate node selector so that they are scheduled on nodes hosted in a privileged VLAN, where they have access to UPS SNMP targets, Mikrotik router/switch APIs, etc.
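
One way to check that a node selector has taken effect is to list which node each pod in the namespace landed on. This is only a sketch: the function name is made up for illustration, and it reports placement only, so it cannot tell whether a node actually sits in the privileged VLAN.

```shell
# Hypothetical helper: print each pod in the monitoring namespace
# together with the node it was scheduled on.
monitoring_pod_nodes() {
  kubectl get pods -n monitoring \
    -o custom-columns=POD:.metadata.name,NODE:.spec.nodeName
}
```

Calling `monitoring_pod_nodes` after a rollout gives a quick two-column overview to compare against the list of nodes expected to host the stack.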

For users

To add monitoring targets inside the Kubernetes cluster, use the PodMonitor or ServiceMonitor custom resources.
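
As an illustration, a minimal ServiceMonitor could be written out like this. Every name in it (`my-app`, `my-namespace`, the `metrics` port) is a placeholder rather than something taken from this repo, and depending on how the Prometheus resource selects monitors, extra labels may be required.

```shell
# Write a ServiceMonitor manifest to a file.
# my-app, my-namespace and the "metrics" port name are placeholders.
cat > my-app-servicemonitor.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: my-app
  namespace: my-namespace
spec:
  selector:
    matchLabels:
      app: my-app        # must match the labels on the Service
  endpoints:
    - port: metrics      # name of the Service port serving /metrics
      interval: 30s
EOF
# Commit the file with the application's other manifests, or apply it
# by hand with: kubectl apply -f my-app-servicemonitor.yaml
```

A PodMonitor looks much the same, except that it selects pods directly and lists `podMetricsEndpoints` instead of `endpoints`.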

For external targets, (ab)use the Probe CRD, as seen in node-exporter.yaml or ping-exporter.yaml.
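
For orientation, here is a sketch of a Probe in its conventional blackbox-style form, not necessarily the exact pattern used in node-exporter.yaml or ping-exporter.yaml. The prober address, the `http_2xx` module and the target URL are all assumptions chosen for illustration.

```shell
# Write a Probe manifest to a file.
# The prober URL, module name and target are assumed placeholders.
cat > example-probe.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: example-external
  namespace: monitoring
spec:
  prober:
    url: blackbox-exporter.monitoring.svc:9115  # assumed prober address
    path: /probe
  module: http_2xx                              # assumed module name
  targets:
    staticConfig:
      static:
        - https://example.com                   # placeholder target
EOF
```

Prometheus scrapes the prober once per entry under `static`, passing the entry as the `target` parameter, which is what makes the resource usable for hosts outside the cluster.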

Alerts are sent to the #kube-prod Slack channel. Alerting rules are picked up automatically by the Prometheus operator from Kubernetes manifests that use the operator's PrometheusRule custom resource.
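
A minimal PrometheusRule might look like the sketch below. The rule name, expression, duration and severity label are invented for illustration, and the operator may only pick up rules carrying labels that match the Prometheus resource's rule selector.

```shell
# Write a PrometheusRule manifest to a file.
# The alert name, expression and labels are illustrative assumptions.
# The quoted 'EOF' keeps the shell from expanding $labels.
cat > example-rules.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-rules
  namespace: monitoring
spec:
  groups:
    - name: example
      rules:
        - alert: ExampleTargetDown
          expr: up == 0
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "{{ $labels.instance }} is down"
EOF
```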

Sample queries:

Another useful tool for exploring Prometheus operator custom resources is doc.crds.dev/github.com/prometheus-operator/prometheus-operator

For administrators

To reconfigure SNMP targets and similar settings, recreate the snmp-exporter ConfigMap from snmp-configs.yaml:

kubectl delete -n monitoring configmap snmp-exporter
kubectl create -n monitoring configmap snmp-exporter --from-file=snmp.yml=snmp-configs.yaml
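
An alternative that avoids the short window in which the ConfigMap does not exist is to render it client-side and pipe it into `kubectl apply`. This is a sketch of that common idiom rather than the procedure used here, and the function name is hypothetical. Whether the exporter then needs a restart to pick up the change is not covered by it.

```shell
# Hypothetical helper: update the snmp-exporter ConfigMap in place
# instead of deleting and recreating it.
# Run from the directory that contains snmp-configs.yaml.
update_snmp_configmap() {
  kubectl create -n monitoring configmap snmp-exporter \
    --from-file=snmp.yml=snmp-configs.yaml \
    --dry-run=client -o yaml |
    kubectl apply -n monitoring -f -
}
```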

To set Slack secrets:

kubectl create -n monitoring secret generic slack-secrets \
  --from-literal=webhook-url=https://hooks.slack.com/services/...

To set Mikrotik secrets:

kubectl create -n monitoring secret generic mikrotik-exporter \
  --from-literal=username=netpoller \
  --from-literal=password=...
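
To confirm that both secrets exist with the expected keys, without printing their values, something like the following could be used. The function name is made up. `kubectl describe secret` lists each key with its size in bytes only.

```shell
# Hypothetical helper: show the keys (not the values) of the two
# secrets created above; stops at the first one that is missing.
check_monitoring_secrets() {
  for name in slack-secrets mikrotik-exporter; do
    kubectl describe secret -n monitoring "$name" || return 1
  done
}
```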

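To wipe time series matching a selector from every Prometheus replica (this relies on the Prometheus admin API being enabled):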

for replica in $(seq 0 2); do
  kubectl exec -n monitoring prometheus-prometheus-$replica -- wget --post-data='match[]={__name__=~"mikrotik_.*"}' http://127.0.0.1:9090/api/v1/admin/tsdb/delete_series -O -
done
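
The loop above can be wrapped so that the selector becomes a parameter. The sketch below also calls the `clean_tombstones` admin endpoint, because `delete_series` only marks data as deleted and the space is otherwise reclaimed at a later compaction. The function names are hypothetical, three replicas are assumed as in the loop above, and sending an empty POST body through the image's `wget` is an assumption that has not been verified against this deployment.

```shell
# Hypothetical wrappers around the loop above.
# Three replicas are assumed, matching "seq 0 2".
prom_admin_post() {
  # $1 = TSDB admin endpoint name, $2 = POST body (may be empty)
  for replica in $(seq 0 2); do
    kubectl exec -n monitoring "prometheus-prometheus-$replica" -- \
      wget --post-data="$2" -O - \
      "http://127.0.0.1:9090/api/v1/admin/tsdb/$1" || return 1
  done
}

wipe_series() {
  # $1 = series selector, e.g. '{__name__=~"mikrotik_.*"}'
  # The selector is sent as-is; characters such as "+" or "&" would
  # need URL-encoding first.
  prom_admin_post delete_series "match[]=$1"
}

clean_tombstones() {
  # Ask each replica to remove already-deleted data from disk.
  prom_admin_post clean_tombstones ""
}
```

Usage would then be `wipe_series '{__name__=~"mikrotik_.*"}' && clean_tombstones`.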