monitoring: Update README.md

Lauri Võsandi 2024-08-12 22:06:28 +03:00
parent 7c2b862ca8
commit c2d08d8a80


@ -1,8 +1,19 @@
## Monitoring
Additional docs: https://wiki.k-space.ee/en/hosting/monitoring
This namespace is managed by
[ArgoCD](https://argocd.k-space.ee/applications/argocd/monitoring)
Prometheus is accessible at [prom.k-space.ee](https://prom.k-space.ee/)
and the corresponding AlertManager is accessible at [am.k-space.ee](https://am.k-space.ee/).
Both are [deployed by ArgoCD](https://argocd.k-space.ee/applications/monitoring)
from this Git repo directory using Prometheus operator.
Alerts are sent to the #kube-prod Slack channel.
Sample queries:
* [SSD/HDD temperatures](https://prom.k-space.ee/graph?g0.expr=%7B__name__%3D~%22smartmon_(temperature_celsius%7Cairflow_temperature_cel)_raw_value%22%7D&g0.tab=0&g0.stacked=0&g0.range_input=1d)
* [HDD power-on hours](https://prom.k-space.ee/graph?g0.range_input=30m&g0.expr=smartmon_power_on_hours_raw_value&g0.tab=0) (for reference, one year is 8760 hours)
* [CPU/NB temperatures](https://prom.k-space.ee/graph?g0.range_input=1h&g0.expr=node_hwmon_temp_celsius&g0.tab=0)
* [Disk space left](https://prom.k-space.ee/graph?g0.range_input=1h&g0.expr=node_filesystem_avail_bytes&g0.tab=1)
* Minio [s3 egress](https://prom.k-space.ee/graph?g0.expr=rate(minio_s3_traffic_sent_bytes%5B3m%5D)&g0.tab=0&g0.display_mode=lines&g0.show_exemplars=0&g0.range_input=6h), [internode egress](https://prom.k-space.ee/graph?g0.expr=rate(minio_inter_node_traffic_sent_bytes%5B2m%5D)&g0.tab=0&g0.display_mode=lines&g0.show_exemplars=0&g0.range_input=6h), [storage used](https://prom.k-space.ee/graph?g0.expr=minio_node_disk_used_bytes&g0.tab=0&g0.display_mode=lines&g0.show_exemplars=0&g0.range_input=6h)
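
The sample links above are ordinary Prometheus graph URLs with the PromQL expression URL-encoded in the `g0.expr` parameter. As a rough sketch, the snippet below shows how such a link could be built from an expression, or decoded back, using only the Python standard library. The helper names `graph_link` and `expr_from_link` are hypothetical and not part of this repo; the parameter names are the ones visible in the links above.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Base URL taken from the README; everything else here is illustrative.
PROM_URL = "https://prom.k-space.ee"


def graph_link(expr: str, range_input: str = "1h", tab: int = 0) -> str:
    """Build a graph link for a PromQL expression (hypothetical helper)."""
    params = {"g0.expr": expr, "g0.tab": tab, "g0.range_input": range_input}
    return f"{PROM_URL}/graph?{urlencode(params)}"


def expr_from_link(link: str) -> str:
    """Recover the PromQL expression from a graph link (hypothetical helper)."""
    return parse_qs(urlparse(link).query)["g0.expr"][0]


if __name__ == "__main__":
    link = graph_link("rate(minio_s3_traffic_sent_bytes[3m])", range_input="6h")
    print(link)
    print(expr_from_link(link))
```

Decoding is handy when a link needs tweaking: recover the expression, edit it as plain PromQL, then build a fresh link.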
To reconfigure SNMP targets etc:
@ -26,4 +37,3 @@ To set Mikrotik secrets:
--from-literal=PROMETHEUS_BEARER_TOKEN=$(cat /dev/urandom | base64 | head -c 30)
```