forked from k-space/kube

# Monitoring namespace

Prometheus is accessible at [prom.k-space.ee](https://prom.k-space.ee/)
and the corresponding AlertManager is accessible at [am.k-space.ee](https://am.k-space.ee/).
Both are [deployed by ArgoCD](https://argocd.k-space.ee/applications/monitoring)
from this Git repo directory using the Prometheus operator.

Note that Prometheus and the other monitoring stack components should use an
appropriate node selector so that they get scheduled on nodes
hosted in a privileged VLAN, where they have access to UPS SNMP targets,
Mikrotik router/switch APIs etc.

## For users

To add monitoring targets inside the Kubernetes cluster, use the
[PodMonitor](https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/getting-started.md#using-podmonitors) or ServiceMonitor
custom resources.

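A minimal sketch of what such a PodMonitor could look like, written to a file so it can be reviewed before it is committed. Every name in it (`my-app`, `my-namespace`, the `metrics` port) is a placeholder, not something taken from this repo:

```shell
# Write a hypothetical PodMonitor manifest; all names below are placeholders.
cat > podmonitor-example.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: my-app              # hypothetical
  namespace: my-namespace   # hypothetical
spec:
  selector:
    matchLabels:
      app: my-app           # must match the labels on the target pods
  podMetricsEndpoints:
    - port: metrics         # name of the container port serving metrics
      path: /metrics
EOF
echo "wrote podmonitor-example.yaml"
```

Whether Prometheus picks up a PodMonitor from another namespace depends on the `podMonitorNamespaceSelector` of the Prometheus resource, which is not shown here.
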
For external targets, (ab)use the Probe CRD as seen in `node-exporter.yaml`
or `ping-exporter.yaml`.

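The two files named above are not reproduced here, so the following is only a generic sketch of the shape of a Probe resource. The exporter address, module name and target hosts are all hypothetical:

```shell
# Write a hypothetical Probe manifest; addresses and names are placeholders.
cat > probe-example.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: Probe
metadata:
  name: example-external-hosts   # hypothetical
  namespace: monitoring
spec:
  interval: 60s
  module: icmp                   # hypothetical module from the exporter's config
  prober:
    url: example-exporter.monitoring.svc:9115   # hypothetical exporter address
    path: /probe
  targets:
    staticConfig:
      static:
        - host1.example.com      # placeholder targets
        - host2.example.com
EOF
echo "wrote probe-example.yaml"
```
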
Alerts are sent to the #kube-prod Slack channel. Alerting rules are picked up
automatically by the Prometheus operator from Kubernetes manifests that use
the operator's
[PrometheusRule](https://github.com/prometheus-operator/prometheus-operator/blob/main/Documentation/user-guides/alerting.md#deploying-prometheus-rules) custom resources.

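As an illustration, a PrometheusRule with a single alert might look roughly like this. The rule name, threshold and severity label are assumptions made for the example; only the two node exporter metric names are standard:

```shell
# Write a hypothetical PrometheusRule manifest with one alerting rule.
cat > prometheusrule-example.yaml <<'EOF'
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-disk-rules       # hypothetical
  namespace: monitoring
spec:
  groups:
    - name: example-disk
      rules:
        - alert: FilesystemAlmostFull   # hypothetical alert name
          # Fires when less than 10% of a filesystem is available.
          expr: node_filesystem_avail_bytes / node_filesystem_size_bytes < 0.1
          for: 15m
          labels:
            severity: warning           # assumed label convention
          annotations:
            summary: Filesystem has less than 10% space available
EOF
echo "wrote prometheusrule-example.yaml"
```
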
Sample queries:

* [SSD/HDD temperatures](https://prom.k-space.ee/graph?g0.expr=%7B__name__%3D~%22smartmon_(temperature_celsius%7Cairflow_temperature_cel)_raw_value%22%7D&g0.tab=0&g0.stacked=0&g0.range_input=1d)
* [HDD power on hours](https://prom.k-space.ee/graph?g0.range_input=30m&g0.expr=smartmon_power_on_hours_raw_value&g0.tab=0) (one year is 8760 hours)
* [CPU/NB temperatures](https://prom.k-space.ee/graph?g0.range_input=1h&g0.expr=node_hwmon_temp_celsius&g0.tab=0)
* [Disk space left](https://prom.k-space.ee/graph?g0.range_input=1h&g0.expr=node_filesystem_avail_bytes&g0.tab=1)
* Minio [s3 egress](https://prom.k-space.ee/graph?g0.expr=rate(minio_s3_traffic_sent_bytes%5B3m%5D)&g0.tab=0&g0.display_mode=lines&g0.show_exemplars=0&g0.range_input=6h), [internode egress](https://prom.k-space.ee/graph?g0.expr=rate(minio_inter_node_traffic_sent_bytes%5B2m%5D)&g0.tab=0&g0.display_mode=lines&g0.show_exemplars=0&g0.range_input=6h), [storage used](https://prom.k-space.ee/graph?g0.expr=minio_node_disk_used_bytes&g0.tab=0&g0.display_mode=lines&g0.show_exemplars=0&g0.range_input=6h)

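The same expressions can also be evaluated from a terminal through the Prometheus HTTP API. A minimal sketch, assuming `curl` is installed and that the `/api/v1/query` endpoint is reachable from where you run it (any authentication the deployment requires is not covered here); the helper name `prom_query` is hypothetical:

```shell
# Base URL taken from the links above; override PROM_URL for another instance.
PROM_URL="${PROM_URL:-https://prom.k-space.ee}"

# prom_query EXPR: run an instant query and print the raw JSON response.
prom_query() {
  curl -fsS --get --data-urlencode "query=$1" "$PROM_URL/api/v1/query"
}
```

For example, `prom_query 'smartmon_power_on_hours_raw_value / 8760'` would express the power-on time in years rather than hours.
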
Another useful tool for exploring Prometheus operator custom resources is
[doc.crds.dev/github.com/prometheus-operator/prometheus-operator](https://doc.crds.dev/github.com/prometheus-operator/prometheus-operator@v0.75.0).

## For administrators

To reconfigure SNMP targets etc., replace the `snmp-exporter` ConfigMap:

```
kubectl delete -n monitoring configmap snmp-exporter
kubectl create -n monitoring configmap snmp-exporter --from-file=snmp.yml=snmp-configs.yaml
```

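Deleting and re-creating leaves a short window in which the ConfigMap does not exist. A commonly used alternative is to render the ConfigMap client-side and pipe it to `kubectl apply`. This is a sketch, not the procedure used in this repo; the function name `update_snmp_config` is hypothetical, while the ConfigMap name and file name are the ones used above:

```shell
# Render the ConfigMap locally (the first command sends nothing to the
# cluster) and let `kubectl apply` create or update it in one step.
update_snmp_config() {
  kubectl create -n monitoring configmap snmp-exporter \
    --from-file=snmp.yml=snmp-configs.yaml \
    --dry-run=client -o yaml |
    kubectl apply -n monitoring -f -
}
```

Run `update_snmp_config` from the directory containing `snmp-configs.yaml`. The first run against a ConfigMap that was made with `kubectl create` may print a warning about a missing last-applied-configuration annotation.
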
To set Slack secrets:

```
kubectl create -n monitoring secret generic slack-secrets \
  --from-literal=webhook-url=https://hooks.slack.com/services/...
```

To set Mikrotik secrets:

```
kubectl create -n monitoring secret generic mikrotik-exporter \
  --from-literal=username=netpoller \
  --from-literal=password=...
```

To wipe time series (here, every series whose name matches `mktxp_.*`) from all three Prometheus replicas:

```
for replica in $(seq 0 2); do
  kubectl exec -n monitoring prometheus-prometheus-$replica -- \
    wget --post-data='match[]={__name__=~"mktxp_.*"}' \
    http://127.0.0.1:9090/api/v1/admin/tsdb/delete_series -O -
done
```

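The `delete_series` endpoint only marks the data as deleted; disk space is reclaimed at a later compaction, or sooner by calling the `clean_tombstones` admin endpoint. A sketch of that follow-up step, assuming the same pod names and replica count as the loop above and that the `wget` in the container accepts an empty `--post-data`; the function name and the `KUBECTL` override are hypothetical:

```shell
# Set KUBECTL=echo to print the commands instead of running them.
KUBECTL="${KUBECTL:-kubectl}"

# Ask every replica (0..2, as in the loop above) to drop deleted data from disk.
clean_tombstones() {
  for replica in $(seq 0 2); do
    $KUBECTL exec -n monitoring "prometheus-prometheus-$replica" -- \
      wget --post-data='' -O - \
      http://127.0.0.1:9090/api/v1/admin/tsdb/clean_tombstones
  done
}
```

Calling `KUBECTL=echo clean_tombstones` shows the three commands without touching the cluster.
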