Monitoring
Available Metrics
The PeSIT Wizard server exposes Prometheus metrics on /actuator/prometheus.
Key Metrics
| Metric | Description |
|---|---|
| pesitwizard_connections_active | PeSIT connections currently open |
| pesitwizard_connections_total | Cumulative number of connections (counter) |
| pesitwizard_transfers_total | Cumulative number of transfers (counter) |
| pesitwizard_transfers_bytes_total | Cumulative volume transferred, in bytes (counter) |
| pesitwizard_transfers_duration_seconds | Transfer duration in seconds (exposes _sum and _count series) |
| pesitwizard_errors_total | Cumulative number of errors (counter) |
| pesitwizard_cluster_members | Current number of cluster members |
| pesitwizard_cluster_is_leader | 1 if this instance is the cluster leader, 0 otherwise |
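To check what a server instance actually exposes (for example, before wiring up Prometheus), the endpoint can be read directly. The following is a minimal Python sketch using only the standard library. It assumes the server answers on http://localhost:8080 (port 8080, as in the scrape configuration) and handles only the common cases of the Prometheus text format. fetch_metrics and parse_samples are hypothetical helper names, not part of PeSIT Wizard.

```python
import re
import urllib.request

# One sample line of the Prometheus text format: name{labels} value [timestamp]
SAMPLE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
# label="value" pairs; escape sequences inside values are kept verbatim
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def fetch_metrics(base_url="http://localhost:8080"):
    """Download the raw text exposed on /actuator/prometheus."""
    with urllib.request.urlopen(base_url + "/actuator/prometheus", timeout=5) as resp:
        return resp.read().decode("utf-8")


def parse_samples(text, prefix="pesitwizard_"):
    """Return (name, labels, value) tuples for metrics whose name starts with prefix."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blank lines, HELP and TYPE comments
            continue
        match = SAMPLE_RE.match(line)
        if match is None or not match.group(1).startswith(prefix):
            continue
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))  # float() also accepts NaN and +Inf
    return samples
```

Usage: `for name, labels, value in parse_samples(fetch_metrics()): print(name, labels, value)`. Filtering on the pesitwizard_ prefix drops the JVM and framework metrics that typically share the same endpoint.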
Prometheus Integration
Prometheus Configuration
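```yaml
# prometheus.yml
scrape_configs:
  - job_name: 'pesitwizard-server'
    # Prometheus scrapes /metrics by default; the server exposes /actuator/prometheus
    metrics_path: /actuator/prometheus
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names: ['pesitwizard']
    relabel_configs:
      # Keep only pods labeled app=pesitwizard-server
      - source_labels: [__meta_kubernetes_pod_label_app]
        regex: pesitwizard-server
        action: keep
      # Keep only the container port that serves the Actuator endpoints
      - source_labels: [__meta_kubernetes_pod_container_port_number]
        regex: "8080"
        action: keep
```

Useful Queries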
```promql
# Transfers per minute (averaged over the last 5 minutes)
rate(pesitwizard_transfers_total[5m]) * 60

# Bytes transferred over the last hour
increase(pesitwizard_transfers_bytes_total[1h])

# Error ratio (errors per transfer); sum() avoids label mismatches between the two metrics
sum(rate(pesitwizard_errors_total[5m])) / sum(rate(pesitwizard_transfers_total[5m]))

# Average transfer duration in seconds
sum(rate(pesitwizard_transfers_duration_seconds_sum[5m])) / sum(rate(pesitwizard_transfers_duration_seconds_count[5m]))
```

Grafana Dashboards
Main Dashboard
Import the dashboard from /grafana/pesitwizard-dashboard.json.
Included panels:
- Transfers per minute
- Transferred volume
- Active connections
- Error rate
- Cluster status
- Top partners
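Grafana's Prometheus data source reads these panels from the Prometheus HTTP API, so a panel's value can be checked outside Grafana by calling the instant-query endpoint /api/v1/query directly. The following is a minimal Python sketch using only the standard library. The Prometheus address is an assumption, and instant_query and parse_vector are hypothetical helper names.

```python
import json
import urllib.parse
import urllib.request


def instant_query(prom_url, promql):
    """Run an instant PromQL query through the Prometheus HTTP API (/api/v1/query).

    A malformed query makes Prometheus answer with an HTTP error status,
    which urlopen raises as urllib.error.HTTPError.
    """
    url = prom_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})
    with urllib.request.urlopen(url, timeout=10) as resp:
        return parse_vector(json.load(resp))


def parse_vector(payload):
    """Turn an instant-vector response into a list of (labels, float value) pairs."""
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    data = payload["data"]
    if data["resultType"] != "vector":
        raise ValueError("expected an instant vector, got " + data["resultType"])
    # Each result is {"metric": {labels}, "value": [unix_time, "value as string"]}
    return [(item["metric"], float(item["value"][1])) for item in data["result"]]
```

For example, `instant_query("http://prometheus:9090", "sum(pesitwizard_cluster_is_leader)")` should return a single value of 1.0 while the cluster has exactly one leader. An empty list means no instance is reporting the metric.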
Recommended Alerts
```yaml
# alerting-rules.yml
groups:
  - name: pesitwizard
    rules:
      - alert: PesitHighErrorRate
        # More than 0.1 errors per second (6 per minute) on any series
        expr: rate(pesitwizard_errors_total[5m]) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High PeSIT Wizard error rate"
      - alert: PesitNoLeader
        # Returns no data, and so does not fire, if no instance is being scraped at all
        expr: sum(pesitwizard_cluster_is_leader) == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "No PeSIT Wizard leader"
      - alert: PesitClusterDegraded
        # Threshold assumes a 3-member cluster; adjust it to the expected cluster size
        expr: pesitwizard_cluster_members < 3
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "PeSIT Wizard cluster degraded"
```

Logs
Log Format
Each line contains a timestamp, the log level, the component, the session ID, the event name, and event-specific key=value details:

```
2025-01-10 10:30:00.123 INFO [pesitwizard-server] [session-123] CONNECT partner=CLIENT_XYZ ip=192.168.1.100
2025-01-10 10:30:01.456 INFO [pesitwizard-server] [session-123] CREATE file=TRANSFER.XML virtualFile=TRANSFERS
2025-01-10 10:30:05.789 INFO [pesitwizard-server] [session-123] TRANSFER_COMPLETE bytes=15234 duration=4333ms
```

Centralization with ELK
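Filebeat ships each line as a single message string. Field queries such as partner:CLIENT_XYZ therefore only work once the key=value details have been extracted, for example by an Elasticsearch ingest pipeline or Logstash. The following Python sketch shows that extraction logic. It assumes every event follows the layout of the sample log lines, and parse_line and its regular expressions are hypothetical names, not part of PeSIT Wizard.

```python
import re

# Assumed layout, taken from the sample lines:
# <date> <time> <LEVEL> [<component>] [<session>] <EVENT> key=value ...
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<component>[^\]]+)\]\s+"
    r"\[(?P<session>[^\]]+)\]\s+"
    r"(?P<event>\S+)"
    r"(?P<details>.*)$"
)
# key=value pairs in the details part; values are kept as strings (e.g. "4333ms")
KV_RE = re.compile(r"(\w+)=(\S+)")


def parse_line(line):
    """Return a dict of fields for one log line, or None if the line does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    fields = {k: v for k, v in match.groupdict().items() if k != "details"}
    fields.update(KV_RE.findall(match.group("details")))
    return fields
```

The same split (fixed prefix, then free-form key=value pairs) maps onto a grok or dissect step for the prefix followed by a kv step for the details in an ingest pipeline.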
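```yaml
# filebeat.yml
filebeat.inputs:
  - type: container
    paths:
      - /var/log/containers/pesitwizard-server-*.log
    processors:
      - add_kubernetes_metadata: ~
output.elasticsearch:
  hosts: ["elasticsearch:9200"]
  index: "pesitwizard-%{+yyyy.MM.dd}"
# A custom index name also requires a matching template name and pattern.
# ILM is disabled because, when enabled, it can take precedence over the index setting.
setup.ilm.enabled: false
setup.template.name: "pesitwizard"
setup.template.pattern: "pesitwizard-*"
```

Useful Kibana Queries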
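These queries assume that level, partner and status are indexed as separate fields, which requires extracting them from the log message (for example with an ingest pipeline). Time ranges such as the last 24 hours are set with the Kibana time picker, not in the query itself.

```
# Errors (set the time picker to the last 24 hours)
level:ERROR AND kubernetes.labels.app:pesitwizard-server

# Transfers for a partner
message:"TRANSFER_COMPLETE" AND partner:CLIENT_XYZ

# Failed connections
message:"CONNECT" AND status:FAILED
```

Health Checks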
Endpoints
| Endpoint | Description |
|---|---|
| /actuator/health | Aggregated health of the application |
| /actuator/health/readiness | Ready to receive traffic (used by the readiness probe) |
| /actuator/health/liveness | Application is running (used by the liveness probe) |
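The same endpoints can be polled from a script, for example as a smoke test after a deployment. The following is a minimal Python sketch. It assumes standard Spring Boot Actuator behavior: a JSON body with a status field, HTTP 200 when UP, and HTTP 503 when DOWN or OUT_OF_SERVICE. probe and health_status are hypothetical helper names, and http://localhost:8080 is an assumed address.

```python
import json
import urllib.error
import urllib.request


def health_status(body):
    """Extract the "status" field from an Actuator health response body."""
    try:
        return json.loads(body).get("status", "UNKNOWN")
    except (ValueError, AttributeError):  # not JSON, or not a JSON object
        return "UNKNOWN"


def probe(base_url="http://localhost:8080", path="/actuator/health"):
    """Return (http_status, health_status) for one health endpoint."""
    try:
        with urllib.request.urlopen(base_url + path, timeout=5) as resp:
            code, body = resp.status, resp.read()
    except urllib.error.HTTPError as err:  # Actuator answers 503 when DOWN or OUT_OF_SERVICE
        code, body = err.code, err.read()
    return code, health_status(body)
```

Under the assumptions above, `probe(path="/actuator/health/readiness")` should return (200, "UP") once the instance is ready to receive traffic.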
Kubernetes Probes
```yaml
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10
livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 30
```

Alerting
Email
Configure email alerts in the application:
```yaml
pesitwizard:
  alerting:
    email:
      enabled: true
      smtp-host: smtp.example.com
      from: pesitwizard@example.com
      to: ops@example.com
      triggers:
        - type: TRANSFER_FAILED
        - type: CONNECTION_FAILED
        - type: CLUSTER_DEGRADED
```

Webhook
Alert events can also be posted to an HTTP endpoint, such as a Slack incoming webhook:

```yaml
pesitwizard:
  alerting:
    webhook:
      enabled: true
      url: https://hooks.slack.com/services/xxx
      events:
        - TRANSFER_FAILED
        - CLUSTER_LEADER_CHANGED
```
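Before pointing the webhook at Slack, a temporary receiver can stand in for the real URL to show exactly what the server sends. The following is a minimal Python sketch using only the standard library. The payload format is not documented on this page, so the handler keeps JSON bodies as parsed objects and anything else as text. Port 9000, make_server, WebhookHandler and received are hypothetical names.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # every request body seen so far, for inspection


class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length)
        try:
            received.append(json.loads(body))  # JSON payloads become Python objects
        except ValueError:
            received.append(body.decode("utf-8", "replace"))  # anything else is kept as text
        print(self.path, received[-1])
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence the default per-request access log


def make_server(port=9000, host="0.0.0.0"):
    """Create (but do not start) an HTTP server that records webhook calls."""
    return HTTPServer((host, port), WebhookHandler)
```

To use it, call `make_server().serve_forever()` on a host the server can reach and temporarily set url to http://<that-host>:9000/. Each event is then printed to standard output.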