
Monitoring

Available Metrics

The PeSIT Wizard server exposes Prometheus metrics at the /actuator/prometheus endpoint.

Key Metrics

| Metric | Description |
| --- | --- |
| pesitwizard_connections_active | Active PeSIT connections |
| pesitwizard_connections_total | Total connections |
| pesitwizard_transfers_total | Number of transfers |
| pesitwizard_transfers_bytes_total | Transferred volume (bytes) |
| pesitwizard_transfers_duration_seconds | Transfer duration |
| pesitwizard_errors_total | Number of errors |
| pesitwizard_cluster_members | Cluster members |
| pesitwizard_cluster_is_leader | 1 if leader, 0 otherwise |
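
To spot-check these metrics without a Prometheus server, the text returned by /actuator/prometheus can be parsed directly. The following is a minimal sketch, assuming the endpoint returns the standard Prometheus text exposition format without trailing timestamps. The `parse_metrics` helper and the `SAMPLE` payload are hypothetical illustrations, not actual server output.

```python
def parse_metrics(text: str) -> dict[str, float]:
    """Parse Prometheus text exposition lines into {series: value}."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # Assumption: no trailing timestamp, so the value is the last token.
        series, _, value = line.rpartition(" ")
        samples[series] = float(value)
    return samples


# Hypothetical payload, for illustration only.
SAMPLE = """\
# HELP pesitwizard_connections_active Active PeSIT connections
# TYPE pesitwizard_connections_active gauge
pesitwizard_connections_active 4.0
pesitwizard_cluster_is_leader 1.0
"""

metrics = parse_metrics(SAMPLE)
print(metrics["pesitwizard_connections_active"])
```

Series that carry labels keep them in the dictionary key, so a lookup has to use the full series string, labels included.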

Prometheus Integration

Prometheus Configuration

yaml
# prometheus.yml
scrape_configs:
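  - job_name: 'pesitwizard-server'
    # Prometheus scrapes /metrics by default; point it at the actuator endpoint
    metrics_path: /actuator/prometheus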
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names: ['pesitwizard']
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        regex: pesitwizard-server
        action: keep
      - source_labels: [__meta_kubernetes_pod_container_port_number]
        regex: "8080"
        action: keep

Useful Queries

promql
# Transfer rate per minute
rate(pesitwizard_transfers_total[5m]) * 60

# Transferred volume per hour
increase(pesitwizard_transfers_bytes_total[1h])

# Error rate (errors per transfer); sum() aggregates across series so both sides match
sum(rate(pesitwizard_errors_total[5m])) / sum(rate(pesitwizard_transfers_total[5m]))

# Average transfer duration
rate(pesitwizard_transfers_duration_seconds_sum[5m]) / rate(pesitwizard_transfers_duration_seconds_count[5m])
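
The average-duration query divides the growth of the `_sum` series by the growth of the `_count` series over the same window, so the per-second factor in each `rate()` cancels out. The arithmetic can be sketched as follows, assuming two scrapes of both series. The `average_duration` function and its sample values are hypothetical.

```python
def average_duration(sum_then, count_then, sum_now, count_now):
    """Mean seconds per transfer between two scrapes of _sum and _count."""
    completed = count_now - count_then
    if completed <= 0:
        # No transfer finished in the window, or the counter was reset.
        return None
    return (sum_now - sum_then) / completed


# 20 transfers completed, adding 60 seconds of total transfer time.
print(average_duration(120.0, 30, 180.0, 50))
```

Unlike this sketch, Prometheus also compensates for counter resets inside the window.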

Grafana Dashboards

Main Dashboard

Import the dashboard from: /grafana/pesitwizard-dashboard.json

Included panels:

  • Transfers per minute
  • Transferred volume
  • Active connections
  • Error rate
  • Cluster status
  • Top partners

Prometheus Alert Rules

yaml
# alerting-rules.yml
groups:
  - name: pesitwizard
    rules:
      - alert: PesitHighErrorRate
        # Fires above 0.1 errors per second (6 per minute) per series, sustained for 5 minutes
        expr: rate(pesitwizard_errors_total[5m]) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High PeSIT Wizard error rate"

      - alert: PesitNoLeader
        # Does not fire when no series exist at all (e.g. every pod is down)
        expr: sum(pesitwizard_cluster_is_leader) == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "No PeSIT Wizard leader"

      - alert: PesitClusterDegraded
        # Threshold assumes a 3-member cluster; adjust to your replica count
        expr: pesitwizard_cluster_members < 3
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "PeSIT Wizard cluster degraded"

Logs

Log Format

2025-01-10 10:30:00.123 INFO  [pesitwizard-server] [session-123] CONNECT partner=CLIENT_XYZ ip=192.168.1.100
2025-01-10 10:30:01.456 INFO  [pesitwizard-server] [session-123] CREATE file=TRANSFER.XML virtualFile=TRANSFERS
2025-01-10 10:30:05.789 INFO  [pesitwizard-server] [session-123] TRANSFER_COMPLETE bytes=15234 duration=4333ms
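
For ad hoc analysis outside a log pipeline, lines in this format can be split into structured fields. The sketch below is derived only from the three sample lines above. It assumes every line follows the same layout and that `key=value` values contain no spaces. `LINE_RE` and `parse_line` are hypothetical names.

```python
import re
from datetime import datetime

# Layout: timestamp, level, [service], [session], EVENT, then key=value pairs.
LINE_RE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<service>[^\]]+)\]\s+"
    r"\[(?P<session>[^\]]+)\]\s+"
    r"(?P<event>[A-Z_]+)"
    r"(?P<fields>.*)$"
)


def parse_line(line):
    """Return a dict for one log line, or None if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["timestamp"] = datetime.strptime(
        record["timestamp"], "%Y-%m-%d %H:%M:%S.%f"
    )
    # Assumption: values never contain spaces, so whitespace splits the pairs.
    record["fields"] = dict(
        pair.split("=", 1) for pair in record["fields"].split() if "=" in pair
    )
    return record


line = (
    "2025-01-10 10:30:05.789 INFO  [pesitwizard-server] [session-123] "
    "TRANSFER_COMPLETE bytes=15234 duration=4333ms"
)
record = parse_line(line)
print(record["event"], record["fields"])
```

Field values stay as strings (for example `"4333ms"`), so unit handling is left to the caller.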

Log Centralization with ELK

yaml
# filebeat.yml
filebeat.inputs:
  - type: container
    paths:
      - /var/log/containers/pesitwizard-server-*.log
    processors:
      - add_kubernetes_metadata: ~

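# A custom index name requires matching template settings
setup.template.name: "pesitwizard"
setup.template.pattern: "pesitwizard-*"

output.elasticsearch:
  hosts: ["elasticsearch:9200"]
  index: "pesitwizard-%{+yyyy.MM.dd}"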

Useful Kibana Queries

# Errors (set the Kibana time picker to the last 24 hours)
level:ERROR AND kubernetes.labels.app:pesitwizard-server

# Transfers for a partner
message:"TRANSFER_COMPLETE" AND partner:CLIENT_XYZ

# Failed connections
message:"CONNECT" AND status:FAILED

Health Checks

Endpoints

| Endpoint | Description |
| --- | --- |
| /actuator/health | Overall health |
| /actuator/health/readiness | Ready to receive traffic |
| /actuator/health/liveness | Application alive |
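
These endpoints can also be polled from a script, for example in a deployment smoke test. The sketch below assumes they follow Spring Boot Actuator conventions: a JSON body with a top-level `status` field equal to `UP` when healthy, and a non-2xx HTTP status otherwise. The `is_up` and `check` helpers and the base URL in the usage note are hypothetical.

```python
import json
import urllib.error
import urllib.request


def is_up(body: str) -> bool:
    """Return True only if the JSON body reports status UP."""
    try:
        return json.loads(body).get("status") == "UP"
    except (ValueError, AttributeError):
        # Not JSON, or JSON that is not an object.
        return False


def check(base_url, path="/actuator/health/readiness", timeout=5.0):
    """Fetch a health endpoint and report whether it is UP."""
    try:
        with urllib.request.urlopen(base_url + path, timeout=timeout) as resp:
            return is_up(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError):
        # Connection failures and HTTP error statuses count as not ready.
        return False


print(is_up('{"status":"UP"}'))
```

A call such as `check("http://localhost:8080")` would query the readiness endpoint on the port used by the probes in this document. The host is an assumption.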

Kubernetes Probes

yaml
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10

livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 30

Alerting

Email

Configure email alerts in the application configuration:

yaml
pesitwizard:
  alerting:
    email:
      enabled: true
      smtp-host: smtp.example.com
      from: pesitwizard@example.com
      to: ops@example.com
    triggers:
      - type: TRANSFER_FAILED
      - type: CONNECTION_FAILED
      - type: CLUSTER_DEGRADED

Webhook

yaml
pesitwizard:
  alerting:
    webhook:
      enabled: true
      url: https://hooks.slack.com/services/xxx
      events:
        - TRANSFER_FAILED
        - CLUSTER_LEADER_CHANGED
