
Monitoring

Available metrics

The PeSIT Wizard server exposes Prometheus metrics at /actuator/prometheus.

Key metrics

Metric                                    Description
pesitwizard_connections_active            Active PeSIT Wizard connections
pesitwizard_connections_total             Total number of connections
pesitwizard_transfers_total               Number of transfers
pesitwizard_transfers_bytes_total         Volume transferred (bytes)
pesitwizard_transfers_duration_seconds    Transfer duration (seconds)
pesitwizard_errors_total                  Number of errors
pesitwizard_cluster_members               Cluster members
pesitwizard_cluster_is_leader             1 if leader, 0 otherwise
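
To check by hand what the endpoint returns, the Prometheus text format can be read with a few lines of Python. The sketch below is only illustrative: the `parse_metrics` helper and the `SAMPLE` payload are hypothetical (the real output of /actuator/prometheus may carry labels, extra samples, or different help text), and it assumes no trailing timestamp on sample lines.

```python
def parse_metrics(text: str, prefix: str = "pesitwizard_") -> dict[str, float]:
    """Return {sample name (with labels): value} for samples starting with prefix."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # Assumption: no trailing timestamp, so the value is the last token.
        name, _, value = line.rpartition(" ")
        if name.startswith(prefix):
            samples[name] = float(value)
    return samples


# Hypothetical payload, shaped like the Prometheus text exposition format.
SAMPLE = """\
# HELP pesitwizard_connections_active Active connections
# TYPE pesitwizard_connections_active gauge
pesitwizard_connections_active 3.0
pesitwizard_cluster_is_leader 1.0
jvm_threads_live_threads 42.0
"""

if __name__ == "__main__":
    for name, value in parse_metrics(SAMPLE).items():
        print(f"{name} = {value}")
```

In practice the text would come from an HTTP GET on /actuator/prometheus; the prefix filter simply drops JVM and framework metrics that share the same endpoint.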

Prometheus integration

Prometheus configuration

yaml
# prometheus.yml
scrape_configs:
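  - job_name: 'pesitwizard-server'
    # Metrics are served on /actuator/prometheus, not the default /metrics
    metrics_path: /actuator/prometheus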
    kubernetes_sd_configs:
      - role: pod
        namespaces:
          names: ['pesitwizard']
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_label_app]
        regex: pesitwizard-server
        action: keep
      - source_labels: [__meta_kubernetes_pod_container_port_number]
        regex: "8080"
        action: keep

Useful queries

promql
# Transfers per minute
rate(pesitwizard_transfers_total[5m]) * 60

# Volume transferred over the last hour
increase(pesitwizard_transfers_bytes_total[1h])

# Error ratio (errors per transfer)
rate(pesitwizard_errors_total[5m]) / rate(pesitwizard_transfers_total[5m])

# Average transfer duration
rate(pesitwizard_transfers_duration_seconds_sum[5m]) / rate(pesitwizard_transfers_duration_seconds_count[5m])
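
These queries can also be run from a script through the Prometheus HTTP API (`/api/v1/query`). The following is a minimal sketch rather than a reference client: the helper names and the `http://prometheus:9090` address used in the note below are assumptions, and only instant-vector results are handled.

```python
import json
import urllib.parse
import urllib.request

# Error ratio query, copied from the list above.
ERROR_RATIO = (
    "rate(pesitwizard_errors_total[5m]) / rate(pesitwizard_transfers_total[5m])"
)


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    query_string = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query_string}"


def extract_values(payload: dict) -> list[float]:
    """Pull the numeric values out of an instant-vector response."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error')}")
    # Each sample looks like {"metric": {...}, "value": [timestamp, "string value"]}.
    return [float(sample["value"][1]) for sample in payload["data"]["result"]]


def run_query(base_url: str, promql: str) -> list[float]:
    """Send the query and return one value per returned series."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=10) as resp:
        return extract_values(json.load(resp))
```

For example, `run_query("http://prometheus:9090", ERROR_RATIO)` would return one ratio per series, assuming a Prometheus server is reachable at that (hypothetical) address.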

Grafana dashboards

Main dashboard

Import the dashboard from: /grafana/pesitwizard-dashboard.json

Included panels:

  • Transfers per minute
  • Volume transferred
  • Active connections
  • Error rate
  • Cluster status
  • Top partners

Recommended alerts

yaml
# alerting-rules.yml
groups:
  - name: pesitwizard
    rules:
      - alert: PesitHighErrorRate
        expr: rate(pesitwizard_errors_total[5m]) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High PeSIT Wizard error rate"
          
      - alert: PesitNoLeader
        expr: sum(pesitwizard_cluster_is_leader) == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "No PeSIT Wizard leader"
          
      - alert: PesitClusterDegraded
        expr: pesitwizard_cluster_members < 3
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "PeSIT Wizard cluster degraded"

Logs

Log format

2025-01-10 10:30:00.123 INFO  [pesitwizard-server] [session-123] CONNECT partner=CLIENT_XYZ ip=192.168.1.100
2025-01-10 10:30:01.456 INFO  [pesitwizard-server] [session-123] CREATE file=VIREMENT.XML virtualFile=VIREMENTS
2025-01-10 10:30:05.789 INFO  [pesitwizard-server] [session-123] TRANSFER_COMPLETE bytes=15234 duration=4333ms
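
For ad hoc analysis, lines in this format can be split into fields with a regular expression. The sketch below is inferred from the three sample lines above only: the field names (`timestamp`, `level`, `app`, `session`, `event`, `fields`) are chosen for illustration, and real logs may contain variants this pattern does not cover.

```python
import re
from typing import Optional

# timestamp, level, [application], [session], EVENT, then optional key=value pairs.
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"\[(?P<app>[^\]]+)\]\s+"
    r"\[(?P<session>[^\]]+)\]\s+"
    r"(?P<event>[A-Z_]+)"
    r"(?P<rest>.*)$"
)
KEY_VALUE = re.compile(r"(\w+)=(\S+)")


def parse_log_line(line: str) -> Optional[dict]:
    """Return the parsed record, or None if the line does not match the format."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Turn the trailing "key=value key=value" part into a dict of strings.
    record["fields"] = dict(KEY_VALUE.findall(record.pop("rest")))
    return record


if __name__ == "__main__":
    line = (
        "2025-01-10 10:30:05.789 INFO  [pesitwizard-server] [session-123] "
        "TRANSFER_COMPLETE bytes=15234 duration=4333ms"
    )
    print(parse_log_line(line))
```

Values are kept as strings (for example `"4333ms"`), so unit handling is left to the caller.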

Centralization with ELK

yaml
# filebeat.yml
filebeat.inputs:
  - type: container
    paths:
      - /var/log/containers/pesitwizard-server-*.log
    processors:
      - add_kubernetes_metadata: ~

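output.elasticsearch:
  hosts: ["elasticsearch:9200"]
  index: "pesitwizard-%{+yyyy.MM.dd}"

# Filebeat requires these two settings when the default index name is overridden
setup.template.name: "pesitwizard"
setup.template.pattern: "pesitwizard-*"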

Useful Kibana queries

# Errors (set the time range to the last 24 hours)
level:ERROR AND kubernetes.labels.app:pesitwizard-server

# Transfers for a given partner
message:"TRANSFER_COMPLETE" AND partner:CLIENT_XYZ

# Failed connections
message:"CONNECT" AND status:FAILED

Health checks

Endpoints

Endpoint                      Description
/actuator/health              Overall health
/actuator/health/readiness    Ready to receive traffic
/actuator/health/liveness     Application is alive
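
Outside Kubernetes, the same endpoints can be polled from a script. This is a minimal sketch under one assumption: that the endpoints follow the Spring Boot Actuator convention of returning a JSON body with a top-level `status` field set to `UP` when healthy. The helper names and the base URL in the note below are hypothetical.

```python
import json
import urllib.request


def is_up(payload: dict) -> bool:
    """True only when the health payload reports status UP (assumed Actuator format)."""
    return payload.get("status") == "UP"


def check(base_url: str, path: str = "/actuator/health/readiness") -> bool:
    """Query a health endpoint; any network, HTTP, or parsing error counts as unhealthy."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + path, timeout=5) as resp:
            return is_up(json.load(resp))
    except (OSError, ValueError):
        # OSError covers URLError/HTTPError (e.g. a 503) and timeouts;
        # ValueError covers a body that is not valid JSON.
        return False
```

For example, `check("http://localhost:8080")` tests readiness and `check("http://localhost:8080", "/actuator/health/liveness")` tests liveness, assuming the server listens on port 8080 as in the probe configuration.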

Kubernetes probes

yaml
readinessProbe:
  httpGet:
    path: /actuator/health/readiness
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10

livenessProbe:
  httpGet:
    path: /actuator/health/liveness
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 30

Alerting

Email

Configure email alerts in the application:

yaml
pesitwizard:
  alerting:
    email:
      enabled: true
      smtp-host: smtp.example.com
      from: pesitwizard@example.com
      to: ops@example.com
    triggers:
      - type: TRANSFER_FAILED
      - type: CONNECTION_FAILED
      - type: CLUSTER_DEGRADED

Webhook

yaml
pesitwizard:
  alerting:
    webhook:
      enabled: true
      url: https://hooks.slack.com/services/xxx
      events:
        - TRANSFER_FAILED
        - CLUSTER_LEADER_CHANGED

PeSIT Wizard Enterprise - Administration Console