# Clustering and High Availability

## Architecture
PeSIT Wizard server supports clustering for high availability:
```
┌─────────────────────────────────────────────────────────┐
│                       LoadBalancer                       │
│          (selector: pesitwizard-leader=true)             │
└─────────────────────┬───────────────────────────────────┘
                      │
        ┌─────────────┼─────────────┐
        │             │             │
        ▼             ▼             ▼
   ┌─────────┐   ┌─────────┐   ┌─────────┐
   │  Pod 1  │   │  Pod 2  │   │  Pod 3  │
   │ LEADER  │   │ Standby │   │ Standby │
   │    ✓    │   │         │   │         │
   └─────────┘   └─────────┘   └─────────┘
        │             │             │
        └─────────────┼─────────────┘
                      │
                      ▼
               ┌──────────────┐
               │  PostgreSQL  │
               │   (shared)   │
               └──────────────┘
```

## How It Works
### Leader Election
- Uses JGroups for discovery and leader election
- The first pod to join the cluster becomes the leader
- If the leader is lost, a new one is automatically elected
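
To observe an election as it happens, you can watch the pods' labels from the command line. This is a minimal sketch; it assumes the pods carry the `app=pesitwizard-server` label used elsewhere in this document.

```bash
# Watch the pods and show the pesitwizard-leader label as a column;
# during an election the "true" value moves to the newly elected pod
kubectl get pods -l app=pesitwizard-server -L pesitwizard-leader --watch
```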
### Kubernetes Labeling

The leader pod is automatically labeled `pesitwizard-leader=true`:
```bash
# View the current leader
kubectl get pods -l pesitwizard-leader=true

# View all pods with their labels
kubectl get pods --show-labels
```

### Traffic Routing
The Kubernetes Service uses a selector to route traffic only to the leader:
```yaml
spec:
  selector:
    app: pesitwizard-server
    pesitwizard-leader: "true"
```
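
As a quick check, you can confirm that only the leader pod's IP is listed behind the Service. The Service name `pesitwizard-server` below is an assumption; substitute the name used in your deployment.

```bash
# The ENDPOINTS column should contain exactly one address: the leader pod
kubectl get endpoints pesitwizard-server

# Compare with the leader pod's IP
kubectl get pods -l pesitwizard-leader=true -o wide
```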
## Configuration

### Enable Clustering
```yaml
pesitwizard:
  cluster:
    enabled: true
    name: pesitwizard-cluster
```

### Required Environment Variables
```yaml
env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: POD_NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace
  - name: PESIT_CLUSTER_ENABLED
    value: "true"
```

### Required RBAC
The ServiceAccount must be able to modify pod labels:
```yaml
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "patch"]
```
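
To verify the binding is in place before enabling clustering, `kubectl auth can-i` can impersonate the ServiceAccount. The namespace and ServiceAccount name below are placeholders.

```bash
# Should print "yes" if the RBAC rules above are bound to the ServiceAccount
# (replace the placeholders with your namespace and ServiceAccount name)
kubectl auth can-i patch pods \
  --as=system:serviceaccount:<namespace>:<serviceaccount-name>
```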
## Failover Behavior

### Scenario: Leader Goes Down

- JGroups detects the leader loss (timeout ~10s)
- A new leader is elected among the remaining pods
- The new leader:
  - Adds the label `pesitwizard-leader=true` to its pod
  - Starts the configured PeSIT servers
- The LoadBalancer routes to the new leader
- In-flight connections are lost (clients must reconnect)
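
A simple way to rehearse this scenario in a test environment is to delete the current leader and watch the label move to the newly elected pod. This sketch only relies on the labels described above.

```bash
# Delete the current leader to force an election
kubectl delete pod -l pesitwizard-leader=true

# Watch the pesitwizard-leader label reappear on one of the remaining pods
kubectl get pods -l app=pesitwizard-server -L pesitwizard-leader --watch
```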
### Scenario: Standby Pod Goes Down
- JGroups detects the member loss
- The cluster continues operating normally
- Kubernetes recreates the pod automatically
- The new pod joins the cluster in standby mode
## Cluster Monitoring

### Status API
```bash
curl http://localhost:8080/api/cluster/status -u admin:admin
```

Response:
```json
{
  "clusterName": "pesitwizard-cluster",
  "isLeader": true,
  "members": [
    {
      "name": "pesitwizard-server-abc123",
      "address": "10.42.0.100",
      "isLeader": true
    },
    {
      "name": "pesitwizard-server-def456",
      "address": "10.42.0.101",
      "isLeader": false
    },
    {
      "name": "pesitwizard-server-ghi789",
      "address": "10.42.0.102",
      "isLeader": false
    }
  ],
  "memberCount": 3
}
```
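
For scripting or health checks, the response can be filtered with `jq`. This is a sketch that assumes `jq` is installed and reuses the credentials from the example above.

```bash
# Print "true" on the leader and "false" on standby pods
curl -s -u admin:admin http://localhost:8080/api/cluster/status | jq '.isLeader'

# List cluster members with their leadership flag
curl -s -u admin:admin http://localhost:8080/api/cluster/status \
  | jq -r '.members[] | "\(.name) leader=\(.isLeader)"'
```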
### Clustering Logs

```bash
# View leader logs
kubectl logs -l pesitwizard-leader=true -f

# Filter JGroups logs
kubectl logs -l app=pesitwizard-server | grep -i "cluster\|leader\|jgroups"
```

### Metrics
- `pesitwizard_cluster_members`: Number of cluster members
- `pesitwizard_cluster_is_leader`: 1 if this pod is the leader, 0 otherwise
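
Where these metrics are scraped from depends on your setup. As a sketch, assuming they are exposed on a Prometheus-format endpoint (for example a Spring Boot Actuator style `/actuator/prometheus` path, which is an assumption here, as is the port), you can spot-check them with curl:

```bash
# Spot-check the cluster metrics on the (assumed) Prometheus endpoint
curl -s http://localhost:8080/actuator/prometheus | grep '^pesitwizard_cluster'
```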
## Best Practices

### Number of Replicas
| Environment | Replicas | Rationale |
|---|---|---|
| Dev/Test | 1 | No HA needed |
| Staging | 2 | Failover testing |
| Production | 3 | Tolerates 1 failure |
| Critical Production | 5 | Tolerates 2 failures |
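
Adjusting the replica count is an ordinary Kubernetes operation. The workload name `pesitwizard-server` and the use of a Deployment are assumptions; adjust them to match your manifests.

```bash
# Scale to three replicas for a production-grade cluster
kubectl scale deployment pesitwizard-server --replicas=3

# Confirm one leader and two standby pods
kubectl get pods -l app=pesitwizard-server -L pesitwizard-leader
```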
### Anti-affinity

Spread pods across different nodes:
```yaml
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: pesitwizard-server
            topologyKey: kubernetes.io/hostname
```
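
After rolling this out, it is worth confirming that the scheduler actually spread the pods. On clusters with enough nodes, the NODE column should show a different node per pod (the rule above is only preferred, not required).

```bash
# The NODE column should show a different node for each pod
kubectl get pods -l app=pesitwizard-server -o wide
```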
### PodDisruptionBudget

Ensure a minimum number of available pods:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: pesitwizard-server-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: pesitwizard-server
```
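
Once the budget is applied, you can check how many voluntary disruptions the cluster currently tolerates; with three healthy replicas and `minAvailable: 2`, ALLOWED DISRUPTIONS should be 1.

```bash
# ALLOWED DISRUPTIONS shows how many pods may be evicted voluntarily right now
kubectl get pdb pesitwizard-server-pdb

# Detailed status, including the number of currently healthy pods
kubectl describe pdb pesitwizard-server-pdb
```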