# Clustering and High Availability

## Architecture
PeSIT Wizard server supports clustering for high availability:
```
┌─────────────────────────────────────────────────────────┐
│                       LoadBalancer                       │
│          (selector: pesitwizard-leader=true)             │
└─────────────────────┬───────────────────────────────────┘
                      │
        ┌─────────────┼─────────────┐
        │             │             │
        ▼             ▼             ▼
   ┌─────────┐   ┌─────────┐   ┌─────────┐
   │  Pod 1  │   │  Pod 2  │   │  Pod 3  │
   │ LEADER  │   │ Standby │   │ Standby │
   │    ✓    │   │         │   │         │
   └─────────┘   └─────────┘   └─────────┘
        │             │             │
        └─────────────┼─────────────┘
                      │
                      ▼
               ┌──────────────┐
               │  PostgreSQL  │
               │   (shared)   │
               └──────────────┘
```

## How It Works
### Leader Election
- Uses JGroups for discovery and leader election
- The first pod to join the cluster becomes the leader
- If the leader is lost, a new one is automatically elected
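
To observe an election as it happens, you can watch the pods' labels from the command line. This is a minimal sketch; it assumes the pods carry the `app=pesitwizard-server` label used elsewhere in this document.

```bash
# Watch the pods and show the pesitwizard-leader label as a column;
# during an election the "true" value moves to the newly elected pod
kubectl get pods -l app=pesitwizard-server -L pesitwizard-leader --watch
```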
### Kubernetes Labeling

The leader pod is automatically labeled `pesitwizard-leader=true`:
```bash
# View the current leader
kubectl get pods -l pesitwizard-leader=true

# View all pods with their labels
kubectl get pods --show-labels
```

### Traffic Routing
The Kubernetes Service uses a selector to route traffic only to the leader:
```yaml
spec:
  selector:
    app: pesitwizard-server
    pesitwizard-leader: "true"
```
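
As a quick check, you can confirm that only the leader pod's IP is listed behind the Service. The Service name `pesitwizard-server` below is an assumption; substitute the name used in your deployment.

```bash
# The ENDPOINTS column should contain exactly one address: the leader pod
kubectl get endpoints pesitwizard-server

# Compare with the leader pod's IP
kubectl get pods -l pesitwizard-leader=true -o wide
```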
## Configuration

### Enable Clustering
```yaml
pesitwizard:
  cluster:
    enabled: true
    name: pesitwizard-cluster
```

### Required Environment Variables
```yaml
env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: POD_NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace
  - name: PESIT_CLUSTER_ENABLED
    value: "true"
```

### Required RBAC
The ServiceAccount must be able to modify pod labels:
```yaml
rules:
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "patch"]
```
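
To verify the binding is in place before enabling clustering, `kubectl auth can-i` can impersonate the ServiceAccount. The namespace and ServiceAccount name below are placeholders.

```bash
# Should print "yes" if the RBAC rules above are bound to the ServiceAccount
# (replace the placeholders with your namespace and ServiceAccount name)
kubectl auth can-i patch pods \
  --as=system:serviceaccount:<namespace>:<serviceaccount-name>
```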
## Failover Behavior

### Scenario: Leader Goes Down

- JGroups detects the leader loss (timeout ~10s)
- A new leader is elected among the remaining pods
- The new leader:
  - Adds the label `pesitwizard-leader=true` to its pod
  - Starts the configured PeSIT servers
- The LoadBalancer routes to the new leader
- In-flight connections are lost (clients must reconnect)
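
A simple way to rehearse this scenario in a test environment is to delete the current leader and watch the label move to the newly elected pod. This sketch only relies on the labels described above.

```bash
# Delete the current leader to force an election
kubectl delete pod -l pesitwizard-leader=true

# Watch the pesitwizard-leader label reappear on one of the remaining pods
kubectl get pods -l app=pesitwizard-server -L pesitwizard-leader --watch
```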
### Scenario: Standby Pod Goes Down
- JGroups detects the member loss
- The cluster continues operating normally
- Kubernetes recreates the pod automatically
- The new pod joins the cluster in standby mode
## Cluster Monitoring

### Status API
```bash
curl http://localhost:8080/api/cluster/status -u admin:admin
```

Response:
```json
{
  "clusterName": "pesitwizard-cluster",
  "isLeader": true,
  "members": [
    {
      "name": "pesitwizard-server-abc123",
      "address": "10.42.0.100",
      "isLeader": true
    },
    {
      "name": "pesitwizard-server-def456",
      "address": "10.42.0.101",
      "isLeader": false
    },
    {
      "name": "pesitwizard-server-ghi789",
      "address": "10.42.0.102",
      "isLeader": false
    }
  ],
  "memberCount": 3
}
```
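
For scripting or health checks, the response can be filtered with `jq`. This is a sketch that assumes `jq` is installed and reuses the credentials from the example above.

```bash
# Print "true" on the leader and "false" on standby pods
curl -s -u admin:admin http://localhost:8080/api/cluster/status | jq '.isLeader'

# List cluster members with their leadership flag
curl -s -u admin:admin http://localhost:8080/api/cluster/status \
  | jq -r '.members[] | "\(.name) leader=\(.isLeader)"'
```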
### Clustering Logs

```bash
# View leader logs
kubectl logs -l pesitwizard-leader=true -f

# Filter JGroups logs
kubectl logs -l app=pesitwizard-server | grep -i "cluster\|leader\|jgroups"
```

### Metrics
- `pesitwizard_cluster_members`: Number of cluster members
- `pesitwizard_cluster_is_leader`: 1 if this pod is the leader, 0 otherwise
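
Where these metrics are scraped from depends on your setup. As a sketch, assuming they are exposed on a Prometheus-format endpoint (for example a Spring Boot Actuator style `/actuator/prometheus` path, which is an assumption here, as is the port), you can spot-check them with curl:

```bash
# Spot-check the cluster metrics on the (assumed) Prometheus endpoint
curl -s http://localhost:8080/actuator/prometheus | grep '^pesitwizard_cluster'
```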
## Best Practices

### Number of Replicas
| Environment | Replicas | Rationale |
|---|---|---|
| Dev/Test | 1 | No HA needed |
| Staging | 2 | Failover testing |
| Production | 3 | Tolerates 1 failure |
| Critical Production | 5 | Tolerates 2 failures |
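
Adjusting the replica count is an ordinary Kubernetes operation. The workload name `pesitwizard-server` and the use of a Deployment are assumptions; adjust them to match your manifests.

```bash
# Scale to three replicas for a production-grade cluster
kubectl scale deployment pesitwizard-server --replicas=3

# Confirm one leader and two standby pods
kubectl get pods -l app=pesitwizard-server -L pesitwizard-leader
```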
### Anti-affinity

Spread pods across different nodes:
```yaml
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          podAffinityTerm:
            labelSelector:
              matchLabels:
                app: pesitwizard-server
            topologyKey: kubernetes.io/hostname
```
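
After rolling this out, it is worth confirming that the scheduler actually spread the pods. On clusters with enough nodes, the NODE column should show a different node per pod (the rule above is only preferred, not required).

```bash
# The NODE column should show a different node for each pod
kubectl get pods -l app=pesitwizard-server -o wide
```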
### PodDisruptionBudget

Ensure a minimum number of available pods:
```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: pesitwizard-server-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: pesitwizard-server
```
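
Once the budget is applied, you can check how many voluntary disruptions the cluster currently tolerates; with three healthy replicas and `minAvailable: 2`, ALLOWED DISRUPTIONS should be 1.

```bash
# ALLOWED DISRUPTIONS shows how many pods may be evicted voluntarily right now
kubectl get pdb pesitwizard-server-pdb

# Detailed status, including the number of currently healthy pods
kubectl describe pdb pesitwizard-server-pdb
```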