Description
Summary
The nats server cluster balance command reaches across the entire supercluster (hub + leafnodes) instead of scoping to the connected cluster. This can unintentionally disrupt connections on clusters the operator did not intend to affect.
Environment
- NATS Server: 2.12.2
- NATS CLI: 0.3.0
- Topology: Multi-cluster supercluster with leafnodes
Steps to Reproduce
- Set up a supercluster with hub and leafnode clusters
- Connect the CLI to a leafnode cluster
- Run `nats server cluster balance --trace`
- Observe that connections are kicked from servers across the entire supercluster, not just the connected cluster
Observed Behavior
When connected to one cluster in a supercluster topology, the balance command kicks connections from servers in other clusters. Example trace output showed kicks distributed across 3 different servers from multiple clusters:
- Server A: 59 kicks
- Server B: 5 kicks
- Server C: 53 kicks
The command uses $SYS.REQ.SERVER.PING.CONNZ, which discovers all servers in the supercluster, and then kicks connections from all of them.
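To make the scoping problem concrete, here is a minimal Python sketch that groups discovery responses by cluster. It does not talk to a real server: the sample payloads are hand-written, the server and cluster names are made up, and the response shape (a `server` object carrying `name` and `cluster` fields) is an assumption modeled on NATS system responses rather than a verified schema.

```python
import json
from collections import defaultdict

# Hypothetical, hand-written payloads standing in for replies to a
# system ping. Field names ("server", "name", "cluster") are assumptions.
SAMPLE_RESPONSES = [
    '{"server": {"name": "hub-1", "cluster": "hub"}}',
    '{"server": {"name": "leaf-1", "cluster": "leaf-east"}}',
    '{"server": {"name": "leaf-2", "cluster": "leaf-east"}}',
]


def group_by_cluster(raw_responses):
    """Return a dict mapping cluster name to the server names that replied."""
    groups = defaultdict(list)
    for raw in raw_responses:
        info = json.loads(raw).get("server", {})
        groups[info.get("cluster", "")].append(info.get("name", ""))
    return dict(groups)


clusters = group_by_cluster(SAMPLE_RESPONSES)
for cluster, servers in clusters.items():
    print(cluster, servers)
```

If the discovery step returns servers from more than one cluster, as in this sample, any action applied to the full list crosses cluster boundaries.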
Expected Behavior
By default, cluster balance should only affect servers in the cluster the CLI is connected to. Operators should have to explicitly opt in to supercluster-wide operations.
Proposed Solution
- Change the default behavior to balance only the connected cluster
- Add a `--supercluster` flag to opt in to the current behavior (balance all reachable servers)
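The proposed default could be sketched roughly as follows. This is illustrative Python, not the CLI's actual implementation: `select_balance_targets`, the `supercluster` parameter, and the dictionary shape of each server entry are all hypothetical names chosen for this example.

```python
def select_balance_targets(servers, connected_cluster, supercluster=False):
    """Pick the servers a balance operation may touch.

    servers: list of dicts with "name" and "cluster" keys (assumed shape).
    connected_cluster: name of the cluster the client is connected to.
    supercluster: hypothetical opt-in that restores the current behavior.
    """
    if supercluster:
        # Explicit opt-in: every discovered server is a target.
        return list(servers)
    # Default: only servers in the connected cluster are targets.
    return [s for s in servers if s.get("cluster") == connected_cluster]


# Made-up topology for illustration only.
discovered = [
    {"name": "hub-1", "cluster": "hub"},
    {"name": "leaf-1", "cluster": "leaf-east"},
    {"name": "leaf-2", "cluster": "leaf-east"},
]

scoped = select_balance_targets(discovered, "leaf-east")
everything = select_balance_targets(discovered, "leaf-east", supercluster=True)
print([s["name"] for s in scoped])
print([s["name"] for s in everything])
```

Filtering after discovery keeps the change small, since the discovery request itself does not need to change. Whether the real fix filters client-side or narrows the request is left open here.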
Current Workarounds
- Filter by account using `--account` (works if accounts are cluster-scoped)
- Filter by server name using `--server` (tedious and error-prone)
Impact
High - Could accidentally disrupt production traffic on unintended clusters in shared infrastructure environments.