OCPEDGE-2280: Add Adaptable Topology, reorganize topology enhancements #1905
base: master
Conversation
@jaypoulz: This pull request explicitly references no jira issue. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
[APPROVALNOTIFIER] This PR is NOT APPROVED
This pull-request has been approved by:
The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
3aa08a3 to 40d4ff0 (Compare)
/retitle OCPEDGE-2280: Add Adaptable Topology, reorganize topology enhancements
@jaypoulz: This pull request references OCPEDGE-2280 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target either version "4.21." or "openshift-4.21.", but it targets "openshift-4.22" instead. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
/retest
Introduce Adaptable topology, a cluster-topology mode that adjusts control-plane and infrastructure behavior based on node count. Clusters can install with Adaptable topology or transition from SingleReplica as a Day 2 operation.

Key components:
- Infrastructure API adds an Adaptable enum value
- Operator compatibility annotations for in-payload and OLM operators
- Library-go utilities for node-count awareness
- cluster-etcd-operator enhanced scaling logic for 2↔3 nodes
- oc adm topology transition CLI command with compatibility checks
- Console marketplace filtering and compatibility display
- Initial Topology Audit covering 39 operators

The initial implementation supports SingleReplica-to-Adaptable transitions. Future stages will add AutomaticQuorumRecovery (AQR) for DualReplica behavior on two-node configurations.

Reviewers assigned across 18 teams including control plane, OLM, API, installer, console, monitoring, and core operator teams.

This also reorganizes edge topology enhancements under a new edge-topologies/ directory:
- edge-topologies/adaptable-topology.md (new)
- edge-topologies/single-node/ (moved)
- edge-topologies/two-node/ (moved and reorganized)
  - two-node-fencing.md (renamed from tnf.md)
  - two-node-arbiter.md (renamed from arbiter-clusters.md)
40d4ff0 to 6e81004 (Compare)
@jaypoulz: The following test failed, say /retest to rerun all failed tests:
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.
| Adaptable topology is enabled | ||
| - Provide a subscription function for topology changes that operators invoke | ||
| to enable Adaptable topology behavior | ||
| - Subscribe to node count changes and provide notifications when |
It seems like a more natural way of triggering operators to react to cluster changes might be to use the controller-runtime events library for notifications (and, as a result, allow operators to reconcile the changes).
We might be able to utilize the GenericEvent, but I think ideally it would make sense to create a TopologyTransitionEvent type specific to these transitions.
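To make the suggestion concrete, here is a minimal sketch of what that could look like. The TopologyTransitionEvent type and the notifyTransition helper are hypothetical names invented for illustration, not part of the enhancement or of controller-runtime; event.GenericEvent is the existing controller-runtime piece.

```go
// Hypothetical sketch of surfacing topology transitions to controllers via
// controller-runtime events. TopologyTransitionEvent and notifyTransition are
// invented for illustration; event.GenericEvent is what controller-runtime
// provides today, and the channel would be wired into a controller with its
// channel-based source (whose exact constructor varies across versions).
package topologyevents

import (
	configv1 "github.com/openshift/api/config/v1"
	"sigs.k8s.io/controller-runtime/pkg/event"
)

// TopologyTransitionEvent is the dedicated event type the comment proposes,
// recording control-plane node counts before and after a transition.
type TopologyTransitionEvent struct {
	Infrastructure *configv1.Infrastructure
	OldNodeCount   int
	NewNodeCount   int
}

// notifyTransition pushes a GenericEvent carrying the Infrastructure object onto
// the channel a controller watches, so the change is handled by its normal
// Reconcile loop rather than by an out-of-band callback.
func notifyTransition(ch chan<- event.GenericEvent, t TopologyTransitionEvent) {
	ch <- event.GenericEvent{Object: t.Infrastructure}
}
```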
I think having events to log changes in the cluster is reasonable, but I think that operators responding to node count changes is more direct and more in line with the vision of Adaptable topology. We want to avoid giving the impression that AdaptableTopology is transitioning between topologies. Its behavior is defined by the number of CP nodes in the cluster, and it expects each operator to know what to do no matter how many CP nodes there are.
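As a rough illustration of that node-count-driven vision, a sketch along these lines; the helper names, the label selector, and the replica thresholds are all assumptions for the example, not the enhancement's API.

```go
// Illustrative only: an operator derives its posture directly from how many
// control-plane nodes exist, rather than from a declared topology transition.
// The label selector and thresholds below are assumptions for the sketch.
package adaptable

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// controlPlaneNodeCount counts nodes carrying the control-plane role label
// (OpenShift control-plane nodes may also carry the legacy master label).
func controlPlaneNodeCount(ctx context.Context, c kubernetes.Interface) (int, error) {
	nodes, err := c.CoreV1().Nodes().List(ctx, metav1.ListOptions{
		LabelSelector: "node-role.kubernetes.io/control-plane",
	})
	if err != nil {
		return 0, err
	}
	return len(nodes.Items), nil
}

// desiredReplicas maps the node count to an operand replica count; each operator
// decides what is appropriate for 1, 2, 3, or more control-plane nodes.
func desiredReplicas(controlPlaneNodes int) int32 {
	switch {
	case controlPlaneNodes <= 1:
		return 1
	case controlPlaneNodes == 2:
		return 2
	default:
		return 3
	}
}
```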
| 1. The cluster administrator adds a new control-plane node to the cluster | ||
| 2. The node joins the cluster and receives the control-plane labels | ||
| 3. Operators watching the Infrastructure config detect the node count change | ||
| 4. When crossing the 2→3 control-plane node threshold: |
We also support >3 control plane nodes, e.g. 4 or 5 etcd members. I think this etcd scaling process needs to be done then, too.
We allow 1, 2, 3, and 5. Etcd scaling beyond 3 (from 3->4 and 4->5) is technically equivalent to 1->2 because it's just adding a node. This is an implementation detail that does need to be decided upon, because this would either be the first time we explicitly allow non-transient 4-node control planes, or we'd use the double-scale-up behavior of 2->3 to go from 3->5 etcd members.
| 4. When crossing the 2→3 control-plane node threshold: | ||
| - CEO initiates the etcd scaling process, starting learner instances on the two control-plane nodes not running etcd | ||
| - CEO coordinates an atomic transition to promote both learner instances together | ||
| - Other operators adjust their behavior to match HighlyAvailable control-plane topology |
Would that happen concurrently with CEO, or after? As CEO / etcd scaling is a high-risk operation, I would feel better about first doing that, and then somehow triggering the other operators.
Concurrently with CEO. We already do this with TNF every time we do a graceful or non-graceful disruption event, and it's up to the operators to define a dependency on etcd if it is present. The only operator with such a dependency is apiserver, which responds to CEO changes as well as CP node changes.
| - CEO initiates the etcd scaling process, starting learner instances on the two control-plane nodes not running etcd | ||
| - CEO coordinates an atomic transition to promote both learner instances together | ||
| - Other operators adjust their behavior to match HighlyAvailable control-plane topology | ||
| 5. When scaling down and crossing the 3→2 control-plane node threshold: |
What is triggering this scaling down? Is it a simple "oc delete node/cp-1"? How do we then ensure atomicity?
I think we need an "oc adm topology" command to do that.
We also need to cross-check with the procedure we have in place to replace a failed control plane node. If that involves first removing the node, we probably do not want to trigger the etcd scale-down.
You can delete nodes now, and this modifies behavior for the cluster.
For UPI clusters, you won't get a replacement node - regardless of the topology set.
I agree that we will need additional atomicity coordination for scale-up and scale-down across threshold levels (i.e. 2->3 or 3->2) to ensure coordination at the etcd level, but my proposal is that CEO handles that, since it already has safe-scaling protection built in.
I don't think a separate oc adm topology command should exist for scale-up or scale-down. It could provide insight into your effective topology, but I'd like to avoid introducing new commands that perform behavior that is already part of the product.
I think what you're getting at is that a user who is accustomed to our current topologies might ask: how do I tell the cluster how I want it to behave with regard to topology? That paradigm is strictly incompatible with AdaptableTopology. You don't get to tell the cluster how to behave - you just control the number of CP nodes, and the cluster operators are responsible for reconciling that into appropriate behaviors.
| This prevents invalid configurations where control-plane and infrastructure | ||
| topologies are out of sync. | ||
| #### Shared Utilities in library-go |
This still feels like an awkward way of getting and watching the status. Why not have a new attribute "effectiveTopology" which can change and shows the current topology?
Also, for cluster admins, it's not easy to see what the current effective topology is, e.g. during the control-plane scaling process. We need to have a status field that shows e.g. "etcd cluster extension in progress, adding learners, ...".
This is an implementation detail. We can have library-go functions for effective topology, but that's more for monitoring / user-facing behavior. For operator authors, it's better to write these in terms of what behavior is appropriate for each node count, since topologies are just an abstraction layer on top of that.
Would you say that a 2-node topology is not two-node topology because etcd is in the process of scaling up from 1 to 2 nodes?
I would say that it's two-node topology with cluster-etcd-operator in "In Progress" state.
AdaptableTopology enacts that paradigm. The number of control-plane nodes (+ a potential future extension for fencing) determines your effective topology. If your cluster admin wants to see the state of the cluster adapting to a node addition, they just look at the cluster operators the same way they've always done.
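For illustration, a display-oriented helper along these lines could live in library-go; the function name and the count-to-topology mapping are assumptions, and the Adaptable-specific details (e.g. how a fenced two-node cluster would be reported) are intentionally left out.

```go
// Sketch of a possible library-go "effective topology" helper for monitoring and
// user-facing display. The name and mapping are assumptions; operators would
// normally key off the node count directly rather than this derived value.
package adaptable

import (
	configv1 "github.com/openshift/api/config/v1"
)

// EffectiveControlPlaneTopology reports what the cluster is behaving like right
// now, based purely on how many control-plane nodes exist.
func EffectiveControlPlaneTopology(controlPlaneNodes int) configv1.TopologyMode {
	switch {
	case controlPlaneNodes <= 1:
		return configv1.SingleReplicaTopologyMode
	default:
		// Two nodes and up behave as a (possibly restricted) multi-node cluster;
		// a future AQR/fencing stage would refine the two-node mapping.
		return configv1.HighlyAvailableTopologyMode
	}
}
```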
| **Topology Mismatch**: The cluster configuration would claim HighlyAvailable | ||
| while still operating as SingleReplica until nodes are provisioned. | ||
| The declared topology differs from actual behavior. |
Adaptable will have this intermediate state, too, e.g. during the etcd learner phase when transitioning from SNO to MNO.
The point is that it's not a mismatch. It's expected behavior for Adaptable topology, whereas it's an invalid state for HighlyAvailable outside of the installation phase.
| The declared topology differs from actual behavior. | ||
| **Complex Support Matrix**: Supporting arbitrary topology transitions creates | ||
| a complex matrix. |
But doesn't Adaptable also allow arbitrary transitions, by adding/removing nodes? This will create a complex matrix, too.
I don't think this is as complex as you'd imagine.
There are exactly 6 scenarios, which could be combined into a single lane for verification:
- 1->2
- 2->3
- 3->4
- 4->3
- 3->2
- 2->1
Every other transition or combination is technically equivalent to some combination of these atomic transitions.
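To make that concrete, a single ordered scale path covers all six atomic transitions; the sketch below is purely illustrative, not an agreed test plan.

```go
// Illustrative only: one ordered control-plane scale path exercises each of the
// six atomic transitions listed above exactly once, so a single test lane could
// cover them all.
package adaptabletest

// atomicTransitionPath walks 1->2->3->4 and back down to 1.
var atomicTransitionPath = []int{1, 2, 3, 4, 3, 2, 1}

// transitions expands the path into (from, to) pairs for per-step assertions.
func transitions(path []int) [][2]int {
	out := make([][2]int, 0, len(path)-1)
	for i := 1; i < len(path); i++ {
		out = append(out, [2]int{path[i-1], path[i]})
	}
	return out
}
```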
| The one-way transition model simplifies testing. | ||
| We only test transitions into Adaptable topology and can stagger support | ||
| for different source topologies. |
But we also need to test scale-out/scale-down with Adaptable, which creates tons of possible paths we need to validate. Or not?
This is somewhat true. This bullet refers to testing the topology transition, not testing the functionality of the topology definition.
Adaptable topology will need scale-out / scale-in testing, but those could be dedicated lanes, and I think there is less there to test than you might think.
1 to 2 and 2 to 3 have nuances, but 3 to 4 and beyond behave the same for any number of nodes.
Scale-down is the same in reverse.
Inactive enhancement proposals go stale after 28d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting /remove-lifecycle stale. If this proposal is safe to close now please do so with /close. /lifecycle stale
Stale enhancement proposals rot after 7d of inactivity. See https://github.com/openshift/enhancements#life-cycle for details. Mark the proposal as fresh by commenting /remove-lifecycle rotten. If this proposal is safe to close now please do so with /close. /lifecycle rotten
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
/remove lifecycle-rotten
/remove-lifecycle rotten
This enhancement introduces Adaptable topology, a new cluster-topology mode that enables clusters to dynamically adjust their behavior based on node count. This allows SingleReplica clusters to scale to multi-node configurations without redeployment.
Key features:
The proposal includes complete workflow descriptions, API extensions, test plans, and version skew strategy. Future stages will add AutomaticQuorumRecovery (AQR) to enable DualReplica-based resiliency for two-node configurations.