CNTRLPLANE-2640: Add HyperShift private CAPI types enhancement #1927
We should call out all the CAPI platforms we support: CAPA, CAPZ, CAPV, CAP-Agent, CAPG, etc.
|
@csrwng: This pull request references OCPSTRAT-2789 which is a valid jira issue.
|
@csrwng: This pull request references CNTRLPLANE-2640 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the epic to target the "4.22.0" version, but no target version was set.
This enhancement proposes isolating HyperShift's Cluster API (CAPI) CRDs from those installed by the OpenShift platform on management clusters. As OpenShift evolves toward using CAPI for standalone cluster machine management, a conflict emerges: both the platform and HyperShift need to install CAPI CRDs on the same management cluster, potentially with incompatible versions.

The proposal introduces two major components:

1. Private CAPI Types and API Proxy: HyperShift-specific CAPI CRDs using the cluster.hypershift.openshift.io group, with an API proxy sidecar to transparently translate between standard and private CAPI types.
2. Automatic Migration: A migration controller that automatically converts existing hosted clusters from standard to private CAPI types without disrupting operations.

This enables independent version management for both HyperShift and platform CAPI dependencies while maintaining transparent operation for hosted cluster administrators and workloads.
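To make the group translation performed by the API proxy concrete, here is a minimal Go sketch of the core rewrite. It is an illustration under assumptions, not the proposed implementation: the function names `swapGroup`, `ToPrivate`, and `ToStandard` are hypothetical, the standard group is taken to be `cluster.x-k8s.io` (as named in the review thread), and only an exact group match is rewritten. Whether related provider groups are also translated is not stated here, so the sketch leaves them untouched.

```go
package main

import "strings"

const (
	// standardGroup is the upstream Cluster API group (assumed from the review thread).
	standardGroup = "cluster.x-k8s.io"
	// privateGroup is the HyperShift-private group proposed by the enhancement.
	privateGroup = "cluster.hypershift.openshift.io"
)

// swapGroup rewrites the group part of an apiVersion ("group/version") from
// one group to another. Any apiVersion whose group does not match exactly,
// including core-group versions such as "v1", is returned unchanged.
func swapGroup(apiVersion, from, to string) string {
	group, version, found := strings.Cut(apiVersion, "/")
	if !found || group != from {
		return apiVersion
	}
	return to + "/" + version
}

// ToPrivate (hypothetical) is the direction a proxy would apply to requests
// coming from components that use standard CAPI client libraries.
func ToPrivate(apiVersion string) string {
	return swapGroup(apiVersion, standardGroup, privateGroup)
}

// ToStandard (hypothetical) is the reverse direction, applied to responses
// before they reach the client.
func ToStandard(apiVersion string) string {
	return swapGroup(apiVersion, privateGroup, standardGroup)
}
```

With these helpers, an illustrative input such as `cluster.x-k8s.io/v1beta1` maps to `cluster.hypershift.openshift.io/v1beta1` on the way in and back again on the way out, while `apps/v1` passes through unchanged. A real proxy would also need to rewrite URL paths and nested references, which this sketch does not attempt.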
|
@csrwng: all tests passed!
|
|
||
| d. **Workload Update and Resume**: | ||
| - Update CAPI-dependent workload deployments to include the API proxy sidecar | ||
| - For deployments managed by the Control Plane Operator, update the CPO deployment to signal it should add proxy sidecars |
Does this imply a backport to the minimum supported HC version, so that autoscaler/machine-approver get their spec updated with the proxy?
I saw that's clarified below
| Historically, HyperShift management clusters did not use Cluster API (CAPI) for their own machine management, relying instead on the OpenShift Machine API. This allowed HyperShift to install and manage its own version of CAPI CRDs, effectively owning the CAPI types on the management cluster. | ||
|
|
||
| With OpenShift's evolution toward using CAPI for standalone cluster machine management, a critical conflict emerges: both the platform and HyperShift will need to install CAPI CRDs on the same management cluster. If these CRD versions are incompatible, neither the platform nor HyperShift can function correctly. | ||
|
|
Might be worth mentioning that MCE, which is the delivery mechanism for self-hosted HCP, also wants to handle its own cluster.x-k8s.io CRDs.
Also, standalone has a toggle to not clobber these CRDs.
|
|
||
| This enhancement introduces new CRDs that mirror the standard CAPI CRDs but use the `cluster.hypershift.openshift.io` API group: | ||
|
|
||
| - `Cluster.cluster.hypershift.openshift.io` |
Also MHC and the other CRDs the controllers require to work.
|
|
||
| **dual operator architecture** consists of two HyperShift operator instances running simultaneously: one supporting private CAPI types (new) and one using standard CAPI types (legacy). | ||
|
|
||
| 1. The platform administrator upgrades their HyperShift operator to a version that supports the `--private-capi-types` flag and runs `hypershift install --private-capi-types`. |
Having this as a flag would require updating the delivery mechanisms for managed and self-hosted. Does it really need to be a flag? When would you opt out?
| | `hypershift.openshift.io/private-capi-types: "true"` | New Operator | Successfully migrated clusters | | ||
| | `hypershift.openshift.io/scope: "legacy"` | Legacy Operator | Existing clusters awaiting migration | | ||
| | `hypershift.openshift.io/migration-in-progress: "true"` | Migration Controller | Clusters actively being migrated (neither operator reconciles) | | ||
| | `hypershift.openshift.io/migration-failed` | Legacy Operator | Previous migration failed; requires SRE remediation before retry | |
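The ownership rules in the rows above can be sketched as a small Go lookup. This is a hedged illustration only: the function name `reconcilerFor`, the returned owner strings, and the precedence between keys (in-progress first, then failed, then private, then legacy) are assumptions, as is treating the keys as entries in a plain string map, since the hunk does not show whether they are labels or annotations or how conflicting keys are resolved.

```go
package main

// Metadata keys copied from the table in the enhancement.
const (
	keyPrivate    = "hypershift.openshift.io/private-capi-types"
	keyScope      = "hypershift.openshift.io/scope"
	keyInProgress = "hypershift.openshift.io/migration-in-progress"
	keyFailed     = "hypershift.openshift.io/migration-failed"
)

// reconcilerFor (hypothetical) returns which component owns a hosted cluster
// given its metadata keys. The ordering of the cases is an assumption; the
// source table does not define precedence. An empty string means none of the
// listed keys applied.
func reconcilerFor(meta map[string]string) string {
	_, failed := meta[keyFailed]
	switch {
	case meta[keyInProgress] == "true":
		// Neither operator reconciles while migration is running.
		return "migration-controller"
	case failed:
		// A failed migration stays with the legacy operator until remediated.
		return "legacy-operator"
	case meta[keyPrivate] == "true":
		return "new-operator"
	case meta[keyScope] == "legacy":
		return "legacy-operator"
	default:
		return ""
	}
}
```

For example, a cluster carrying only `hypershift.openshift.io/scope: "legacy"` resolves to the legacy operator, and one that additionally carries `hypershift.openshift.io/migration-in-progress: "true"` resolves to the migration controller under the assumed precedence.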
I think this whole process is more sensitive for self-hosted, in which case all of this would need to be documented and would directly impact users with additional burden.
Also, even though it is not recommended, there are users who might be consuming Machine CRs directly, especially in bare-metal scenarios, e.g. to annotate the next one for deletion on scale down. It would be good to collect some feedback.
| * Isolate HyperShift's CAPI CRD dependencies from the platform's CAPI CRDs by using a distinct API group (`cluster.hypershift.openshift.io`). | ||
| * Enable HyperShift components to continue using standard CAPI client libraries without modification through a transparent API proxy. | ||
| * Automatically migrate existing HyperShift installations to use the private CAPI types without user intervention or hosted cluster downtime. | ||
| * Ensure zero user-facing impact - hosted cluster administrators and workloads should experience no behavioral changes. |
We should probably call out that there is a period where the ability to operate would be degraded, e.g. the ability to scale the data plane while the controllers are scaled down.
|
|
||
| ### Alternative 1: Coordinate CAPI Versions Between Platform and HyperShift | ||
|
|
||
| Instead of isolating CAPI types, ensure that the platform and HyperShift always use compatible CAPI versions through tight coordination. |
Coupling from standalone management clusters would be solved by the flag they provide to not clobber the CRDs, leaving MCE as the only possible conflict. Each MCE version bundles a pinned version of HyperShift. What would prevent MCE and HO from running with the same latest CAPI APIs release for each downstream cycle?