Conversation

@maxcao13 (Member)

No description provided.

@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jan 16, 2026
openshift-ci-robot commented Jan 16, 2026

@maxcao13: This pull request references AUTOSCALE-512 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.22.0" version, but no target version was set.


In response to this:

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci bot (Contributor) commented Jan 16, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign sjenning for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@maxcao13 maxcao13 marked this pull request as draft January 16, 2026 23:32
@openshift-ci openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jan 16, 2026
types that are unsupported by the service provider (e.g., ROSA does not support
`t3.micro`).

Simple approaches like ValidatingAdmissionPolicies or hiding API fields don't
Review comment (Member):

Can you elaborate on why a ValidatingAdmissionPolicy is less preferred, or why it can't be complementary to the reconciliation approach?

Wouldn't something like this be equivalent to what you achieve in your example with the AutoNodePolicy?

object.spec.template.spec.requirements.all(req,
  (req.key == 'node.kubernetes.io/instance-type' && req.operator == 'In')
  ? req.values.all(val, !['m5.4xlarge', 'm5.2xlarge'].has(val))
  : true
)

I assume the policy is best-effort, given the amount of input that is available: https://karpenter.sh/docs/reference/instance-types/

Reply (Member Author):

As a background for anyone else reading this thread and proposal:

Just looking at your link of instance types and the AWS-specific well-known labels that would select those instance types: there are a ton in there, and my thinking is that it would be technically infeasible to cover every pathway with a deny-list during validation. But if we were only to restrict a small subset of the paths that can select instance types, is that worth putting the code in?

For example, let's say ROSA SRE wants to restrict the m5.4xlarge and m5.2xlarge instance types, and we have a CEL rule that tries to stop users from setting those in their NodePool. What exactly would that look like?

If we have what you describe:

object.spec.template.spec.requirements.all(req,
  (req.key == 'node.kubernetes.io/instance-type' && req.operator == 'In')
  ? req.values.all(val, !['m5.4xlarge', 'm5.2xlarge'].has(val))
  : true
)

Users can get around this by just specifying this instead:

  requirements:
    - key: "karpenter.k8s.aws/instance-family"
      operator: In
      values: ["m5"]
    - key: "karpenter.k8s.aws/instance-cpu"
      operator: In
      values: ["8", "16"]

and if we respond with "oh, we will just restrict that both of those requirements cannot be set at the same time as well", you can see how complicated this gets.

If we only want to include a simple CEL rule like you proposed as complementary, and just ignore all the other ways a user can bypass it, then it wouldn't be so complicated and we could do it. But I wasn't sure that this best-effort policy would be greatly beneficial to us, so let me know if you think otherwise. To me, admission checks are an "all or nothing" sort of thing, and I could make that clearer in the doc.
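To make the bypass concrete, here is a minimal sketch (hypothetical helper names, not from any operator code) of a naive deny-list check that mirrors the CEL rule above: it only inspects the direct instance-type requirement, so requirements that select the same instances via the family/cpu labels slip through.

```python
# Sketch of a naive deny-list validation (hypothetical names).
# Like the CEL rule above, it only looks at the direct
# node.kubernetes.io/instance-type requirement.

DENYLIST = {"m5.4xlarge", "m5.2xlarge"}

def violates_denylist(requirements: list[dict]) -> bool:
    """Return True if a requirement explicitly opts in to a denied type."""
    for req in requirements:
        if (req["key"] == "node.kubernetes.io/instance-type"
                and req["operator"] == "In"
                and any(v in DENYLIST for v in req["values"])):
            return True
    return False

# A direct request for a denied type is caught:
direct = [{"key": "node.kubernetes.io/instance-type",
           "operator": "In", "values": ["m5.4xlarge"]}]
assert violates_denylist(direct)

# ...but selecting the same instances by family and cpu is not:
bypass = [
    {"key": "karpenter.k8s.aws/instance-family", "operator": "In", "values": ["m5"]},
    {"key": "karpenter.k8s.aws/instance-cpu", "operator": "In", "values": ["8", "16"]},
]
assert not violates_denylist(bypass)
```

Closing the family/cpu hole would mean enumerating every label combination that can resolve to a denied type, which is the complexity argument above.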

Reply (Member):

there are a ton in there, and my thinking is that it would be technically infeasible to restrict all pathways to have a deny-list during validation. But if we were only to restrict a small subset of paths that disable instancetypes, is that worth putting the code in?

How is this not true for the policy CR?

Reply (Member Author):

It is not true because the service provider can directly insert their own requirements that will be immutable from the user perspective and cannot be superseded/circumvented by any user-set requirements.

So, for example, if ROSA has a particular region that has disabled m5.4xlarge and m5.2xlarge instance types and wants to make sure all users in the management clusters in those regions are not allowed to spin those up, they can specify in the CR:

requirements:
  - key: node.kubernetes.io/instance-type
    operator: NotIn
    values: ["m5.4xlarge", "m5.2xlarge"]

which will be reflected in the underlying NodePool.

But the trick here is that if the user tries to set something like this in the OpenshiftNodePool:

  - key: node.kubernetes.io/instance-type
    operator: In
    values: ["m5.4xlarge", "m5.2xlarge"]

The resulting underlying NodePool will look like:

requirements:
  - key: node.kubernetes.io/instance-type
    operator: NotIn
    values: ["m5.4xlarge", "m5.2xlarge"]
  - key: node.kubernetes.io/instance-type
    operator: In
    values: ["m5.4xlarge", "m5.2xlarge"]

and Karpenter's core requirements code is designed so that all requirements are combined by intersection when selecting a final instance type. In other words, only instance types that appear in every requirement are considered, so the spec above will not select any m5.4xlarge or m5.2xlarge instances. In fact, Karpenter will not be able to provision anything from this NodePool unless the user specifies another requirement that produces a non-empty intersection of instance types.
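The intersection semantics can be sketched as follows (a simplification with hypothetical names, not Karpenter's actual scheduling code):

```python
# Simplified sketch of Karpenter-style requirement intersection
# over a small hypothetical instance-type universe.

ALL_TYPES = {"m5.large", "m5.xlarge", "m5.2xlarge", "m5.4xlarge"}

def select_types(requirements: list[dict]) -> set[str]:
    """Intersect the instance types allowed by each requirement."""
    allowed = set(ALL_TYPES)
    for req in requirements:
        if req["key"] != "node.kubernetes.io/instance-type":
            continue
        values = set(req["values"])
        if req["operator"] == "In":
            allowed &= values
        elif req["operator"] == "NotIn":
            allowed -= values
    return allowed

merged = [
    # Policy-injected requirement (immutable from the user's view):
    {"key": "node.kubernetes.io/instance-type", "operator": "NotIn",
     "values": ["m5.4xlarge", "m5.2xlarge"]},
    # User-set requirement trying to opt back in:
    {"key": "node.kubernetes.io/instance-type", "operator": "In",
     "values": ["m5.4xlarge", "m5.2xlarge"]},
]

# The NotIn and In requirements intersect to nothing, so this
# NodePool cannot provision any instance type at all.
assert select_types(merged) == set()
```

Because the policy's NotIn entry participates in every intersection, no user-set requirement can re-enable a denied type; at worst the user ends up with an empty set and no provisioning.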


This enhancement is not applicable to standalone OpenShift clusters. AutoNodePolicy
relies on the management/guest cluster separation that only exists in HyperShift
environments. In standalone clusters, administrators have full control over
Review comment (Member):

fwiw there's nothing technically preventing the karpenter operator from running in other form factors

Reply (Member Author):

Thanks, yeah I glossed over this when writing/clauding it, but on OpenShift standalone (at least with karpenter-provider-aws), we probably wouldn't restrict instance types or anything else at all, so AutoNodePolicy would not be applicable there.

1. The Service Provider SRE updates an existing AutoNodePolicy (e.g., adds
a new instance type `m6i.xlarge` to the allowlist).

2. The Karpenter Operator detects the policy change and re-reconciles all
Review comment (Member):

This is only relevant for the NodePools which expressed intent for m6i.xlarge in the first place, correct? Does the policy clobber the values for the upper-level NodePool or only for the lower-level one? Both approaches have trade-offs.

@maxcao13 (Member Author) commented Jan 20, 2026:

I think the policy should clobber on the lower level nodepool.

If a user-configured set of requirements were to yield 0 instance types, the UX feedback would simply be an Event. This currently exists already in Karpenter itself:

5s                      Warning   NoCompatibleInstanceTypes                    NodePool/ondemand-and-spot2           NodePool requirements filtered out all compatible available instance types

I don't think the user should explicitly "know" about these overrides, as I think it would be unnecessary mental overhead. The only thing they really need to know is that these instance types are not enabled by the service provider, and that should easily be known from ROSA/OpenShift documentation.

In a perfect world, the underlying NodePool and NodeClass wouldn't be in the guest cluster so the user wouldn't see all the underlying clobbering, but that is currently not the case, and a Karpenter power user will technically be able to see the clobbering there, but that boat sailed already.
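The clobbering approach discussed here can be sketched as follows (a hypothetical reconcile helper, not the operator's actual code, assuming policy requirements are simply appended to the rendered lower-level NodePool):

```python
def reconcile_nodepool(user_requirements: list[dict],
                       policy_requirements: list[dict]) -> list[dict]:
    """Render the lower-level NodePool: user intent plus policy overrides.

    Policy requirements are appended verbatim; because Karpenter
    intersects all requirements, an appended NotIn entry cannot be
    superseded by anything the user sets in the upper-level object.
    """
    return list(user_requirements) + list(policy_requirements)

user = [{"key": "node.kubernetes.io/instance-type", "operator": "In",
         "values": ["m5.4xlarge", "m5.2xlarge"]}]
policy = [{"key": "node.kubernetes.io/instance-type", "operator": "NotIn",
           "values": ["m5.4xlarge", "m5.2xlarge"]}]

rendered = reconcile_nodepool(user, policy)
assert len(rendered) == 2  # both requirements land in the lower NodePool
```

This matches the clobber-on-the-lower-level choice: the upper-level object keeps the user's intent untouched, while the rendered NodePool carries both entries and intersection does the enforcement.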

immediate rejection.

## Alternatives (Not Implemented)

Review comment (Member):

validating admission policy? webhook?

Reply (Member Author):

I can add more detail on alternatives that were considered, thanks 👍

@maxcao13 (Member Author) commented Jan 21, 2026

This enhancement does not seem viable for the overall ROSA architecture; instead, the ROSA team will implement webhook controllers and ValidatingAdmissionPolicies to prevent restricted instance types from being created.

I'll leave it open for now, but I will likely close it soon. Thanks, Alberto, for the review here 👍

