e2e cleanup of orphaned service principals and identities #4557
Conversation
shubhadapaithankar left a comment:
LGTM
mociarain left a comment:
Left some thoughts and questions, but overall it looks great. This will be a nice change.
This LGTM. Just wondering if you tested it in dry-run mode to see whether any edge cases appeared?
@mociarain I tested the changes in dryRun mode: https://msazure.visualstudio.com/AzureRedHatOpenShift/_build/results?buildId=149649712&view=logs&j=a4f1910f-c367-5697-edcd-724d81cde23b&t=bc103e47-9b88-5c56-fcad-f3f2f0b2ed91
kimorris27 left a comment:
I took a look at the pipeline run, and I thought it was interesting that there were exactly 100 mock-msi SPs. I think there are more than 100, and we're only getting 100 because the list response is paginated and we're not checking the other pages.
Some more info here: https://learn.microsoft.com/en-us/graph/paging?tabs=http
TBH I'm not sure going through all of the pages is worth the trouble. Since the clean pipeline runs on a schedule, we'll get all of the SPs eventually.
I'd be cool with merging this without any changes, but I wanted to point this out in case anyone else has other thoughts.
Agreed. Bump it to 48 hrs.
+1 on this. I don't think we need to change it, but can you add a comment just outlining that this is the case and that we left it this way assuming the scheduled runs will eventually clear them out?
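For reference, a minimal sketch of what walking every page could look like with the Graph SDK's page iterator, should it ever become worth the trouble. This assumes the msgraph-sdk-go client used elsewhere in the purge code; `listAllServicePrincipals` is a hypothetical helper name, not code from this PR:

```go
package purge

import (
	"context"

	msgraphsdk "github.com/microsoftgraph/msgraph-sdk-go"
	graphmodels "github.com/microsoftgraph/msgraph-sdk-go/models"
	msgraphcore "github.com/microsoftgraph/msgraph-sdk-go-core"
)

// listAllServicePrincipals follows every @odata.nextLink instead of stopping
// at the first page, so more than 100 SPs are returned when they exist.
func listAllServicePrincipals(ctx context.Context, client *msgraphsdk.GraphServiceClient) ([]graphmodels.ServicePrincipalable, error) {
	result, err := client.ServicePrincipals().Get(ctx, nil)
	if err != nil {
		return nil, err
	}

	// PageIterator transparently requests each subsequent page.
	it, err := msgraphcore.NewPageIterator[graphmodels.ServicePrincipalable](
		result,
		client.GetAdapter(),
		graphmodels.CreateServicePrincipalCollectionResponseFromDiscriminatorValue,
	)
	if err != nil {
		return nil, err
	}

	var sps []graphmodels.ServicePrincipalable
	err = it.Iterate(ctx, func(sp graphmodels.ServicePrincipalable) bool {
		sps = append(sps, sp)
		return true // return false to stop iterating early
	})
	return sps, err
}
```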
I realized after running the pipeline in dry-run mode that using `rc.graphClient.Applications().ByApplicationId(objectID).Delete(ctx, nil)` would only delete the mock-msi and v4-e2e-* SPs, but not Disk Encryption Sets. These need to be deleted using `rc.graphClient.ServicePrincipals().ByServicePrincipalId(objectID).Delete(ctx, nil)`, since no application object exists to delete for a DES. And if we used `ServicePrincipals().Delete()` for everything, the SPs would get deleted but would leave orphaned application registrations behind.
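To make that distinction concrete, here's a rough sketch of the branching, assuming `rc.graphClient` is the msgraph-sdk-go client from the comment above; the `resourceCleaner` receiver type, the `deleteOrphan` name, and the `hasApplication` flag are all hypothetical, not the PR's actual code:

```go
// deleteOrphan is a sketch only. Deleting the application object cascades to
// its service principal, which covers mock-msi and v4-e2e-* SPs without
// leaving an orphaned app registration. DES service principals have no
// application object, so they must be deleted directly.
func (rc *resourceCleaner) deleteOrphan(ctx context.Context, objectID, appObjectID string, hasApplication bool) error {
	if hasApplication {
		return rc.graphClient.Applications().ByApplicationId(appObjectID).Delete(ctx, nil)
	}
	return rc.graphClient.ServicePrincipals().ByServicePrincipalId(objectID).Delete(ctx, nil)
}
```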
Force-pushed from a4d2cef to 1648f58
I'm still getting an error for deleting disk encryption sets in dry-run mode.
For disk encryption sets, Caden suggested deleting the resource groups that contain these DESs instead of deleting individual DESs, which makes sense to me. There's already existing logic for cleaning up resource groups with these prefixes.
The existing code does clean up RGs with those prefixes, but it also checks for a tag on the RG before deleting. I think we'll be fine once your changes merge though, since we won't be checking that tag anymore.
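A sketch of the resource-group-based approach using the Azure SDK for Go; the `v4-e2e-` prefix and the function name are illustrative only, and the real prefix list and client wiring would live in pkg/util/purge:

```go
import (
	"context"
	"log"
	"strings"

	"github.com/Azure/azure-sdk-for-go/sdk/resourcemanager/resources/armresources"
)

// deleteDESResourceGroups deletes e2e resource groups by name prefix, which
// removes the contained Disk Encryption Sets along with everything else in
// the group, avoiding the per-DES deletion problem entirely.
func deleteDESResourceGroups(ctx context.Context, client *armresources.ResourceGroupsClient, dryRun bool) error {
	pager := client.NewListPager(nil)
	for pager.More() {
		page, err := pager.NextPage(ctx)
		if err != nil {
			return err
		}
		for _, rg := range page.Value {
			if rg.Name == nil || !strings.HasPrefix(*rg.Name, "v4-e2e-") {
				continue
			}
			if dryRun {
				log.Printf("dry run: would delete resource group %s", *rg.Name)
				continue
			}
			// BeginDelete is asynchronous; we don't wait for completion here.
			if _, err := client.BeginDelete(ctx, *rg.Name, nil); err != nil {
				return err
			}
		}
	}
	return nil
}
```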
/azp run ci
Azure Pipelines successfully started running 1 pipeline(s).
Which issue this PR addresses:
Fixes ARO-22824
What this PR does / why we need it:
Managed identities, cluster service principals, and role assignments are cleaned up as part of e2e cluster deletion, but this doesn't happen if the e2e job times out before reaching the cleanup step.
While running miwi E2E tests in the E2E Build and Release pipeline, we observed exactly this: managed identities and cluster service principals were left behind when the E2E job timed out. This PR adds cleanup logic for orphaned resource groups from prod e2e clusters and for orphaned service principals, via a dedicated cleanup job in pkg/util/purge.
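As a rough illustration of the shape of such a cleanup pass (all names and the TTL here are hypothetical sketches, not the actual pkg/util/purge API):

```go
import (
	"context"
	"log"
	"time"
)

// orphan pairs a discovered e2e leftover (SP, identity, or resource group)
// with its deleter. purgeOrphans skips anything younger than the TTL, since
// it may belong to a still-running e2e job, and honors a dryRun switch so a
// pipeline run can be verified safely before deleting anything.
type orphan struct {
	name    string
	created time.Time
	remove  func(ctx context.Context) error
}

func purgeOrphans(ctx context.Context, orphans []orphan, ttl time.Duration, dryRun bool) error {
	for _, o := range orphans {
		if time.Since(o.created) < ttl {
			continue
		}
		if dryRun {
			log.Printf("dry run: would delete %s", o.name)
			continue
		}
		if err := o.remove(ctx); err != nil {
			log.Printf("failed to delete %s: %v", o.name, err)
		}
	}
	return nil
}
```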
Test plan for issue:
Is there any documentation that needs to be updated for this PR?
No.