[cleanup]: remove deprecated liveness driver and related metrics artifacts #6003

Open
Coderxrohan wants to merge 3 commits into ceph:devel from Coderxrohan:cleanup-liveness-driver

@Coderxrohan Coderxrohan commented Jan 31, 2026

Issue:
Ceph-CSI includes a legacy liveness driver and related deployment artifacts (sidecar containers, Services, ServiceMonitors, and documentation) that are no longer needed. Kubernetes now provides native liveness and health probing, making the custom liveness driver redundant.

Fix:

Removed all references to the deprecated liveness driver across the repository:

  1. Deleted liveness sidecar containers from RBD, CephFS, and NFS manifests
  2. Removed liveness-related Services and ServiceMonitors
  3. Cleaned up documentation sections describing liveness metrics and endpoints
  4. Ensured generated YAMLs no longer expose unused metrics ports
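
A check like the last item can be automated by scanning rendered manifests for leftovers. Below is a minimal Go sketch of that idea. The marker strings `--type=liveness` and `liveness-prometheus`, the helper name `findLeftovers`, and the sample manifest are all assumptions for illustration and are not taken from this PR.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// livenessMarkers lists substrings that would indicate a leftover liveness
// sidecar. These values are assumptions for illustration; adjust them to
// whatever the removed manifests actually contained.
var livenessMarkers = []string{"--type=liveness", "liveness-prometheus"}

// findLeftovers returns the 1-based line numbers of manifest lines that
// still contain any of the markers.
func findLeftovers(manifest string) []int {
	var hits []int
	scanner := bufio.NewScanner(strings.NewReader(manifest))
	for line := 1; scanner.Scan(); line++ {
		for _, m := range livenessMarkers {
			if strings.Contains(scanner.Text(), m) {
				hits = append(hits, line)
				break // one hit per line is enough
			}
		}
	}
	return hits
}

func main() {
	// Hypothetical manifest fragment, not copied from the repository.
	manifest := `containers:
  - name: csi-rbdplugin
    args:
      - "--type=rbd"
  - name: liveness-prometheus
    args:
      - "--type=liveness"
`
	fmt.Println("lines with leftover liveness references:", findLeftovers(manifest))
}
```

An empty result for every generated YAML would suggest that no liveness sidecar or its arguments remain; it does not by itself prove that Services or ServiceMonitors were removed.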

Impact:

  • Simplifies CSI deployments by relying on Kubernetes-native health checks
  • Avoids exposing unused ports and services
  • Reduces maintenance burden and configuration complexity
  • Aligns Ceph-CSI with current Kubernetes best practices

Level:
Medium-severity cleanup / technical debt reduction

Fixes: #5599

@Coderxrohan Coderxrohan force-pushed the cleanup-liveness-driver branch from e4f0c49 to aa19fd2 Compare January 31, 2026 19:48
Collaborator

@Madhu-1 Madhu-1 left a comment

@Coderxrohan Thanks for the work. Did you get a chance to verify that the profiling endpoint still works after this removal? And what about the code in liveness.go?

Collaborator

Madhu-1 commented Feb 2, 2026

This is kind of a breaking change, we need to add it to Pending release notes. cc @Rakshith-R @nixpanic

@Coderxrohan Coderxrohan force-pushed the cleanup-liveness-driver branch from 1234a6d to f44ce82 Compare February 2, 2026 09:12
@Coderxrohan
Author

> This is kind of a breaking change, we need to add it to Pending release notes. cc @Rakshith-R @nixpanic

@Madhu-1, thanks for the review.
I verified that the profiling endpoint continues to work after this change: it is independent of the deprecated CSI liveness driver and is not wired through liveness.go. The code removed from liveness.go was specific to the deprecated driver path and had no effect on profiling or other endpoints.

Agreed that this is a breaking change. I've added an entry to the Pending Release Notes accordingly.
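
For anyone who wants to reproduce this kind of check in isolation, here is a minimal Go sketch. It assumes profiling is served by the standard `net/http/pprof` handlers; the helper names `newProfilingMux` and `probe` are hypothetical, and this is not the Ceph-CSI code itself.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newProfilingMux registers only the standard pprof handlers.
// No liveness or metrics handler is attached to this mux.
func newProfilingMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

// probe returns the HTTP status code the mux gives for a GET on path.
func probe(mux *http.ServeMux, path string) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, path, nil)
	mux.ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	mux := newProfilingMux()
	// The pprof index should respond even though nothing else is registered.
	fmt.Println("pprof index status:", probe(mux, "/debug/pprof/"))
	// No handler is registered for /metrics on this mux.
	fmt.Println("metrics status:", probe(mux, "/metrics"))
}
```

The sketch only shows that pprof handlers do not depend on a separate liveness or metrics handler being present; confirming the behaviour of the real binary still needs a run against a deployed plugin or CI.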

@Coderxrohan
Author

@Rakshith-R
@nixpanic

I verified locally that profiling is unaffected and that liveness.go was only tied to the deprecated driver.
Happy to get a final confirmation via CI or further review if needed.
Added a breaking-change note to Pending Release Notes.

| `--pidlimit` | _0_ | Configure the PID limit in cgroups. The container runtime can restrict the number of processes/tasks which can cause problems while provisioning (or deleting) a large number of volumes. A value of `-1` configures the limit to the maximum, `0` does not configure limits at all. |
| `--metricsport` | `8080` | TCP port for liveness metrics requests |
| `--metricspath` | `"/metrics"` | Path of prometheus endpoint where metrics will be available |
| `--polltime` | `"60s"` | Time interval in between each poll |
Contributor

Can we remove polltime & timeout from here?

Author

Sure, this will work. I have removed them. Thanks!

| `--pluginpath` | "/var/lib/kubelet/plugins/" | The location of cephcsi plugin on host |
| `--pidlimit` | _0_ | Configure the PID limit in cgroups. The container runtime can restrict the number of processes/tasks which can cause problems while provisioning (or deleting) a large number of volumes. A value of `-1` configures the limit to the maximum, `0` does not configure limits at all. |
| `--metricsport` | `8080` | TCP port for liveness metrics requests |
| `--metricspath` | `/metrics` | Path of prometheus endpoint where metrics will be available |
Contributor

Why is metricspath removed here?

Author

Thanks for pointing this out.
polltime and timeout were tied to the deprecated liveness driver, so they can be removed from the docs as well; I'll update that.
metricspath was unintentionally dropped from the CephFS docs; I'll restore it for consistency with RBD.

@Coderxrohan Coderxrohan force-pushed the cleanup-liveness-driver branch from e646090 to e2871c4 Compare February 4, 2026 19:34
@Coderxrohan
Author

@iPraveenParihar
Both docs changes (RBD + CephFS) are now applied, and DCO has been fixed.
All reviewer feedback has been addressed.
Requesting reviews/approvals when convenient.
Thank you!

Development

Successfully merging this pull request may close these issues.

Remove liveness driver type from the ceph-csi

3 participants