Pipeline update causes race condition - pipeline removed from envoy despite Ready status #7072

### Describe the bug

When a Pipeline CRD is updated, the scheduler removes the pipeline from envoy routing once the old pipeline version is deleted, even though the new version loaded successfully. This leaves the pipeline in a broken state where:
- Pipeline status shows `Ready: True`
- Actual requests return `503 no healthy upstream`

### Environment

- **Seldon Core version**: 2.10.2
- **Kubernetes version**: 1.28
- **Installation method**: Helm
- **Kafka**: Tested with both local Strimzi and Confluent Cloud (same behavior)

### To Reproduce

1. Deploy a working pipeline
2. Verify pipeline is functional (returns 200/400, not 503)
3. Apply an update to the pipeline spec (e.g., change `stepsJoin: inner` to `stepsJoin: outer`)
4. Observe the scheduler and pipeline gateway logs, then send a request to the pipeline (it now returns 503)

### Expected behavior

Pipeline should remain routable after the update. The old version should be deleted without affecting the new version's routing.

### Actual behavior

The pipeline becomes unroutable (503 errors) despite showing `Ready: True` in the CRD status.
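Because the CRD status cannot be trusted here, a check has to send a real request. The following is a minimal sketch of such a probe, not part of Seldon Core. It follows the criterion from the reproduction steps (any response other than 503 counts as routable). The `Seldon-Model: <name>.pipeline` header and the URL passed on the command line are assumptions about the local setup, and `isRoutable` and `probe` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"net/http"
	"os"
	"strings"
	"time"
)

// isRoutable applies the criterion from the reproduction steps:
// 200 or 400 means envoy reached the pipeline, 503 means no healthy upstream.
func isRoutable(statusCode int) bool {
	return statusCode != http.StatusServiceUnavailable
}

// probe sends an empty JSON request to the inference URL and reports
// whether envoy could route it. The Seldon-Model header value is an
// assumption about how the mesh selects a pipeline.
func probe(client *http.Client, url, pipeline string) (bool, error) {
	req, err := http.NewRequest(http.MethodPost, url, strings.NewReader("{}"))
	if err != nil {
		return false, err
	}
	req.Header.Set("Content-Type", "application/json")
	req.Header.Set("Seldon-Model", pipeline+".pipeline")

	resp, err := client.Do(req)
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	return isRoutable(resp.StatusCode), nil
}

func main() {
	if len(os.Args) < 3 {
		fmt.Println("usage: probe <infer-url> <pipeline-name>")
		return
	}
	client := &http.Client{Timeout: 5 * time.Second}
	ok, err := probe(client, os.Args[1], os.Args[2])
	if err != nil {
		fmt.Println("request failed:", err)
		os.Exit(1)
	}
	fmt.Println("routable:", ok)
}
```

Running this before and after step 3 should show the result flip from `routable: true` to `routable: false` while `Ready: True` stays unchanged.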

### Logs

**Scheduler logs showing the issue:**

```
time="2026-01-06T16:14:47Z" level=info msg="Received pipeline status event update:{op:Create  pipeline:\"mlserver-example-pipeline\"  version:9  ...}  success:true  reason:\"Pipeline mlserver-example-pipeline loaded\""
time="2026-01-06T16:14:47Z" level=info msg="Pipeline mlserver-example-pipeline status counts: 1/1 ready"
time="2026-01-06T16:14:47Z" level=info msg="Adding normal pipeline route mlserver-example-pipeline"
time="2026-01-06T16:14:48Z" level=info msg="Pipeline mlserver-example-pipeline status counts: 1/1 terminated"
time="2026-01-06T16:14:49Z" level=info msg="Received pipeline status event update:{op:Delete  pipeline:\"mlserver-example-pipeline\"  version:8  ...}  success:true  reason:\"Pipeline mlserver-example-pipeline deleted\""
time="2026-01-06T16:14:49Z" level=info msg="Pipeline mlserver-example-pipeline has been terminated, removing from conflict resolution and envoy"
```

**Pipeline gateway logs:**

```
time="2026-01-06T15:51:16Z" level=info msg="Pipeline mlserver-example-pipeline loaded"
time="2026-01-06T15:51:17Z" level=info msg="Deleted pipeline mlserver-example-pipeline"
time="2026-01-06T15:51:17Z" level=info msg="Pipeline mlserver-example-pipeline deleted"
```

**Key observation:** Version 9 is created and loaded successfully, but when version 8 is deleted, the scheduler's `GetPipelineStatus` function reports `1/1 terminated` and removes the pipeline from envoy - even though version 9 is still active.
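To confirm the same pattern in other scheduler logs, the events can be scanned mechanically. This is a rough sketch that relies only on the message formats visible in the excerpt above; `findStaleRemovals` is a hypothetical helper, and the regular expressions may need adjusting if the log format differs.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// eventRe extracts op, pipeline name and version from a status event line.
// Quotes may be backslash-escaped because the event is nested in msg="...".
var eventRe = regexp.MustCompile(`op:(\w+)\s+pipeline:\\?"([^"\\]+)\\?"\s+version:(\d+)`)

// removalRe matches the line logged when a pipeline is dropped from envoy.
var removalRe = regexp.MustCompile(`Pipeline (\S+) has been terminated, removing from conflict resolution and envoy`)

// findStaleRemovals returns pipelines that were removed from envoy although
// a version newer than the most recently deleted one had been created.
func findStaleRemovals(lines []string) []string {
	newestCreated := map[string]int{} // pipeline -> highest version seen with op:Create
	lastDeleted := map[string]int{}   // pipeline -> version of the latest op:Delete
	var flagged []string
	for _, line := range lines {
		if m := eventRe.FindStringSubmatch(line); m != nil {
			version, err := strconv.Atoi(m[3])
			if err != nil {
				continue
			}
			switch m[1] {
			case "Create":
				if version > newestCreated[m[2]] {
					newestCreated[m[2]] = version
				}
			case "Delete":
				lastDeleted[m[2]] = version
			}
			continue
		}
		if m := removalRe.FindStringSubmatch(line); m != nil {
			deleted, sawDelete := lastDeleted[m[1]]
			if sawDelete && newestCreated[m[1]] > deleted {
				flagged = append(flagged, m[1])
			}
		}
	}
	return flagged
}

// sampleLog holds three lines copied from the scheduler excerpt above.
var sampleLog = []string{
	`time="2026-01-06T16:14:47Z" level=info msg="Received pipeline status event update:{op:Create  pipeline:\"mlserver-example-pipeline\"  version:9  ...}  success:true  reason:\"Pipeline mlserver-example-pipeline loaded\""`,
	`time="2026-01-06T16:14:49Z" level=info msg="Received pipeline status event update:{op:Delete  pipeline:\"mlserver-example-pipeline\"  version:8  ...}  success:true  reason:\"Pipeline mlserver-example-pipeline deleted\""`,
	`time="2026-01-06T16:14:49Z" level=info msg="Pipeline mlserver-example-pipeline has been terminated, removing from conflict resolution and envoy"`,
}

func main() {
	fmt.Println(findStaleRemovals(sampleLog))
}
```

On the excerpt, the scan flags `mlserver-example-pipeline`: version 9 was created, version 8 was deleted, and the removal followed anyway.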

### Root cause analysis

The bug appears to be in the scheduler's `dataflow-conflict-resolution` component. When processing the delete event for the old pipeline version, `GetPipelineStatus` incorrectly counts the pipeline as terminated and triggers removal from envoy, ignoring that a newer version is still loaded.

The sequence is:
1. Pipeline v9 created → "1/1 ready" → added to envoy ✓
2. Pipeline v8 delete event received
3. `GetPipelineStatus` returns "1/1 terminated" (BUG: should still show ready because v9 exists)
4. Pipeline removed from envoy (BUG: v9 is still valid)
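The behavior I would expect at steps 3 and 4 can be sketched as follows. This is an illustration of the intended logic only, not the scheduler's actual code: `pipelineVersions`, `naiveShouldRemove`, and `shouldRemoveRoute` are hypothetical names, and the real `GetPipelineStatus` may track state differently.

```go
package main

import "fmt"

type versionState int

const (
	stateReady versionState = iota
	stateTerminated
)

// pipelineVersions maps a version number to its state for one pipeline.
type pipelineVersions map[int]versionState

// naiveShouldRemove mirrors the observed behavior: it looks only at the
// version named in the delete event and ignores every other version.
func naiveShouldRemove(versions pipelineVersions, deleted int) bool {
	versions[deleted] = stateTerminated
	return versions[deleted] == stateTerminated
}

// shouldRemoveRoute marks the deleted version as terminated and allows the
// envoy route to be dropped only if no version of the pipeline is still ready.
func shouldRemoveRoute(versions pipelineVersions, deleted int) bool {
	versions[deleted] = stateTerminated
	for _, state := range versions {
		if state == stateReady {
			return false
		}
	}
	return true
}

func main() {
	// v9 has loaded, then the delete event for v8 arrives.
	fmt.Println("naive:", naiveShouldRemove(pipelineVersions{8: stateReady, 9: stateReady}, 8))
	fmt.Println("version-aware:", shouldRemoveRoute(pipelineVersions{8: stateReady, 9: stateReady}, 8))
}
```

With v8 and v9 both known, the naive check drops the route when v8 is deleted, while the version-aware check keeps it until v9 is deleted as well.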

### Workaround

Restarting the pipeline gateway pod after any pipeline update resolves the issue:

```bash
kubectl rollout restart deployment/seldon-pipelinegateway -n seldon-mesh
```

This works because the fresh pod connects to the scheduler and loads the current pipeline version without any "old version" delete events to process.

### Additional context

- This is 100% reproducible on every pipeline spec update
- Scaling pipeline gateway to multiple replicas does NOT help - all replicas experience the race condition simultaneously
- Initial pipeline creation works fine; only updates trigger the bug
- The bug was present with Strimzi Kafka and persists with Confluent Cloud, ruling out Kafka-specific issues
- PR #6849 addressed related pipeline loading/unloading issues in v2.10.0, but this race condition persists in v2.10.2

### Impact

- Requires manual intervention (pod restart) after every pipeline update
- Makes pipelines unsafe for production CI/CD without workarounds
- CRD status is misleading (shows Ready when routing is broken)
