

@adarsh0728 commented Jan 23, 2026

What this PR does / why we need it

This applies to both Pipelines and MonoVertices. The goal is to improve our existing AnalysisTemplates for Numaplane assessment.

The problem with our current assessments is that they only check whether any message gets acked. If the first message is acked but the second one fails to be acked, Numaplane still reports a success.

If we have a metric that detects failures such as EOT and UDF crashes, we could fail the assessment whenever any vertex of the new Pipeline, or the new MonoVertex, reports a count > 1.
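To make the gap concrete, here is a minimal, std-only Rust sketch. The `AckOutcome` type and the `any_message_acked` and `critical_errors_ok` functions are hypothetical illustrations, not Numaplane's actual AnalysisTemplate logic. The per-vertex counts stand in for values of the critical-error metric, and `threshold = 1` mirrors the "count > 1" condition above.

```rust
use std::collections::HashMap;

/// Hypothetical, simplified outcome of acking a single message.
#[derive(Debug, Clone, Copy, PartialEq)]
enum AckOutcome {
    Acked,
    Failed,
}

/// Current style of check (sketch): passes as soon as *any* message is acked.
fn any_message_acked(outcomes: &[AckOutcome]) -> bool {
    outcomes.iter().any(|o| *o == AckOutcome::Acked)
}

/// Proposed style of check (sketch): fails if any vertex or MonoVertex reports
/// a critical-error count above `threshold`. `counts` maps a (hypothetical)
/// vertex name to the value of its critical-error counter.
fn critical_errors_ok(counts: &HashMap<String, u64>, threshold: u64) -> bool {
    counts.values().all(|&c| c <= threshold)
}

fn main() {
    // First message acked, second one failed: the any-ack check still passes.
    let outcomes = [AckOutcome::Acked, AckOutcome::Failed];
    println!("any-ack assessment passes: {}", any_message_acked(&outcomes));

    // Made-up counter values; the "cat" vertex has recorded critical errors.
    let mut counts = HashMap::new();
    counts.insert("in".to_string(), 0);
    counts.insert("cat".to_string(), 3);
    counts.insert("out".to_string(), 0);
    println!("critical-error assessment passes: {}", critical_errors_ok(&counts, 1));
}
```

The point of the contrast is that the any-ack check only needs one success, while the metric-based check only needs one failing vertex to reject the rollout.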

Specifically, the idea is to emit a metric from the numa container whenever a critical error occurs in the Numaflow Pipeline or MonoVertex.

Pipeline Metric: forwarder_critical_error_total with labels vertex, pipeline, vertex_type, replica and reason
MonoVertex Metric: mvtx_critical_error_total with labels mvtx_name, replica and reason
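Both metrics are labeled counters: one time series per unique combination of label values, incremented each time a critical error is observed. The std-only Rust sketch below models that shape. `CounterFamily`, its methods, and the label values used in `main` (such as `udf_crash` or `simple-mono-vertex`) are hypothetical and are not numaflow-core's implementation; a real implementation would more likely register the counters with a Prometheus client crate. `render` writes lines roughly in the Prometheus text exposition format and does not escape label values.

```rust
use std::collections::BTreeMap;
use std::sync::Mutex;

/// Hypothetical, std-only stand-in for a Prometheus counter family:
/// one monotonically increasing value per unique combination of label values.
struct CounterFamily {
    name: &'static str,
    label_names: &'static [&'static str],
    // Keyed by label values (same order as `label_names`); BTreeMap keeps output stable.
    series: Mutex<BTreeMap<Vec<String>, u64>>,
}

impl CounterFamily {
    fn new(name: &'static str, label_names: &'static [&'static str]) -> Self {
        Self { name, label_names, series: Mutex::new(BTreeMap::new()) }
    }

    /// Increment the series identified by `label_values` by one.
    fn inc(&self, label_values: &[&str]) {
        assert_eq!(label_values.len(), self.label_names.len(), "label arity mismatch");
        let key: Vec<String> = label_values.iter().map(|v| v.to_string()).collect();
        *self.series.lock().unwrap().entry(key).or_insert(0) += 1;
    }

    /// Render every series roughly in Prometheus text format (label values are not escaped).
    fn render(&self) -> String {
        let mut out = format!("# TYPE {} counter\n", self.name);
        let series = self.series.lock().unwrap();
        for (values, count) in series.iter() {
            let labels: Vec<String> = self
                .label_names
                .iter()
                .zip(values.iter())
                .map(|(name, value)| format!("{}=\"{}\"", name, value))
                .collect();
            out.push_str(&format!("{}{{{}}} {}\n", self.name, labels.join(","), count));
        }
        out
    }
}

fn main() {
    // Label names come from the PR description; the label values are made up.
    let forwarder = CounterFamily::new(
        "forwarder_critical_error_total",
        &["vertex", "pipeline", "vertex_type", "replica", "reason"],
    );
    let mvtx = CounterFamily::new("mvtx_critical_error_total", &["mvtx_name", "replica", "reason"]);

    // Hypothetical error sites: one UDF crash in a map vertex, two in a MonoVertex.
    forwarder.inc(&["cat", "simple-pipeline", "map", "0", "udf_crash"]);
    mvtx.inc(&["simple-mono-vertex", "0", "udf_crash"]);
    mvtx.inc(&["simple-mono-vertex", "0", "udf_crash"]);

    print!("{}{}", forwarder.render(), mvtx.render());
}
```

Because each distinct `reason` (and `replica`) value gets its own series, an assessment query can either sum across those labels per vertex or MonoVertex, or filter on a specific reason.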

Testing

[Screenshot 2026-01-29 at 3:49:24 PM]


codecov bot commented Jan 23, 2026

Codecov Report

❌ Patch coverage is 17.70833% with 79 lines in your changes missing coverage. Please review.
✅ Project coverage is 80.12%. Comparing base (58f8349) to head (291c17f).
⚠️ Report is 1 commit behind head on main.

Files with missing lines                              Patch %   Lines
rust/numaflow-core/src/mapper/map/batch.rs            0.00%     17 Missing ⚠️
rust/numaflow-core/src/sinker/sink/user_defined.rs    0.00%     17 Missing ⚠️
rust/numaflow-core/src/transformer.rs                 0.00%     17 Missing ⚠️
rust/numaflow-core/src/lib.rs                         0.00%     15 Missing ⚠️
rust/numaflow-core/src/metrics.rs                     56.66%    13 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #3154      +/-   ##
==========================================
- Coverage   80.21%   80.12%   -0.10%     
==========================================
  Files         296      296              
  Lines       67530    67626      +96     
==========================================
+ Hits        54172    54188      +16     
- Misses      12805    12885      +80     
  Partials      553      553              


@yhl25 requested a review from syayi on January 27, 2026 22:44
@adarsh0728 requested a review from BulkBeing on January 28, 2026 15:05
Signed-off-by: adarsh0728 <[email protected]>
@adarsh0728 marked this pull request as ready for review on January 29, 2026 10:28

@vigith left a comment


Perhaps we should try a macro approach. Let me try that in a follow-up PR.

@vigith merged commit b68393a into main on Jan 29, 2026
45 of 46 checks passed
@vigith deleted the critical-metric branch on January 29, 2026 18:53