
Conversation

@galagam (Collaborator) commented Jan 19, 2026

Description

The FP8 MoE kernel requires a scalar input scale, but models may have different input scales per expert. Previously, the AutoDeploy code used the first expert's scale (input_scale[0]), which could cause accuracy issues when the scales differ significantly.

Changes:

  • Use max(input_scale) for FC1 and FC2 input quantization, matching TRT-LLM manual backend
  • Precompute max input scales at transform time for both trtllm and triton backends
  • Add an allow_different_input_scales config option to FuseFP8MoeConfig (a sketch of this logic follows the list):
    • False (default): Assert that all experts have identical scales; fail if not
    • True: Allow different scales with a warning and use max() for quantization
  • Update kernel signatures to take precomputed scalar scales instead of tensor scales
  • Add unit test for the new config option
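
A minimal sketch of the intended transform-time scale handling referenced in the list above. This is illustrative only: the helper name collect_input_scale_max and the plain warning are placeholders for the actual logic in fused_moe.py, which uses ad_logger.

    import torch

    def collect_input_scale_max(
        input_scales: list[torch.Tensor], allow_different_input_scales: bool
    ) -> torch.Tensor:
        """Reduce per-expert input scales to the single scalar the fused kernel expects."""
        stacked = torch.stack([s.reshape(()) for s in input_scales])
        if not torch.all(stacked == stacked[0]):
            if not allow_different_input_scales:
                raise AssertionError(
                    "Experts have different input scales; "
                    "set allow_different_input_scales=True to quantize with max()."
                )
            print("Warning: expert input scales differ; using max() for input quantization.")
        # Using the largest scale avoids clipping any expert's activations
        # (assuming the usual x / scale quantization convention), at the cost
        # of a little precision for experts whose scales are smaller.
        return stacked.max()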

Test Coverage

tests/unittest/_torch/auto_deploy/unit/singlegpu/transformations/library/test_moe_fusion.py::test_fp8_moe_different_input_scales

PR Checklist

Please review the following before submitting your PR:

  • PR description clearly explains what and why. If using CodeRabbit's summary, please make sure it makes sense.

  • PR Follows TRT-LLM CODING GUIDELINES to the best of your knowledge.

  • Test cases are provided for new code paths (see test instructions)

  • Any new dependencies have been scanned for license and vulnerabilities

  • CODEOWNERS updated if ownership changes

  • Documentation updated as needed

  • Update tava architecture diagram if there is a significant design change in PR.

  • The reviewers assigned automatically/manually are appropriate for the PR.

  • Please check this after reviewing the above items as appropriate for this PR.

GitHub Bot Help

/bot [-h] ['run', 'kill', 'skip', 'reuse-pipeline'] ...

Provide a user friendly way for developers to interact with a Jenkins server.

Run /bot [-h|--help] to print this help message.

See details below for each supported subcommand.

Details

run [--reuse-test (optional)pipeline-id --disable-fail-fast --skip-test --stage-list "A10-PyTorch-1, xxx" --gpu-type "A30, H100_PCIe" --test-backend "pytorch, cpp" --add-multi-gpu-test --only-multi-gpu-test --disable-multi-gpu-test --post-merge --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" --detailed-log --debug(experimental)]

Launch build/test pipelines. All previously running jobs will be killed.

--reuse-test (optional)pipeline-id (OPTIONAL) : Allow the new pipeline to reuse build artifacts and skip successful test stages from a specified pipeline or the last pipeline if no pipeline-id is indicated. If the Git commit ID has changed, this option will be always ignored. The DEFAULT behavior of the bot is to reuse build artifacts and successful test results from the last pipeline.

--disable-reuse-test (OPTIONAL) : Explicitly prevent the pipeline from reusing build artifacts and skipping successful test stages from a previous pipeline. Ensure that all builds and tests are run regardless of previous successes.

--disable-fail-fast (OPTIONAL) : Disable fail fast on build/tests/infra failures.

--skip-test (OPTIONAL) : Skip all test stages, but still run build stages, package stages and sanity check stages. Note: Does NOT update GitHub check status.

--stage-list "A10-PyTorch-1, xxx" (OPTIONAL) : Only run the specified test stages. Examples: "A10-PyTorch-1, xxx". Note: Does NOT update GitHub check status.

--gpu-type "A30, H100_PCIe" (OPTIONAL) : Only run the test stages on the specified GPU types. Examples: "A30, H100_PCIe". Note: Does NOT update GitHub check status.

--test-backend "pytorch, cpp" (OPTIONAL) : Skip test stages which don't match the specified backends. Only support [pytorch, cpp, tensorrt, triton]. Examples: "pytorch, cpp" (does not run test stages with tensorrt or triton backend). Note: Does NOT update GitHub pipeline status.

--only-multi-gpu-test (OPTIONAL) : Only run the multi-GPU tests. Note: Does NOT update GitHub check status.

--disable-multi-gpu-test (OPTIONAL) : Disable the multi-GPU tests. Note: Does NOT update GitHub check status.

--add-multi-gpu-test (OPTIONAL) : Force run the multi-GPU tests in addition to running L0 pre-merge pipeline.

--post-merge (OPTIONAL) : Run the L0 post-merge pipeline instead of the ordinary L0 pre-merge pipeline.

--extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx" (OPTIONAL) : Run the ordinary L0 pre-merge pipeline and specified test stages. Examples: --extra-stage "H100_PCIe-TensorRT-Post-Merge-1, xxx".

--detailed-log (OPTIONAL) : Enable flushing out all logs to the Jenkins console. This will significantly increase the log volume and may slow down the job.

--debug (OPTIONAL) : Experimental feature. Enable access to the CI container for debugging purpose. Note: Specify exactly one stage in the stage-list parameter to access the appropriate container environment. Note: Does NOT update GitHub check status.

For guidance on mapping tests to stage names, see docs/source/reference/ci-overview.md
and the scripts/test_to_stage_mapping.py helper.

kill

kill

Kill all running builds associated with pull request.

skip

skip --comment COMMENT

Skip testing for latest commit on pull request. --comment "Reason for skipping build/test" is required. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

reuse-pipeline

reuse-pipeline

Reuse a previous pipeline to validate current commit. This action will also kill all currently running builds associated with the pull request. IMPORTANT NOTE: This is dangerous since lack of user care and validation can cause top of tree to break.

Summary by CodeRabbit

  • New Features

    • Added support for handling different input scales in FP8 Mixture of Experts fusion.
    • New configuration option enables flexible quantization handling when experts have varying input scales.
    • Improved quantization efficiency using precomputed maximum scales.
  • Tests

    • Added comprehensive test coverage for FP8 MoE fusion across multiple backend configurations.


@galagam galagam requested review from a team as code owners January 19, 2026 20:33
coderabbitai bot (Contributor) commented Jan 19, 2026

📝 Walkthrough

Walkthrough

This PR introduces support for handling different per-expert input scales in FP8 MoE fusion. An allow_different_input_scales configuration flag is added to enable using maximum input scales across experts instead of requiring identical scales. Parameter names are updated from _input_scale to _input_scale_max and fc1_act_scale to fc1_act_scale_max to reflect precomputed maximum scales. The fusion transform computes per-expert maximum scales and passes them to custom operators when the flag is enabled.

Changes

  • Configuration Files (examples/auto_deploy/super_v3.yaml, tensorrt_llm/_torch/auto_deploy/config/default.yaml): Added an allow_different_input_scales: false flag to the MoE fusion configuration in both the example and default YAML configs.
  • Custom Operators, Triton Backend (tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/triton_moe.py): Renamed the input scale parameters from w1_input_scale, w2_input_scale, w3_input_scale to w1_input_scale_max, w2_input_scale_max, w3_input_scale_max to reflect precomputed maximum scales. Updated the corresponding fake variant signature.
  • Custom Operators, TensorRT-LLM Backend (tensorrt_llm/_torch/auto_deploy/custom_ops/fused_moe/trtllm_moe.py): Renamed the fc1_act_scale parameter to fc1_act_scale_max in both trtllm_quant_fp8_moe_fused and trtllm_quant_fp8_moe_fused_fake. Updated the quantization flow to use a scalar maximum activation scale instead of per-element scales (see the sketch after this list).
  • Fusion Transform Core Logic (tensorrt_llm/_torch/auto_deploy/transform/library/fused_moe.py): Added the allow_different_input_scales parameter to _stack_fp8_moe_weights and FuseFP8MoeConfig. When enabled, computes the maximum input scales across experts and logs a warning instead of asserting identical scales. Updated argument construction to pass the maximum scales to the custom operators. Added an ad_logger import.
  • Tests (tests/unittest/_torch/auto_deploy/unit/singlegpu/custom_ops/test_trtllm_moe.py): Added the parametrized test test_fp8_moe_different_input_scales covering multiple backends and scale configurations. Updated existing FP8 MoE test call sites to use the new parameter names (fc1_act_scale_max, hidden_states_scale.reshape(1)).
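
Not the actual operator code, but a rough sketch of what "use a scalar maximum activation scale instead of per-element scales" means for the activation quantization step. The helper below is hypothetical (not the signature of trtllm_quant_fp8_moe_fused), and it assumes the common x / scale convention for FP8 and torch's float8_e4m3fn dtype with a finite range of ±448:

    import torch

    def quantize_activations_fp8(x: torch.Tensor, act_scale_max: torch.Tensor) -> torch.Tensor:
        """Quantize activations with one scalar scale shared by all experts."""
        # A single division for the whole batch instead of gathering a
        # per-expert scale per token; clamp to the FP8 E4M3 finite range
        # before casting to the FP8 dtype.
        q = (x / act_scale_max).clamp(-448.0, 448.0)
        return q.to(torch.float8_e4m3fn)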

Sequence Diagram(s)

sequenceDiagram
    participant Config as Config (allow_different_input_scales flag)
    participant Transform as FuseFP8Moe._apply
    participant StackWeights as _stack_fp8_moe_weights
    participant ComputeScales as Compute Max Scales
    participant CustomOp as Custom Operator<br/>(trtllm_quant_fp8_moe_fused)
    
    Config->>Transform: Load config with allow_different_input_scales
    Transform->>StackWeights: Call with flag
    StackWeights->>ComputeScales: Check expert input scales
    
    alt allow_different_input_scales enabled
        ComputeScales->>ComputeScales: Find max scale across experts
        ComputeScales->>StackWeights: Return max scales
        StackWeights->>StackWeights: Log warning (scales differ)
    else allow_different_input_scales disabled
        ComputeScales->>ComputeScales: Assert all scales identical
        ComputeScales->>StackWeights: Return scales
    end
    
    StackWeights->>CustomOp: Pass fc1_act_scale_max<br/>(precomputed max)
    CustomOp->>CustomOp: Quantize with max scale

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1
❌ Failed checks (1 warning)
  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 33.33%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
  • Description check: ✅ Passed. The PR description clearly explains the issue, solution, and test coverage, and includes a completed checklist; all required sections are present and substantively filled.
  • Title check: ✅ Passed. The title clearly summarizes the main change: adding support for per-expert scales in FP8 MoE fusion within AutoDeploy, which directly aligns with the file modifications across the codebase.


coderabbitai bot (Contributor) left a comment
Actionable comments posted: 1

🤖 Fix all issues with AI agents
In
`@tests/unittest/_torch/auto_deploy/unit/singlegpu/custom_ops/test_trtllm_moe.py`:
- Around line 1143-1144: Fix the assertion message in the torch.allclose check:
update the f-string in the assertion that references torch.allclose(output,
ref_output, rtol=0.05, atol=0.05) so the message includes a space (or newline)
before "Max diff:" and correctly reflects the rtol/atol values used; adjust the
text to something like "rtol=0.05, atol=0.05 Max diff: {(output -
ref_output).abs().max()}" to avoid concatenation and value mismatch.
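
For reference, a minimal sketch of the corrected assertion suggested above, assuming output and ref_output are the tensors already defined in the test:

    # Keep the tolerances in the failure message consistent with the ones
    # actually passed to torch.allclose, and separate them from the max-diff
    # report so the values don't run together.
    assert torch.allclose(output, ref_output, rtol=0.05, atol=0.05), (
        f"Mismatch at rtol=0.05, atol=0.05. "
        f"Max diff: {(output - ref_output).abs().max()}"
    )
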
🧹 Nitpick comments (1)
tests/unittest/_torch/auto_deploy/unit/singlegpu/custom_ops/test_trtllm_moe.py (1)

1127-1134: Consider using attribute access instead of getattr with constant strings.

The static analysis hint is valid but minor. Using direct attribute access is cleaner when the attribute name is known at write-time.

♻️ Suggested improvement
             if backend == "trtllm":
-                actual_w1_input_max = getattr(gm, "quant_moe_fc1_act_scale_max_0")
+                actual_w1_input_max = gm.quant_moe_fc1_act_scale_max_0
             else:
-                actual_w1_input_max = getattr(gm, "quant_moe_w1_input_scale_max_0").squeeze()
+                actual_w1_input_max = gm.quant_moe_w1_input_scale_max_0.squeeze()

@galagam (Collaborator, Author) commented Jan 20, 2026

/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1"

@tensorrt-cicd (Collaborator)

PR_Github #32652 [ run ] triggered by Bot. Commit: 831ecd9

@tensorrt-cicd (Collaborator)

PR_Github #32652 [ run ] completed with state SUCCESS. Commit: 831ecd9
/LLM/main/L0_MergeRequest_PR pipeline #25276 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@galagam (Collaborator, Author) commented Jan 20, 2026

/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1"

@tensorrt-cicd (Collaborator)

PR_Github #32710 [ run ] triggered by Bot. Commit: 5babb91

@galagam galagam marked this pull request as draft January 20, 2026 09:54
@tensorrt-cicd (Collaborator)

PR_Github #32710 [ run ] completed with state SUCCESS. Commit: 5babb91
/LLM/main/L0_MergeRequest_PR pipeline #25316 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@galagam (Collaborator, Author) commented Jan 20, 2026

/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1"

@tensorrt-cicd (Collaborator)

PR_Github #32749 [ run ] triggered by Bot. Commit: 1c7e940

@galagam galagam self-assigned this Jan 20, 2026
@tensorrt-cicd (Collaborator)

PR_Github #32749 [ run ] completed with state SUCCESS. Commit: 1c7e940
/LLM/main/L0_MergeRequest_PR pipeline #25348 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@galagam galagam force-pushed the gagam/handle-non-identical-moe-scales branch from 1c7e940 to 86396a1 on January 20, 2026 16:27
@galagam galagam marked this pull request as ready for review January 20, 2026 16:27
@galagam (Collaborator, Author) commented Jan 20, 2026

/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1"

@tensorrt-cicd (Collaborator)

PR_Github #32777 [ run ] triggered by Bot. Commit: 86396a1

@galagam galagam force-pushed the gagam/handle-non-identical-moe-scales branch from 86396a1 to c2f5e37 on January 20, 2026 17:49
@galagam (Collaborator, Author) commented Jan 20, 2026

/bot kill

@tensorrt-cicd (Collaborator)

PR_Github #32791 [ kill ] triggered by Bot. Commit: c2f5e37

@tensorrt-cicd (Collaborator)

PR_Github #32791 [ kill ] completed with state SUCCESS. Commit: c2f5e37
Successfully killed previous jobs for commit c2f5e37

@galagam galagam force-pushed the gagam/handle-non-identical-moe-scales branch from c2f5e37 to caa90fa on January 20, 2026 18:14
@galagam (Collaborator, Author) commented Jan 20, 2026

/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1"

@tensorrt-cicd (Collaborator)

PR_Github #32793 [ run ] triggered by Bot. Commit: caa90fa

gpu:
- '*b200*'
linux_distribution_name: ubuntu*
cpu: x86_64

Collaborator Author

@lucaslie please take a look

Member

Do we absolutely need 8x B200?

  • If not, please add the test to the 4x B200 stage.
  • If yes please combine all B200 multi-gpu tests into a single 8x B200 stage

This is because stages have overhead and I was asked by the CI team to ensure we don't unnecessarily add too many stages, see here

Also, if you do add a new stage:

  1. please follow the steps in my PR: [None][infra] separate AutoDeploy tests into own stages #10634
  2. update our dev guide and my pinned slack message to correctly reference the new AutoDeploy test stages and their names

Collaborator Author

Didn't know it's such a big overhead. I'll stick with the world_size=4 tests and remove this stage. Same for dgx_h100.

@tensorrt-cicd (Collaborator)

PR_Github #33556 [ run ] triggered by Bot. Commit: 96b3ee5

@tensorrt-cicd (Collaborator)

PR_Github #33556 [ run ] completed with state SUCCESS. Commit: 96b3ee5
/LLM/main/L0_MergeRequest_PR pipeline #25883 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@galagam galagam force-pushed the gagam/handle-non-identical-moe-scales branch from 96b3ee5 to 009e418 on January 26, 2026 12:00
@galagam (Collaborator, Author) commented Jan 26, 2026

/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_B200-8_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1, DGX_H100-8_GPUs-AutoDeploy-1"

@tensorrt-cicd (Collaborator)

PR_Github #33588 [ run ] triggered by Bot. Commit: 009e418

@tensorrt-cicd (Collaborator)

PR_Github #33588 [ run ] completed with state FAILURE. Commit: 009e418
/LLM/main/L0_MergeRequest_PR pipeline #25909 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@galagam (Collaborator, Author) commented Jan 26, 2026

/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_B200-8_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1, DGX_H100-8_GPUs-AutoDeploy-1"

@tensorrt-cicd (Collaborator)

PR_Github #33599 [ run ] triggered by Bot. Commit: 009e418

@tensorrt-cicd (Collaborator)

PR_Github #33599 [ run ] completed with state FAILURE. Commit: 009e418
/LLM/main/L0_MergeRequest_PR pipeline #25919 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@galagam (Collaborator, Author) commented Jan 26, 2026

/bot run

@tensorrt-cicd (Collaborator)

PR_Github #33604 [ run ] triggered by Bot. Commit: 009e418

stage: post_load_fusion
enabled: true
backend: trtllm
allow_different_input_scales: false

Member

why wouldn't we just allow this by default if this is the default behavior?

Collaborator Author

@lucaslie Just being cautious. We can enable anytime.
We should be getting a checkpoint with all identical scales at some point, I don't know how common it is to have different scales (allegedly not at all).

https://nvidia.slack.com/archives/C09D47NAYNR/p1767902427747889

enabled: true
backend: trtllm
allow_different_input_scales: false
fuse_nvfp4_moe:

Member

did you also implement this for nvfp4? If not, can you either add it here or add a ticket to our backlog?

Collaborator Author

@lucaslie nvfp4 is using per-block dynamic quantization on the inputs, so scale computation is part of the kernel. This is only relevant for fp8.

Collaborator Author

Correction. This seems to be an issue for nvfp4 too.
@tcherckez-nvidia to handle as part of nvfp4 super enablement.

@tensorrt-cicd (Collaborator)

PR_Github #33604 [ run ] completed with state SUCCESS. Commit: 009e418
/LLM/main/L0_MergeRequest_PR pipeline #25923 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

…n FP8 MoE

FP8 MoE kernel requires a scalar input scales, but models may have
different input scales per expert. Previously, the autodeploy code
used the first expert's scale (input_scale[0]), which could cause accuracy
issues when scales differ significantly.

Changes:
- Use max(input_scale) for FC1 and FC2 input quantization, matching TRT-LLM manual backend
- Precompute max input scales at transform time for both trtllm and triton backends
- Add config option to FuseFP8MoeConfig:
  - False (default): Assert all experts have identical scales, fail if not
  - True: Allow different scales with a warning, use max() for quantization
- Update kernel signatures to take precomputed scalar scales instead of tensor scales
- Add unit test for the new config option

Signed-off-by: Gal Hubara Agam <96368689+galagam@users.noreply.github.com>
@galagam galagam force-pushed the gagam/handle-non-identical-moe-scales branch from 2d40ccb to c566908 on January 26, 2026 17:43
@galagam (Collaborator, Author) commented Jan 26, 2026

/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1"

@tensorrt-cicd (Collaborator)

PR_Github #33619 [ run ] triggered by Bot. Commit: c566908

@tensorrt-cicd (Collaborator)

PR_Github #33619 [ run ] completed with state SUCCESS. Commit: c566908
/LLM/main/L0_MergeRequest_PR pipeline #25934 completed with status: 'FAILURE'

⚠️ Action Required:

  • Please check the failed tests and fix your PR
  • If you cannot view the failures, ask the CI triggerer to share details
  • Once fixed, request an NVIDIA team member to trigger CI again

@galagam (Collaborator, Author) commented Jan 26, 2026

/bot run --extra-stage "DGX_B200-4_GPUs-AutoDeploy-1, DGX_H100-4_GPUs-AutoDeploy-1" --disable-fail-fast

@tensorrt-cicd (Collaborator)

PR_Github #33628 [ run ] triggered by Bot. Commit: c566908
