@zasdfgbnm
Collaborator

No description provided.

@zasdfgbnm
Collaborator Author

!test

github-actions bot commented Jan 20, 2026

Description

  • Remove redundant device index setting in fusion kernel runtime

  • Eliminates unnecessary setDeviceIndex call in prepareInputs function

  • Streamlines input preparation logic for segmented kernels

Changes walkthrough

Relevant files
Bug fix
fusion_kernel_runtime.cpp
Remove redundant device index setting

csrc/runtime/fusion_kernel_runtime.cpp (+0/-1)

  • Removed line setting device index on group runtime inputs
  • Simplified prepareInputs function by eliminating redundant call
  • Maintains existing cache ID functionality

    PR Reviewer Guide

    Here are some key observations to aid the review process:

    🧪 No relevant tests
    🔒 No security concerns identified
    ⚡ Recommended focus areas for review
    Missing Context

    The PR removes a call to setDeviceIndex() without explaining why the change is needed, what problem it solves, or what impact is expected. Without that context, it is difficult to assess the correctness and potential side effects of the change.

    group_runtime_inputs.setCacheId(group_cache_id.value());
    Potential Functionality Impact

    Removing the device index setting could affect device selection and memory allocation behavior. The change should be validated to ensure it does not break device-specific operations or cause runtime errors.

    group_runtime_inputs.setCacheId(group_cache_id.value());
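
The concern above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: `ArgsHolder`, `prepareGroupInputs`, the `set_device_index` flag, and the default device index of 0 are hypothetical stand-ins, not nvFuser's actual classes or behavior. The sketch only shows how an argument holder that is never told its device would keep reporting a default, which differs from the selected device whenever that device is not 0.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for an argument holder; NOT nvFuser's real class.
class ArgsHolder {
 public:
  void setCacheId(std::size_t id) { cache_id_ = id; }

  void setDeviceIndex(std::int8_t index) {
    assert(index >= 0);  // device indices are non-negative in this sketch
    device_index_ = index;
  }

  std::optional<std::size_t> cacheId() const { return cache_id_; }
  std::int8_t deviceIndex() const { return device_index_; }

 private:
  std::optional<std::size_t> cache_id_;
  std::int8_t device_index_ = 0;  // assumed default: device 0
};

// Hypothetical analogue of preparing the inputs for one segment.
// `set_device_index` toggles the call that the PR removes.
ArgsHolder prepareGroupInputs(
    std::size_t group_cache_id,
    std::int8_t selected_device,
    bool set_device_index) {
  ArgsHolder group_runtime_inputs;
  if (set_device_index) {
    group_runtime_inputs.setDeviceIndex(selected_device);
  }
  group_runtime_inputs.setCacheId(group_cache_id);
  return group_runtime_inputs;
}
```

Under these assumptions, `prepareGroupInputs(7, 1, true)` reports device 1, while `prepareGroupInputs(7, 1, false)` reports the default device 0 even though device 1 was selected. Whether the real code has an equivalent default, or sets the device index elsewhere, is not stated in this PR.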

    Test failures

    • (High, 25) CUDA illegal memory access in TestNvFuserFrontend on A100

      Test Name A100 Source
      tests.python.test_python_frontend.TestNvFuserFrontend.test_signbit
      tests.python.test_python_frontend.TestNvFuserFrontend.test_slice_api
      tests.python.test_python_frontend.TestNvFuserFrontend.test_slice_error_checks
      tests.python.test_python_frontend.TestNvFuserFrontend.test_squeeze
      tests.python.test_python_frontend.TestNvFuserFrontend.test_static_tensor_sizes
      tests.python.test_python_frontend.TestNvFuserFrontend.test_stride_order_with_explicit_broadcast
      tests.python.test_python_frontend.TestNvFuserFrontend.test_sum_sliced_reshape_to_broadcast
      tests.python.test_python_frontend.TestNvFuserFrontend.test_take_along_axis
      tests.python.test_python_frontend.TestNvFuserFrontend.test_tensor_ndim
      tests.python.test_python_frontend.TestNvFuserFrontend.test_tensor_shape
      ... with 15 more test failures omitted. Check internal logs.
    • (High, 1) CUDA illegal memory access in nvFuser test_selected_device on A100

      Test Name A100 Source
      tests.python.test_python_frontend.TestNvFuserFrontend.test_selected_device
    • (Medium, 1) Scalar numerical mismatch in thunder.tests.test_networks (nanoGPT autograd, CUDA, A100)

      Test Name A100 Source
      thunder.tests.test_networks.test_nanogpt_complete_autograd_nvfuser_cuda_thunder.dtypes.float32

    @zasdfgbnm changed the title from "Update fusion_kernel_runtime.cpp" to "Remove unused setDeviceIndex in FusionKernelRuntime::prepareInputs" on Jan 21, 2026