Implement batched gemm bias permute for RDNA4 #3534
Conversation
…rs for gridwise_gemm_wmma_cshuffle_v3, test setup for odd cases
Can you also add an example for wmma?
EnricoDeg left a comment:
Nice work!
Review comments were left on:
.../tensor_operation/gpu/device/impl/device_batched_contraction_multiple_d_wmma_cshuffle_v3.hpp
...ermute/device_batched_gemm_bias_permute_m2_n3_k1_wmma_c_shuffle_f16_f16_f16_f16_instance.cpp
ApoorvaKalyani left a comment:
Great work!
I also think we need more instances, and we need to re-verify the tests for those.
…e code between platforms
…tances to the test
@EnricoDeg @ApoorvaKalyani Thank you for the reviews. I processed the comments, added an example, and added a couple of instances for both the v1 and v3 pipelines. Let me know if there's still something you'd like to see changed.
LGTM
…ptors dependent on the transfer method
Great!
Proposed changes
This MR implements batched gemm bias permute for RDNA3/4. In practice, this is a multidimensional contraction operation. The MR contains the following:
- A new device operation, device_batched_contraction_multiple_d_wmma_cshuffle_v3
- Changes to GridwiseGemmWmmaCShuffleV3 to allow passing in non-naive grid descriptors

Note that support for different dimensions and D tensor configurations is very limited at the moment. More scaffolding would be needed to add generic support for a variable number of dimensions, but with this limited implementation there is at least parity with the XDL versions.
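To make the "multidimensional contraction" framing concrete, the operation amounts to a per-batch GEMM, a bias add, and a permuted write of the output. The following is a minimal CPU reference sketch of that math only; the layouts, the example E[n][g][m] output permutation, and the function name are illustrative assumptions, not the actual CK API or kernel.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical reference for batched GEMM + bias + permute:
//   C[g][m][n] = sum_k A[g][m][k] * B[g][k][n]
//   E[n][g][m] = C[g][m][n] + bias[n]   // output stored in a permuted layout
// A is G x M x K, B is G x K x N, bias has length N; all row-major and flat.
std::vector<float> batched_gemm_bias_permute(const std::vector<float>& A,
                                             const std::vector<float>& B,
                                             const std::vector<float>& bias,
                                             std::size_t G, std::size_t M,
                                             std::size_t N, std::size_t K)
{
    std::vector<float> E(N * G * M, 0.f);
    for(std::size_t g = 0; g < G; ++g)
        for(std::size_t m = 0; m < M; ++m)
            for(std::size_t n = 0; n < N; ++n)
            {
                float acc = 0.f;
                for(std::size_t k = 0; k < K; ++k)
                    acc += A[(g * M + m) * K + k] * B[(g * K + k) * N + n];
                // Permuted store: the fastest-varying output index is m,
                // not n, which is what the non-naive grid descriptors enable.
                E[(n * G + g) * M + m] = acc + bias[n];
            }
    return E;
}
```

A GPU implementation fuses the bias add and the permuted store into the GEMM epilogue instead of materializing C; the sketch above only pins down the expected numerics.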
Checklist
Please put an x into the boxes that apply. You can also fill these out after creating the PR. If you're not sure, please don't hesitate to ask.
- clang-format on all changed files

Discussion
If this is a relatively large or complex change, feel free to start a discussion by explaining why you chose the solution you did and what alternatives you considered.