Description
Is this a duplicate?
- I confirmed there appear to be no duplicate issues for this request and that I agree to the Code of Conduct
Area
CUB
Is your feature request related to a problem? Please describe.
In vectorized code paths we often declare a per-thread buffer such as InputT input[items_per_thread];
Because this is a plain C array, the compiler default-initializes its elements, which runs InputT's default constructor whenever that constructor is non-trivial. This is almost never what we want, because the array is usually filled right afterwards, typically through a vectorized load.
Describe the solution you'd like
We should provide a type that holds the array in a union, so that we can create it without spending time on initializing it.
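As a rough illustration of the idea, here is a minimal sketch of such a wrapper. The names uninitialized_array and Counted are hypothetical and do not exist in CUB; the `__host__ __device__` annotations are left out so that it compiles as plain C++. Counted is only there to make constructor calls observable.

```cpp
#include <cstddef>
#include <new>

// Hypothetical name; not an existing CUB or libcu++ type.
template <typename T, std::size_t N>
struct uninitialized_array
{
  // Members of an anonymous union are neither constructed nor destroyed
  // automatically, so creating the wrapper runs no T constructors.
  union
  {
    T items[N];
  };

  uninitialized_array() {}  // deliberately leaves items untouched
  ~uninitialized_array() {} // deliberately destroys nothing

  T& operator[](std::size_t i) { return items[i]; }
  const T& operator[](std::size_t i) const { return items[i]; }
};

// Hypothetical element type that counts how often it is constructed.
struct Counted
{
  int value;

  static int& constructions()
  {
    static int n = 0;
    return n;
  }

  Counted() : value(0) { ++constructions(); }
  explicit Counted(int v) : value(v) { ++constructions(); }
};
```

With this sketch, declaring `Counted plain[4];` bumps the counter four times, while declaring `uninitialized_array<Counted, 4> lazy;` leaves it unchanged. The elements then have to be created explicitly, for example with `::new (static_cast<void*>(&lazy[i])) Counted(x)`, and element types with non-trivial destructors would need to be destroyed by hand as well.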
Describe alternatives you've considered
Alternatively, we could use alignas(InputT) cuda::std::byte input[items_per_thread * sizeof(InputT)]; and then cast to InputT*, but I am not sure which is cleaner.
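For comparison, a sketch of the byte-buffer alternative. raw_storage and sum_from_raw_storage are hypothetical names, and unsigned char stands in for cuda::std::byte so that the snippet compiles as plain C++ without libcu++.

```cpp
#include <cstddef>
#include <new>

// Hypothetical helper: suitably aligned raw bytes with room for N objects of type T.
template <typename T, std::size_t N>
struct raw_storage
{
  alignas(T) unsigned char bytes[N * sizeof(T)]; // left uninitialized

  // The cast from the alternative described above.
  T* data() { return reinterpret_cast<T*>(bytes); }
};

// Copies src into raw storage element by element, then sums the copies.
template <std::size_t N>
int sum_from_raw_storage(const int (&src)[N])
{
  raw_storage<int, N> storage;
  for (std::size_t i = 0; i < N; ++i)
  {
    // Placement new creates each int in the buffer before it is read.
    ::new (static_cast<void*>(storage.bytes + i * sizeof(int))) int(src[i]);
  }

  int sum = 0;
  for (std::size_t i = 0; i < N; ++i)
  {
    sum += storage.data()[i];
  }
  return sum;
}
```

The buffer itself is simple, but every access goes through a reinterpret_cast, and strictly conforming code may also need std::launder on the resulting pointer. The union-based wrapper keeps the element type visible to callers.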
Additional context
No response