@HannanNaeem
Contributor

Our current implementation does not support float32 scalars for add, multiply, and divide. This PR introduces a simple fix in the logic to allow computation with float32 scalars.


Currently, PyKokkos always casts scalars to pk.double, which is equivalent to float64 in NumPy terms. For the ufuncs above, we then enforce that both operands have the same type, e.g., both float32 or both float64. This breaks when a float32 scalar is passed with a float32 view: the scalar is cast to float64, and our own type assertion fails.

To fix this:

  • The scalar takes the type (float32 or float64) of the view it is passed with, so both operands match
  • The float impls use modulus indexing on the second operand, so a scalar held in a single-value view is applied to every element

Quality tweak:

  • Updated error messages to state which types mismatched
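A minimal sketch of the changes described above, using Python's standard array module as a stand-in for PyKokkos views: typecode 'f' holds C floats (float32) and 'd' holds C doubles (float64). The helpers wrap_scalar, check_same_dtype, and add are hypothetical names for illustration, not PyKokkos APIs. Wrapping the scalar with the view's own typecode keeps the same-type check passing, the modulus index reuses the single wrapped value for every element, and a mismatch names both types in the error.

```python
from array import array

# Hypothetical stand-ins: array('f') ~ a float32 view, array('d') ~ a float64 view.
DTYPE_NAMES = {"f": "float32", "d": "float64"}


def check_same_dtype(a: array, b: array, op: str) -> None:
    """Raise a TypeError that names both operand types if they differ."""
    if a.typecode != b.typecode:
        raise TypeError(
            f"{op}: operand types must match, got "
            f"{DTYPE_NAMES.get(a.typecode, a.typecode)} and "
            f"{DTYPE_NAMES.get(b.typecode, b.typecode)}"
        )


def wrap_scalar(value: float, view: array) -> array:
    """Wrap a scalar in a single-element array with the view's element type."""
    return array(view.typecode, [value])


def add(view: array, other) -> array:
    # Scalars adopt the view's type instead of always becoming float64.
    if isinstance(other, (int, float)):
        other = wrap_scalar(other, view)
    check_same_dtype(view, other, "add")
    out = array(view.typecode, [0.0] * len(view))
    n = len(other)
    for tid in range(len(view)):
        # Modulus indexing: a single-element operand is reused for every tid.
        out[tid] = view[tid] + other[tid % n]
    return out
```

With x = array('f', [1.0, 2.0, 3.0]), add(x, 0.5) returns a float32 array holding 1.5, 2.5, 3.5, while add(x, array('d', [0.5])), which mimics the old always-float64 scalar, raises "TypeError: add: operand types must match, got float32 and float64".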

@IvanGrigorik IvanGrigorik self-requested a review December 19, 2025 15:44
```diff
 @pk.workunit
 def add_impl_1d_float(tid: int, viewA: pk.View1D[pk.float], viewB: pk.View1D[pk.float], out: pk.View1D[pk.float]):
-    out[tid] = viewA[tid] + viewB[tid]
+    out[tid] = viewA[tid] + viewB[tid % viewB.extent(0)]
```
Collaborator

Integer modulus is quite costly performance-wise. Is there a reason it is necessary here?

Contributor Author

It's been a while, but I recall that this has to do with supporting operations with scalars. At the time, I think we used this pattern to support them by putting the scalar in a single-value view, so we didn't need a whole separate workunit for scalars.

There are obviously open questions about what happens when the sizes differ AND viewB is not a singleton... I assume we were OK with that behavior at the time.
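To make that open question concrete, here is a plain-Python sketch (lists standing in for views) of what the modulus-indexed workunit computes; add_mod_indexed and check_broadcastable are hypothetical names. With equal sizes it is an ordinary element-wise add, and with a single-element viewB it broadcasts the scalar, but with any other size viewB silently wraps around. The guard shows one possible way to reject that case up front; it is not part of this PR.

```python
def add_mod_indexed(a: list[float], b: list[float]) -> list[float]:
    """Mimic out[tid] = a[tid] + b[tid % len(b)] for every tid in a."""
    n = len(b)
    return [a[tid] + b[tid % n] for tid in range(len(a))]


def check_broadcastable(a: list[float], b: list[float]) -> None:
    """Hypothetical guard: allow only equal sizes or a single-element b."""
    if len(b) not in (1, len(a)):
        raise ValueError(
            f"size mismatch: len(a)={len(a)}, len(b)={len(b)}; "
            "b must match a in size or hold exactly one element"
        )
```

For example, add_mod_indexed([1.0, 2.0, 3.0, 4.0], [10.0, 20.0]) returns [11.0, 22.0, 13.0, 24.0] instead of reporting an error, which is exactly the case check_broadcastable would catch.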

Comment on lines +1071 to +1074
```python
if not isinstance(viewA, pk.ViewType) and viewA.dtype.__name__ not in [
    "float32",
    "float64",
]:
```
Collaborator

I hate that every time something involves the existing data types, we have to do it this way.
We should create a type mapping (from PyKokkos to Kokkos and vice versa) and use it whenever we deal with data types.
String comparison is not good at all, but for this PR it is alright.
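A minimal sketch of the kind of type mapping suggested above, kept in one table so that checks compare enum members instead of scattered strings. DType, from_numpy_name, and from_cpp_type are hypothetical names, not existing PyKokkos APIs, and the member list is illustrative; the C++ spellings are ordinary C++ scalar types.

```python
from enum import Enum


class DType(Enum):
    """Hypothetical single source of truth for supported element types."""

    # value = (NumPy-style name, C++ element type used on the Kokkos side)
    INT32 = ("int32", "int32_t")
    INT64 = ("int64", "int64_t")
    FLOAT32 = ("float32", "float")
    FLOAT64 = ("float64", "double")

    @property
    def numpy_name(self) -> str:
        return self.value[0]

    @property
    def cpp_type(self) -> str:
        return self.value[1]

    @property
    def is_floating(self) -> bool:
        return self in (DType.FLOAT32, DType.FLOAT64)


# Reverse lookups, built once from the enum (the "vice versa" direction).
_BY_NUMPY_NAME = {d.numpy_name: d for d in DType}
_BY_CPP_TYPE = {d.cpp_type: d for d in DType}


def from_numpy_name(name: str) -> DType:
    """Map a NumPy-style dtype name to a DType, or raise TypeError."""
    try:
        return _BY_NUMPY_NAME[name]
    except KeyError:
        raise TypeError(f"unsupported dtype {name!r}") from None


def from_cpp_type(cpp: str) -> DType:
    """Map a C++ element type name to a DType, or raise TypeError."""
    try:
        return _BY_CPP_TYPE[cpp]
    except KeyError:
        raise TypeError(f"unsupported C++ type {cpp!r}") from None
```

With such a table, the float check in the excerpt above could become from_numpy_name(viewA.dtype.__name__).is_floating, leaving a single string lookup at the boundary.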

@IvanGrigorik
Collaborator

Overall LG!

@IvanGrigorik
Collaborator

@kennykos are you ok with modulus, or do you want me to check out other options?

@gliga
Contributor

gliga commented Jan 25, 2026

We can always merge as is and then improve later if needed, but I will let @kennykos make a final call on his comment.

@kennykos
Collaborator

Yes, I am fine merging this now, but avoiding integer % and / on device is something we should keep in mind for the future.

@gliga
Contributor

gliga commented Jan 25, 2026

Did you want to see more use of shifts? Wouldn't we do that in the translation anyway, and not in the Python code?

@kennykos
Collaborator

I'm a little confused: are you suggesting that we check during translation whether a%b can be replaced with a-b, and if so, change the workunit code?

@gliga
Contributor

gliga commented Jan 25, 2026

Yes, I am suggesting that optimizations should be done in translation rather than relying on users to optimize for a specific platform.

@kennykos
Collaborator

Ah, that makes perfect sense, sounds good.
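As a sketch of the kind of rewrite a translator could apply instead of asking users to hand-optimize, the helper below picks a cheaper equivalent of a % b for non-negative a, based on facts known before the loop runs. choose_mod_rewrite is a hypothetical name, and these are generic strength-reduction identities (single-element divisor, power-of-two mask, at most one conditional subtraction), not an existing PyKokkos translation pass.

```python
from typing import Callable, Optional


def choose_mod_rewrite(b: int, a_max: Optional[int] = None) -> Callable[[int], int]:
    """Return a function equal to `a % b` for 0 <= a (and a <= a_max if given),
    avoiding the integer modulus when a cheaper form is provably equivalent."""
    if b <= 0:
        raise ValueError("divisor must be positive")
    if b == 1:
        # Single-element view: every index maps to 0.
        return lambda a: 0
    if (b & (b - 1)) == 0:
        # Power of two: keep only the low bits.
        mask = b - 1
        return lambda a: a & mask
    if a_max is not None and a_max < 2 * b:
        # Index is below 2*b, so at most one subtraction is needed.
        return lambda a: a - b if a >= b else a
    # No cheaper form known: fall back to the integer modulus.
    return lambda a: a % b
```

In generated code this choice would ideally be made once per launch, for example from viewB.extent(0), so the per-element cost becomes a mask, a compare, or nothing at all instead of an integer division.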

@IvanGrigorik IvanGrigorik merged commit b4ee95c into kokkos:main Jan 26, 2026
22 checks passed