-
Notifications
You must be signed in to change notification settings - Fork 550
cellsToMultiPolygon core algorithm #1113
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
| H3Index *cellsCopy = H3_MEMORY(malloc)(numCells * sizeof(H3Index)); | ||
| memcpy(cellsCopy, cells, numCells * sizeof(H3Index)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
check for malloc failing?
it may be helpful to add a comment that numCells * sizeof(H3Index) cannot overflow because cells is already of that size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a check with checkCellsToMultiPolyOverflow(). Although, I don't think this will happen with any normal inputs, since based on the current size of the Arc struct and HASH_TABLE_MULTIPLIER, this doesn't happen until numCells is 33x the number of res 15 cells.
src/h3lib/lib/cellsToMultiPoly.c
Outdated
| static const uint8_t idxp[5] = {0, 1, 3, 2, 4}; | ||
| const uint8_t *idx; | ||
|
|
||
| H3_EXPORT(originToDirectedEdges)(h, _edges); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this could return an error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So that should never return an error because we validate all the input cells up front with validateCellSet.
I tried using the "never" pattern with
H3Error err = H3_EXPORT(originToDirectedEdges)(h, _edges);
NEVER(err);but I get an error like
/Users/runner/work/h3/h3/src/h3lib/lib/cellsToMultiPoly.c:185:11: error: expression result unused [-Werror,-Wunused-value]
185 | NEVER(err);
| ^~~
/Users/runner/work/h3/h3/src/h3lib/include/h3Assert.h:122:19: note: expanded from macro 'NEVER'
122 | #define NEVER(X) (X)
| ^
1 error generated.
Any ideas on how better to handle that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems like the "right" way to do this is to have these helper functions return H3Error and propagate those up. I'll make those changes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had Claude Code take a pass. It checks all the memory allocations, and does a better job of cleaning up memory in case of an error. The code is more verbose, and this seems to be much more thorough than what we typically do in the library.
WDYT @isaacbrodsky @nrabinowitz @dfellis ?
Also, coverage drops with all this defensive code, but I think that can be fixed with a few NEVERs. But I wanted to sanity check with everyone on if this is the right direction to be going before continuing.
If we like this level of rigor, we could probably do the same elsewhere in the library. This is an eval from Claude:
Based on my analysis of the codebase, cellsToMultiPoly.c is now MORE thorough than typical H3 library code.
Comparison with Other Files:
Functions with SIMILAR thoroughness:
- polygonToCells() in algos.c - checks most allocations, has cleanup on error, but still has some gaps
Functions with LESS thoroughness:
- vertexGraph.c - uses assert(ptr != NULL) instead of returning errors (disappears in production!)
- linkedGeo.c - mix of assert() and proper checks
- compactCells() in h3Index.c - checks allocations but cleanup paths are less systematic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: One option to make the code less verbose would be to use goto: https://www.kernel.org/doc/html/latest/process/coding-style.html#centralized-exiting-of-functions
Since we don't do that anywhere else in the library, I'd definitely want to sanity check with folks before going down that path :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've reworked it so that we always check for allocation errors and pass them up. We should also be handling memory clean up in all error cases. And the changes are at 100% line and branch coverage.
0673158 to
897f5b4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved to benchmarkCellsToPolyAlgos.c and added some new tests. Renamed because it compares both cellsToLinkedMultiPolygon and cellsToMultiPolygon
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These tests are no longer needed. These were handmade polygons that are more easily reproduced with sets of cells and applying cellsToMultiPolygon.
| * | ||
| * @return GeoMultiPolygon covering entire globe | ||
| */ | ||
| GeoMultiPolygon createGlobeMultiPolygon() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move this to cellsToMultiPoly.c
|
Here's an updated benchmark after the changes above, including the additional validation now done in |
|
This change is now ready for review. I've added I see two obvious next steps, and wanted to get feedback on ordering:
|
Following #1103
This is the core algorithm for
cellsToMultiPolygon, which:cellsToMultiPolygonfunction #1103Example "global polygon":
Benchmarks
In the last two benchmarks
cellsToMultiPolygonis about 4x faster thancellsToLinkedMultiPolygon.Additional speed improvements
After profiling on my Mac with Instruments (see the
justfilerecipe), these are my thoughts on what we can do to speed this up even more:reverseDirectedEdgeis faster thandirectedEdgeToBoundary, it is still a bottleneck, so any improvements would help significantly.ngeometric vertices fromdirectedEdgeToBoundary, but we actually only need the firstn-1vertices (since the last vertex of one edge is the same as the first of the following edge). I'm guessing we can get a savings from only computing the vertices we need.Next steps
GeoMultiPolygonandLinkedMultiPolygoncellsToLinkedMultiPolygona light wrapper aroundcellsToMultiPolygon