
Conversation

@ajfriend (Collaborator) commented Dec 28, 2025

Following #1103

This is the core algorithm for cellsToMultiPolygon, which:

  • fixes bugs mentioned in Add cellsToMultiPolygon function #1103
  • allows for cell sets corresponding to "global polygons", which can be (much) larger than a hemisphere, or cross the poles/antimeridian
  • is faster for large cell sets (i.e., many interior cell edges which are not part of the final boundary), but a bit slower for cell sets where most cell edges are part of the final boundary
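
For context, a minimal usage sketch. The exact prototype isn't shown in this PR, so the signature, the GeoMultiPolygon fields, and the cleanup helper below are assumptions modeled on the existing cellsToLinkedMultiPolygon:

// Usage sketch only: the real cellsToMultiPolygon() prototype, the
// GeoMultiPolygon fields, and destroyGeoMultiPolygon() are assumptions,
// modeled on the existing cellsToLinkedMultiPolygon().
#include <stdio.h>
#include <stdlib.h>
#include "h3api.h"

int exampleCellsToMultiPolygon(void) {
    // Build a small contiguous cell set: the res 10 children of one res 9 cell.
    LatLng ll = {0.6518, -1.8312};  // radians
    H3Index parent;
    int64_t n;
    if (latLngToCell(&ll, 9, &parent) || cellToChildrenSize(parent, 10, &n))
        return 1;
    H3Index *cells = malloc(n * sizeof(H3Index));
    if (!cells) return 1;
    H3Error err = cellToChildren(parent, 10, cells);
    if (!err) {
        GeoMultiPolygon mp;                        // type introduced in this PR
        err = cellsToMultiPolygon(cells, n, &mp);  // assumed signature
        if (!err) {
            printf("polygons: %d\n", mp.numPolygons);  // assumed field name
            destroyGeoMultiPolygon(&mp);               // assumed cleanup helper
        }
    }
    free(cells);
    return err;
}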

Example "global polygon":

[image: example global polygon]

Benchmarks

./build/bin/benchmarkCellsToPolyAlgos
	-- linked_disk2: 24.076400 microseconds per iteration (10000 iterations)
	-- direct_disk2: 15.497900 microseconds per iteration (10000 iterations)
	-- linked_donut: 7.258700 microseconds per iteration (10000 iterations)
	-- direct_donut: 8.847100 microseconds per iteration (10000 iterations)
	-- linked_nestedDonuts: 28.251800 microseconds per iteration (10000 iterations)
	-- direct_nestedDonuts: 34.971400 microseconds per iteration (10000 iterations)
	-- linked_manyChildren: 13878.200000 microseconds per iteration (10 iterations)
	-- direct_manyChildren: 3757.900000 microseconds per iteration (10 iterations)
	-- linked_colorado: 7660.580000 microseconds per iteration (100 iterations)
	-- direct_colorado: 1856.580000 microseconds per iteration (100 iterations)

In the last two benchmarks, cellsToMultiPolygon is about 4x faster than cellsToLinkedMultiPolygon.

Additional speed improvements

After profiling on my Mac with Instruments (see the justfile recipe), these are my thoughts on what we can do to speed this up even more:

  1. I expect the biggest impact to come from extending this algorithm to natively handle compacted sets of cells, since we can start with the boundary of a parent cell's ancestors, meaning we can skip the processing of all the internal edges. I expect this will provide a speedup even if the user provides a flat (uncompacted) set and we do the compaction first internally.
  2. My hash table implementation is fairly naive and uses 10x the memory of the number of edges, meaning about 60x as much memory as the input cells. I can reduce this multiplier at some speed cost. A smarter hash implementation might give us a Pareto improvement. Note, however, that the number of edges we need to hash will go down significantly for compactable sets when we add the childEdgeIterator optimization.
  3. Even though reverseDirectedEdge is faster than directedEdgeToBoundary, it is still a bottleneck, so any improvements would help significantly.
  4. Currently, we get all n geometric vertices from directedEdgeToBoundary, but we actually only need the first n-1 vertices (since the last vertex of one edge is the same as the first of the following edge). I'm guessing we can get a savings from only computing the vertices we need (see the sketch after this list).
  5. We might consider adding a flag that lets the algorithm skip input validation (e.g., checking that all cells are valid and that there are no duplicates).
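
To illustrate item 4, a sketch of the vertex sharing it describes (hypothetical helper, not code from this PR):

// Sketch of the redundancy in item 4 (hypothetical helper, not this PR's code):
// each directed edge's last boundary vertex equals the next edge's first, so
// only the first numVerts - 1 vertices of each edge need to be kept when
// assembling a loop. outVerts is assumed to be sized by the caller.
static H3Error appendLoopVertices(const H3Index *loopEdges, int numEdges,
                                  LatLng *outVerts, int *outNumVerts) {
    int count = 0;
    for (int i = 0; i < numEdges; i++) {
        CellBoundary cb;
        H3Error err = directedEdgeToBoundary(loopEdges[i], &cb);
        if (err) return err;
        // Drop the final vertex; the next edge's boundary re-supplies it.
        for (int v = 0; v < cb.numVerts - 1; v++) outVerts[count++] = cb.verts[v];
    }
    *outNumVerts = count;
    return E_SUCCESS;
}

The savings item 4 anticipates would come from never computing that final vertex in the first place, rather than discarding it as above.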

Next steps

  • Functions to translate between GeoMultiPolygon and LinkedMultiPolygon
  • Make cellsToLinkedMultiPolygon a light wrapper around cellsToMultiPolygon

@coveralls commented Dec 28, 2025

Coverage Status

coverage: 99.012% (+0.1%) from 98.905% when pulling 2a7a9ff on ajfriend:aj/cells_to_poly into 66f30ba on uber:master.

@ajfriend changed the title from cellsToMultiPolygon to cellsToMultiPolygon core algorithm on Dec 28, 2025
Comment on lines 113 to 114
H3Index *cellsCopy = H3_MEMORY(malloc)(numCells * sizeof(H3Index));
memcpy(cellsCopy, cells, numCells * sizeof(H3Index));
Collaborator:

Check for malloc failing?
It may be helpful to add a comment that numCells * sizeof(H3Index) cannot overflow because cells is already of that size.
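
A minimal sketch of the suggested pattern, assuming the usual H3 error codes (not the exact code in the PR):

// Sketch of the suggested check (not the exact code in the PR). The comment
// records why the multiplication cannot overflow: `cells` already occupies
// numCells * sizeof(H3Index) bytes.
H3Index *cellsCopy = H3_MEMORY(malloc)(numCells * sizeof(H3Index));
if (!cellsCopy) return E_MEMORY_ALLOC;
memcpy(cellsCopy, cells, numCells * sizeof(H3Index));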

Collaborator Author:

I've added a check with checkCellsToMultiPolyOverflow(). That said, I don't think this will happen with any normal inputs: based on the current size of the Arc struct and HASH_TABLE_MULTIPLIER, overflow doesn't occur until numCells is 33x the number of res 15 cells.
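
A sketch of what such a guard can look like (the real checkCellsToMultiPolyOverflow() may differ; Arc and the hash-table sizing are internal to this PR):

#include <stdint.h>  // SIZE_MAX

// Sketch only; the PR's checkCellsToMultiPolyOverflow() may be implemented
// differently. The idea: reject inputs whose derived hash-table allocation
// (some multiple of the input size, in Arc entries) would overflow size_t
// before calling malloc.
static H3Error checkTableAllocOverflow(uint64_t numArcEntries) {
    if (numArcEntries > SIZE_MAX / sizeof(Arc)) return E_MEMORY_BOUNDS;
    return E_SUCCESS;
}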

static const uint8_t idxp[5] = {0, 1, 3, 2, 4};
const uint8_t *idx;

H3_EXPORT(originToDirectedEdges)(h, _edges);
Collaborator:

This could return an error?

Collaborator Author:

So that should never return an error because we validate all the input cells up front with validateCellSet.

I tried using the "never" pattern with

H3Error err = H3_EXPORT(originToDirectedEdges)(h, _edges);
NEVER(err);

but I get an error like

/Users/runner/work/h3/h3/src/h3lib/lib/cellsToMultiPoly.c:185:11: error: expression result unused [-Werror,-Wunused-value]
  185 |     NEVER(err);
      |           ^~~
/Users/runner/work/h3/h3/src/h3lib/include/h3Assert.h:122:19: note: expanded from macro 'NEVER'
  122 | #define NEVER(X) (X)
      |                   ^
1 error generated.

Any ideas on how better to handle that?

Collaborator Author:

Seems like the "right" way to do this is to have these helper functions return H3Error and propagate those up. I'll make those changes.
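
A sketch of that pattern, with a made-up helper name:

// Illustration of the pattern described above; the helper name and arguments
// are made up, not the PR's actual internals.
static H3Error buildCellEdges(H3Index h, H3Index *edgesOut) {
    H3Error err = H3_EXPORT(originToDirectedEdges)(h, edgesOut);
    if (err) return err;  // propagate rather than asserting NEVER(err)
    return E_SUCCESS;
}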

Collaborator Author:

I had Claude Code take a pass. It checks all the memory allocations and does a better job of cleaning up memory in case of an error. The resulting code is more verbose, and it seems much more thorough than what we typically do in the library.

WDYT @isaacbrodsky @nrabinowitz @dfellis?

Also, coverage drops with all this defensive code, but I think that can be fixed with a few NEVERs. I wanted to sanity-check with everyone on whether this is the right direction before continuing.

If we like this level of rigor, we could probably do the same elsewhere in the library. This is an eval from Claude:

Based on my analysis of the codebase, cellsToMultiPoly.c is now MORE thorough than typical H3 library code.

  Comparison with Other Files:

  Functions with SIMILAR thoroughness:

  - polygonToCells() in algos.c - checks most allocations, has cleanup on error, but still has some gaps

  Functions with LESS thoroughness:

  - vertexGraph.c - uses assert(ptr != NULL) instead of returning errors (disappears in production!)
  - linkedGeo.c - mix of assert() and proper checks
  - compactCells() in h3Index.c - checks allocations but cleanup paths are less systematic

Collaborator Author:

Note: One option to make the code less verbose would be to use goto: https://www.kernel.org/doc/html/latest/process/coding-style.html#centralized-exiting-of-functions

Since we don't do that anywhere else in the library, I'd definitely want to sanity check with folks before going down that path :)
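
For reference, a sketch of that centralized-exit shape with hypothetical allocations (not this PR's actual code):

// Sketch of the kernel-style centralized exit with hypothetical buffers (not
// this PR's actual allocations): every failure path funnels through one
// cleanup label, so nothing is freed twice or forgotten.
static H3Error exampleCentralizedExit(int64_t numCells) {
    H3Error err = E_SUCCESS;
    LatLng *verts = NULL;
    H3Index *copy = H3_MEMORY(malloc)(numCells * sizeof(H3Index));
    if (!copy) {
        err = E_MEMORY_ALLOC;
        goto cleanup;
    }
    verts = H3_MEMORY(malloc)(6 * numCells * sizeof(LatLng));  // arbitrary size
    if (!verts) {
        err = E_MEMORY_ALLOC;
        goto cleanup;
    }
    // ... main work would go here ...
cleanup:
    if (verts) H3_MEMORY(free)(verts);
    if (copy) H3_MEMORY(free)(copy);
    return err;
}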

Collaborator Author:

I've reworked it so that we always check for allocation errors and pass them up. We should also be handling memory cleanup in all error cases. And the changes are at 100% line and branch coverage.

Collaborator Author:

Moved to benchmarkCellsToPolyAlgos.c and added some new tests. Renamed because it compares both cellsToLinkedMultiPolygon and cellsToMultiPolygon.

Collaborator Author:

These tests are no longer needed. They were handmade polygons that are more easily reproduced by applying cellsToMultiPolygon to sets of cells.

*
* @return GeoMultiPolygon covering entire globe
*/
GeoMultiPolygon createGlobeMultiPolygon() {
Collaborator Author:

Move this to cellsToMultiPoly.c

@ajfriend (Collaborator Author) commented Jan 5, 2026

Here's an updated benchmark after the changes above, including the additional validation now done in cellsToMultiPoly():

./build/bin/benchmarkCellsToPolyAlgos
	-- linked_disk2: 20.551300 microseconds per iteration (10000 iterations)
	-- direct_disk2: 15.538100 microseconds per iteration (10000 iterations)
	-- linked_donut: 7.273900 microseconds per iteration (10000 iterations)
	-- direct_donut: 8.815600 microseconds per iteration (10000 iterations)
	-- linked_nestedDonuts: 28.221600 microseconds per iteration (10000 iterations)
	-- direct_nestedDonuts: 34.816200 microseconds per iteration (10000 iterations)
	-- linked_manyChildren: 13866.900000 microseconds per iteration (10 iterations)
	-- direct_manyChildren: 3822.200000 microseconds per iteration (10 iterations)
	-- linked_colorado: 7663.750000 microseconds per iteration (100 iterations)
	-- direct_colorado: 1960.650000 microseconds per iteration (100 iterations)

@ajfriend (Collaborator Author) commented Jan 5, 2026

This change is now ready for review. I've added cellsToMultiPolygon() to h3api.h.in to make testing and benchmarking a bit easier, but I'm considering that tentative, because I think there are still some API questions to discuss. To focus this PR on the algorithm, I'll defer the docs, fuzzing, CLI, etc. to a future PR that we would need before making a release.

I see two obvious next steps, and wanted to get feedback on ordering:

  1. Replace the implementation of cellsToLinkedMultiPolygon with cellsToMultiPolygon
  2. Implement a more general uncompactCellsToMultiPolygon(), which could take in compacted sets and a target resolution. This algorithm would be faster than uncompacting the cell set because there is a way to iterate just the boundary edges of the outline of child cells. This would plug in pretty trivially with the current algorithm. This is where I think we could discuss API options (a hypothetical signature is sketched after this list): Do we want a separate uncompactCellsToMultiPolygon()? Should cellsToMultiPolygon always run a compaction first, before calling uncompactCellsToMultiPolygon()? Do these functions always run validation code on the input cell sets, or are we expecting the user to do that? How do we handle errors if the user passes an unvalidated set?
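
To make the API question concrete, one purely hypothetical shape (nothing here is settled or part of this PR):

// Purely hypothetical signature for the discussion above; nothing here is
// settled or part of this PR.
H3Error uncompactCellsToMultiPolygon(const H3Index *compactedSet,
                                     int64_t numCells, int res,
                                     GeoMultiPolygon *out);

// cellsToMultiPolygon() could then be a thin wrapper: compact the input with
// compactCells() and call uncompactCellsToMultiPolygon() at the input's res.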

@ajfriend mentioned this pull request Jan 8, 2026