
Conversation

@ajfriend (Collaborator) commented Dec 28, 2025

Following #1103

This is the core algorithm for cellsToMultiPolygon, which:

  • fixes bugs mentioned in Add cellsToMultiPolygon function #1103
  • allows for cell sets corresponding to "global polygons", which can be (much) larger than a hemisphere, or cross the poles/antimeridian
  • is faster for large cell sets (i.e., many interior cell edges which are not part of the final boundary), but a bit slower for cell sets where most cell edges are part of the final boundary
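
For context, a minimal usage sketch. The exact prototype isn't shown in this PR, so the signature, the GeoMultiPolygon fields, and the cleanup helper below are assumptions modeled on the existing cellsToLinkedMultiPolygon:

// Usage sketch only: the real cellsToMultiPolygon() prototype, the
// GeoMultiPolygon fields, and destroyGeoMultiPolygon() are assumptions,
// modeled on the existing cellsToLinkedMultiPolygon().
#include <stdio.h>
#include <stdlib.h>
#include "h3api.h"

int exampleCellsToMultiPolygon(void) {
    // Build a small contiguous cell set: the res 10 children of one res 9 cell.
    LatLng ll = {0.6518, -1.8312};  // radians
    H3Index parent;
    int64_t n;
    if (latLngToCell(&ll, 9, &parent) || cellToChildrenSize(parent, 10, &n))
        return 1;
    H3Index *cells = malloc(n * sizeof(H3Index));
    if (!cells) return 1;
    H3Error err = cellToChildren(parent, 10, cells);
    if (!err) {
        GeoMultiPolygon mp;                        // type introduced in this PR
        err = cellsToMultiPolygon(cells, n, &mp);  // assumed signature
        if (!err) {
            printf("polygons: %d\n", mp.numPolygons);  // assumed field name
            destroyGeoMultiPolygon(&mp);               // assumed cleanup helper
        }
    }
    free(cells);
    return err;
}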

Example "global polygon":

[image: example global polygon]

Benchmarks

./build/bin/benchmarkCellsToPolyAlgos
	-- linked_disk2: 24.076400 microseconds per iteration (10000 iterations)
	-- direct_disk2: 15.497900 microseconds per iteration (10000 iterations)
	-- linked_donut: 7.258700 microseconds per iteration (10000 iterations)
	-- direct_donut: 8.847100 microseconds per iteration (10000 iterations)
	-- linked_nestedDonuts: 28.251800 microseconds per iteration (10000 iterations)
	-- direct_nestedDonuts: 34.971400 microseconds per iteration (10000 iterations)
	-- linked_manyChildren: 13878.200000 microseconds per iteration (10 iterations)
	-- direct_manyChildren: 3757.900000 microseconds per iteration (10 iterations)
	-- linked_colorado: 7660.580000 microseconds per iteration (100 iterations)
	-- direct_colorado: 1856.580000 microseconds per iteration (100 iterations)

In the last two benchmarks, cellsToMultiPolygon is about 4x faster than cellsToLinkedMultiPolygon.

Additional speed improvements

After profiling on my Mac with Instruments (see the justfile recipe), these are my thoughts on what we can do to speed this up even more:

  1. I expect the biggest impact to come from extending this algorithm to natively handle compacted sets of cells, since we can start with the boundary of a parent cell's ancestors, meaning we can skip the processing of all the internal edges. I expect this will provide a speedup even if the user provides a flat (uncompacted) set and we do the compaction first internally.
  2. My hash table implementation is fairly naive and uses 10x the memory of the number of edges, meaning about 60x as much memory as the input cells. I can reduce this multiplier at some speed cost. A smarter hash implementation might give us a Pareto improvement. Note, however, that the number of edges we need to hash will go down significantly for compactable sets when we add the childEdgeIterator optimization.
  3. Even though reverseDirectedEdge is faster than directedEdgeToBoundary, it is still a bottleneck, so any improvements would help significantly.
  4. Currently, we get all n geometric vertices from directedEdgeToBoundary, but we actually only need the first n-1 vertices (since the last vertex of one edge is the same as the first of the following edge). I'm guessing we can get a savings from only computing the vertices we need (see the sketch after this list).
  5. We might consider adding a flag that lets the algorithm skip input validation (e.g., checking that all cells are valid and that there are no duplicates).
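
To illustrate item 4, a sketch of the vertex sharing it describes (hypothetical helper, not code from this PR):

// Sketch of the redundancy in item 4 (hypothetical helper, not this PR's code):
// each directed edge's last boundary vertex equals the next edge's first, so
// only the first numVerts - 1 vertices of each edge need to be kept when
// assembling a loop. outVerts is assumed to be sized by the caller.
static H3Error appendLoopVertices(const H3Index *loopEdges, int numEdges,
                                  LatLng *outVerts, int *outNumVerts) {
    int count = 0;
    for (int i = 0; i < numEdges; i++) {
        CellBoundary cb;
        H3Error err = directedEdgeToBoundary(loopEdges[i], &cb);
        if (err) return err;
        // Drop the final vertex; the next edge's boundary re-supplies it.
        for (int v = 0; v < cb.numVerts - 1; v++) outVerts[count++] = cb.verts[v];
    }
    *outNumVerts = count;
    return E_SUCCESS;
}

The savings item 4 anticipates would come from never computing that final vertex in the first place, rather than discarding it as above.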

Next steps

  • Functions to translate between GeoMultiPolygon and LinkedMultiPolygon
  • Make cellsToLinkedMultiPolygon a light wrapper around cellsToMultiPolygon

@coveralls commented Dec 28, 2025

Coverage Status

coverage: 99.012% (+0.1%) from 98.905% when pulling 2a7a9ff on ajfriend:aj/cells_to_poly into 66f30ba on uber:master.

@ajfriend changed the title from cellsToMultiPolygon to cellsToMultiPolygon core algorithm on Dec 28, 2025
Comment on lines 113 to 114
H3Index *cellsCopy = H3_MEMORY(malloc)(numCells * sizeof(H3Index));
memcpy(cellsCopy, cells, numCells * sizeof(H3Index));
Collaborator:

Check for malloc failing?
It may be helpful to add a comment that numCells * sizeof(H3Index) cannot overflow because cells is already of that size.
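
A minimal sketch of the suggested pattern, assuming the usual H3 error codes (not the exact code in the PR):

// Sketch of the suggested check (not the exact code in the PR). The comment
// records why the multiplication cannot overflow: `cells` already occupies
// numCells * sizeof(H3Index) bytes.
H3Index *cellsCopy = H3_MEMORY(malloc)(numCells * sizeof(H3Index));
if (!cellsCopy) return E_MEMORY_ALLOC;
memcpy(cellsCopy, cells, numCells * sizeof(H3Index));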

Collaborator Author:

I've added a check with checkCellsToMultiPolyOverflow(). That said, I don't think this will happen with any normal inputs: based on the current size of the Arc struct and HASH_TABLE_MULTIPLIER, overflow doesn't occur until numCells is 33x the number of res 15 cells.
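
A sketch of what such a guard can look like (the real checkCellsToMultiPolyOverflow() may differ; Arc and the hash-table sizing are internal to this PR):

#include <stdint.h>  // SIZE_MAX

// Sketch only; the PR's checkCellsToMultiPolyOverflow() may be implemented
// differently. The idea: reject inputs whose derived hash-table allocation
// (some multiple of the input size, in Arc entries) would overflow size_t
// before calling malloc.
static H3Error checkTableAllocOverflow(uint64_t numArcEntries) {
    if (numArcEntries > SIZE_MAX / sizeof(Arc)) return E_MEMORY_BOUNDS;
    return E_SUCCESS;
}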

static const uint8_t idxp[5] = {0, 1, 3, 2, 4};
const uint8_t *idx;

H3_EXPORT(originToDirectedEdges)(h, _edges);
Collaborator:

This could return an error?

Collaborator Author:

So that should never return an error because we validate all the input cells up front with validateCellSet.

I tried using the "never" pattern with

H3Error err = H3_EXPORT(originToDirectedEdges)(h, _edges);
NEVER(err);

but I get an error like

/Users/runner/work/h3/h3/src/h3lib/lib/cellsToMultiPoly.c:185:11: error: expression result unused [-Werror,-Wunused-value]
  185 |     NEVER(err);
      |           ^~~
/Users/runner/work/h3/h3/src/h3lib/include/h3Assert.h:122:19: note: expanded from macro 'NEVER'
  122 | #define NEVER(X) (X)
      |                   ^
1 error generated.

Any ideas on how better to handle that?

Collaborator Author:

Seems like the "right" way to do this is to have these helper functions return H3Error and propagate those up. I'll make those changes.
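
A sketch of that pattern, with a made-up helper name:

// Illustration of the pattern described above; the helper name and arguments
// are made up, not the PR's actual internals.
static H3Error buildCellEdges(H3Index h, H3Index *edgesOut) {
    H3Error err = H3_EXPORT(originToDirectedEdges)(h, edgesOut);
    if (err) return err;  // propagate rather than asserting NEVER(err)
    return E_SUCCESS;
}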

Collaborator Author:

I had Claude Code take a pass. It checks all the memory allocations and does a better job of cleaning up memory in case of an error. The resulting code is more verbose, and it seems much more thorough than what we typically do in the library.

WDYT @isaacbrodsky @nrabinowitz @dfellis?

Also, coverage drops with all this defensive code, but I think that can be fixed with a few NEVERs. I wanted to sanity-check with everyone on whether this is the right direction before continuing.

If we like this level of rigor, we could probably do the same elsewhere in the library. This is an eval from Claude:

Based on my analysis of the codebase, cellsToMultiPoly.c is now MORE thorough than typical H3 library code.

  Comparison with Other Files:

  Functions with SIMILAR thoroughness:

  - polygonToCells() in algos.c - checks most allocations, has cleanup on error, but still has some gaps

  Functions with LESS thoroughness:

  - vertexGraph.c - uses assert(ptr != NULL) instead of returning errors (disappears in production!)
  - linkedGeo.c - mix of assert() and proper checks
  - compactCells() in h3Index.c - checks allocations but cleanup paths are less systematic

Collaborator Author:

Note: One option to make the code less verbose would be to use goto: https://www.kernel.org/doc/html/latest/process/coding-style.html#centralized-exiting-of-functions

Since we don't do that anywhere else in the library, I'd definitely want to sanity check with folks before going down that path :)
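
For reference, a sketch of that centralized-exit shape with hypothetical allocations (not this PR's actual code):

// Sketch of the kernel-style centralized exit with hypothetical buffers (not
// this PR's actual allocations): every failure path funnels through one
// cleanup label, so nothing is freed twice or forgotten.
static H3Error exampleCentralizedExit(int64_t numCells) {
    H3Error err = E_SUCCESS;
    LatLng *verts = NULL;
    H3Index *copy = H3_MEMORY(malloc)(numCells * sizeof(H3Index));
    if (!copy) {
        err = E_MEMORY_ALLOC;
        goto cleanup;
    }
    verts = H3_MEMORY(malloc)(6 * numCells * sizeof(LatLng));  // arbitrary size
    if (!verts) {
        err = E_MEMORY_ALLOC;
        goto cleanup;
    }
    // ... main work would go here ...
cleanup:
    if (verts) H3_MEMORY(free)(verts);
    if (copy) H3_MEMORY(free)(copy);
    return err;
}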

Collaborator Author:

I've reworked it so that we always check for allocation errors and pass them up. We should also be handling memory cleanup in all error cases. And the changes are at 100% line and branch coverage.

Collaborator Author:

Moved to benchmarkCellsToPolyAlgos.c and added some new tests. Renamed because it compares both cellsToLinkedMultiPolygon and cellsToMultiPolygon.

Collaborator Author:

These tests are no longer needed. They were handmade polygons that are more easily reproduced by applying cellsToMultiPolygon to sets of cells.

*
* @return GeoMultiPolygon covering entire globe
*/
GeoMultiPolygon createGlobeMultiPolygon() {
Collaborator Author:

Move this to cellsToMultiPoly.c

@ajfriend (Collaborator Author) commented Jan 5, 2026

Here's an updated benchmark after the changes above, including the additional validation now done in cellsToMultiPoly():

./build/bin/benchmarkCellsToPolyAlgos
	-- linked_disk2: 20.551300 microseconds per iteration (10000 iterations)
	-- direct_disk2: 15.538100 microseconds per iteration (10000 iterations)
	-- linked_donut: 7.273900 microseconds per iteration (10000 iterations)
	-- direct_donut: 8.815600 microseconds per iteration (10000 iterations)
	-- linked_nestedDonuts: 28.221600 microseconds per iteration (10000 iterations)
	-- direct_nestedDonuts: 34.816200 microseconds per iteration (10000 iterations)
	-- linked_manyChildren: 13866.900000 microseconds per iteration (10 iterations)
	-- direct_manyChildren: 3822.200000 microseconds per iteration (10 iterations)
	-- linked_colorado: 7663.750000 microseconds per iteration (100 iterations)
	-- direct_colorado: 1960.650000 microseconds per iteration (100 iterations)

@ajfriend (Collaborator Author) commented Jan 5, 2026

This change is now ready for review. I've added cellsToMultiPolygon() to h3api.h.in to make testing and benchmarking a bit easier, but I'm considering that tentative, because I think there are still some API questions to discuss. To focus this PR on the algorithm, I'll defer the docs, fuzzing, CLI, etc. to a future PR that we would need before making a release.

I see two obvious next steps, and wanted to get feedback on ordering:

  1. Replace the implementation of cellsToLinkedMultiPolygon with cellsToMultiPolygon
  2. Implement a more general uncompactCellsToMultiPolygon(), which could take in compacted sets and a target resolution. This algorithm would be faster than uncompacting the cell set because there is a way to iterate just the boundary edges of the outline of child cells. This would plug in pretty trivially with the current algorithm. This is where I think we could discuss API options (a hypothetical signature is sketched after this list): Do we want a separate uncompactCellsToMultiPolygon()? Should cellsToMultiPolygon always run a compaction first, before calling uncompactCellsToMultiPolygon()? Do these functions always run validation code on the input cell sets, or are we expecting the user to do that? How do we handle errors if the user passes an unvalidated set?
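
To make the API question concrete, one purely hypothetical shape (nothing here is settled or part of this PR):

// Purely hypothetical signature for the discussion above; nothing here is
// settled or part of this PR.
H3Error uncompactCellsToMultiPolygon(const H3Index *compactedSet,
                                     int64_t numCells, int res,
                                     GeoMultiPolygon *out);

// cellsToMultiPolygon() could then be a thin wrapper: compact the input with
// compactCells() and call uncompactCellsToMultiPolygon() at the input's res.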

@ajfriend mentioned this pull request Jan 8, 2026