H3 + Admin Boundaries: Fast Province/Country Summaries with Precomputed Lookup Tables
H3 + Admin Boundaries: Fast “province/country” Summaries Using Precomputed Lookup Tables
If your data is stored on H3 hexagons and users ask questions like:
- “Average SPI for Turkey this month?”
- “Which provinces are under severe drought?”
- “Show country ranking by drought exposure”
…then you’re doing summaries by administrative boundaries.
The naive approach is to run a spatial join every time:
- load H3 cell geometries
- load admin boundary polygons
ST_Intersects/ST_Intersection- area-weight, group, aggregate
- repeat for every request, every time window
That works for a prototype, but it collapses at scale.
The fix is simple and powerful:
Build a precomputed lookup table that maps:
- H3 cell → admin region (province/country)
- optionally with weights for accurate boundary-crossing cells
Then every summary query becomes:
- a fast join on IDs + a group by
This post shows the patterns.
All code blocks are indented to avoid Markdown-breaking nested fences.
Why spatial joins are the wrong place to spend your time
Spatial joins are expensive because:
- geometry operations are heavy (especially polygons)
- boundaries don’t change often (so recomputing is wasteful)
- dashboard workloads repeat the same join patterns constantly
If you do the join once during a build step, the “runtime” system becomes:
- numeric joins
- aggregations
- partition pruning by time
That is exactly what DuckDB is great at.
The model: separate three datasets
A scalable “H3 + admin summaries” system typically has:
1) H3 metrics (no geometry)
This is your time-series data product.
Columns:
h3_index(at a chosen resolution)date(ormonth)- indicators like
spi_3,spei_3,ndvi, etc.
Stored as Parquet, partitioned by time (and optionally resolution).
2) Admin boundaries (geometry)
A boundary dataset (country/province polygons).
Columns:
admin_id,name,level(country/province), etc.geompolygon geometry (usually EPSG:4326)
3) Lookup table (the key)
Maps H3 cells to admin units.
Two variants:
- membership lookup (fast, simpler)
- weighted lookup (accurate at borders)
Lookup option A: membership mapping (fast and usually good enough)
This assigns each H3 cell to a single admin unit, often using the cell centroid.
This works well when:
- your H3 resolution is not extremely fine
- border precision is not critical
- you want maximum speed
Build the lookup table (centroid-based)
Assume you have a table h3_cells with:
h3_indexgeom(polygon boundary for the cell)
And admin with:
admin_idnamegeom
Build:
INSTALL spatial;
LOAD spatial;
CREATE OR REPLACE TABLE h3_admin_lookup AS
SELECT
h.h3_index,
a.admin_id,
a.name
FROM h3_cells h
JOIN admin a
ON ST_Contains(a.geom, ST_Centroid(h.geom));
If you have multiple admin levels, add a filter:
CREATE OR REPLACE TABLE h3_admin_lookup AS
SELECT
h.h3_index,
a.admin_id,
a.name,
a.level
FROM h3_cells h
JOIN admin a
ON a.level = 'province'
AND ST_Contains(a.geom, ST_Centroid(h.geom));
Runtime query becomes trivial
“Average SPI_3 per province for a month”:
SELECT
l.admin_id,
l.name,
AVG(m.spi_3) AS spi3_avg
FROM h3_metrics m
JOIN h3_admin_lookup l
ON m.h3_index = l.h3_index
WHERE m.date = DATE '2025-07-01'
GROUP BY l.admin_id, l.name
ORDER BY spi3_avg ASC;
No geometry involved. Just joins and aggregates.
Lookup option B: area-weighted mapping (accurate along borders)
Centroid assignment can misclassify border cells, especially at coarser resolutions or for thin regions/coastal boundaries.
For accuracy, compute intersection area fractions once:
w_cell_fraction= intersection_area / cell_area
Then use weights in aggregation.
Build the weighted lookup table
INSTALL spatial;
LOAD spatial;
CREATE OR REPLACE TABLE h3_admin_weights AS
SELECT
h.h3_index,
a.admin_id,
a.name,
ST_Area(ST_Intersection(h.geom, a.geom)) AS inter_area,
ST_Area(h.geom) AS cell_area,
ST_Area(ST_Intersection(h.geom, a.geom)) / NULLIF(ST_Area(h.geom), 0) AS w_cell_fraction
FROM h3_cells h
JOIN admin a
ON ST_Intersects(h.geom, a.geom)
WHERE NOT ST_IsEmpty(ST_Intersection(h.geom, a.geom));
Notes:
- This is heavier to build (one-time cost).
- Result size is larger (a border cell can map to multiple admin units).
- Runtime summaries become accurate without geometry.
Runtime query: area-weighted mean
SELECT
w.admin_id,
w.name,
SUM(m.spi_3 * w.w_cell_fraction) / NULLIF(SUM(w.w_cell_fraction), 0) AS spi3_weighted
FROM h3_metrics m
JOIN h3_admin_weights w
ON m.h3_index = w.h3_index
WHERE m.date = DATE '2025-07-01'
GROUP BY w.admin_id, w.name
ORDER BY spi3_weighted ASC;
This gives border-aware, area-weighted province metrics.
“Percent of region under drought” (classic dashboard metric)
Let’s define:
- severe drought =
spi_3 <= -1.5
With a weighted lookup, you can compute percent area under threshold:
WITH agg AS (
SELECT
w.admin_id,
w.name,
SUM(w.w_cell_fraction) AS total_w,
SUM(CASE WHEN m.spi_3 <= -1.5 THEN w.w_cell_fraction ELSE 0 END) AS severe_w
FROM h3_metrics m
JOIN h3_admin_weights w
ON m.h3_index = w.h3_index
WHERE m.date = DATE '2025-07-01'
GROUP BY w.admin_id, w.name
)
SELECT
admin_id,
name,
(severe_w / NULLIF(total_w, 0)) * 100.0 AS severe_pct
FROM agg
ORDER BY severe_pct DESC;
This is fast and scales well because it avoids geometry entirely at runtime.
How to make it fast in practice (build-time and runtime)
Build-time tips
- Ensure admin boundaries are in a suitable CRS for area calculations (meters-based CRS is better for accurate areas).
- Simplify admin boundaries if they’re extremely detailed (but keep a high-fidelity source).
- Consider limiting to relevant admin levels (country/province) to reduce join complexity.
Runtime tips
- Keep H3 metrics partitioned by time:
WHERE date = ...should prune to a small set of files
- Keep lookup tables small and cached:
- store
h3_admin_lookupandh3_admin_weightsas Parquet
- store
- Avoid geometry queries in the hot path:
- no
ST_Intersectsin API endpoints if you can avoid it
- no
Where H3 resolution matters
Border complexity grows at coarse resolutions:
- fewer cells → each cell covers more area → more border straddling
- weighted lookup becomes more important
At very fine resolutions:
- centroid mapping is often sufficient (cells are small)
- lookup tables get larger, but metrics become more precise
The best approach is to choose a resolution that matches your product:
- global patterns: coarser resolution + weighted lookups
- local drilldown: finer resolution + centroid mapping often acceptable
Publishing the lookup tables as a data product
Store your lookup tables as Parquet so they can be reused everywhere:
- DuckDB local
- server-side DuckDB
- DuckDB WASM in-browser
- Python/Polars/Arrow engines
Example exports:
COPY h3_admin_lookup
TO 'output/lookups/h3_admin_lookup.parquet'
(FORMAT PARQUET);
COPY h3_admin_weights
TO 'output/lookups/h3_admin_weights.parquet'
(FORMAT PARQUET);
Version them alongside your metrics:
output/
v2026-01-27/
metrics/
res=6/year=2025/month=07/part-0000.parquet
lookups/
h3_admin_lookup.parquet
h3_admin_weights.parquet
admin/
provinces.parquet
This makes your system reproducible and cache-friendly.
Common pitfalls
Pitfall 1: CRS confusion during area weighting
Area depends on CRS. Doing ST_Area in EPSG:4326 can be misleading.
If accuracy matters, project to a meter-based CRS during the build step.
Pitfall 2: Coastal and island boundaries
Some H3 cells may include ocean areas, which can affect weighting if your cell geometry covers sea. Decide whether you want land-only fractions (requires land mask) or accept the simplification.
Pitfall 3: Admin boundary changes
Admin boundaries can change over time. If your product spans decades, keep boundary versions or document the boundary snapshot date.
Pitfall 4: Multi-level summaries
If you want both country and province, build lookups per level:
h3_country_lookuph3_province_lookup- or a single table with a
levelcolumn
Summary
If you want fast province/country summaries over H3 data:
- don’t run spatial joins on every request
- build lookup tables once:
- centroid membership for speed
- intersection-area weights for accuracy
- keep metrics partitioned by time and resolution
- publish lookups as Parquet “data products” alongside metrics
This architecture turns expensive geometry work into a one-time build cost, and makes your dashboards feel instant.