H3 + Admin Boundaries: Fast Province/Country Summaries with Precomputed Lookup Tables

Dec 10, 2025

H3 + Admin Boundaries: Fast Province/Country Summaries with Precomputed Lookup Tables

H3 + Admin Boundaries: Fast “province/country” Summaries Using Precomputed Lookup Tables

If your data is stored on H3 hexagons and users ask questions like:

“Average SPI for Turkey this month?”
“Which provinces are under severe drought?”
“Show country ranking by drought exposure”

…then you’re doing summaries by administrative boundaries.

The naive approach is to run a spatial join every time:

load H3 cell geometries
load admin boundary polygons
ST_Intersects / ST_Intersection
area-weight, group, aggregate
repeat for every request, every time window

That works for a prototype, but it collapses at scale.

The fix is simple and powerful:

Build a precomputed lookup table that maps:

H3 cell → admin region (province/country)
optionally with weights for accurate boundary-crossing cells

Then every summary query becomes:

a fast join on IDs + a group by

This post shows the patterns.

All code blocks are indented to avoid Markdown-breaking nested fences.

Why spatial joins are the wrong place to spend your time

Spatial joins are expensive because:

geometry operations are heavy (especially polygons)
boundaries don’t change often (so recomputing is wasteful)
dashboard workloads repeat the same join patterns constantly

If you do the join once during a build step, the “runtime” system becomes:

numeric joins
aggregations
partition pruning by time

That is exactly what DuckDB is great at.

The model: separate three datasets

A scalable “H3 + admin summaries” system typically has:

1) H3 metrics (no geometry)

This is your time-series data product.

Columns:

h3_index (at a chosen resolution)
date (or month)
indicators like spi_3, spei_3, ndvi, etc.

Stored as Parquet, partitioned by time (and optionally resolution).

2) Admin boundaries (geometry)

A boundary dataset (country/province polygons).

Columns:

admin_id, name, level (country/province), etc.
geom polygon geometry (usually EPSG:4326)

3) Lookup table (the key)

Maps H3 cells to admin units.

Two variants:

membership lookup (fast, simpler)
weighted lookup (accurate at borders)

Lookup option A: membership mapping (fast and usually good enough)

This assigns each H3 cell to a single admin unit, often using the cell centroid.

This works well when:

your H3 resolution is not extremely fine
border precision is not critical
you want maximum speed

Build the lookup table (centroid-based)

Assume you have a table h3_cells with:

h3_index
geom (polygon boundary for the cell)

And admin with:

admin_id
name
geom

Build:

INSTALL spatial;
LOAD spatial;

CREATE OR REPLACE TABLE h3_admin_lookup AS
SELECT
  h.h3_index,
  a.admin_id,
  a.name
FROM h3_cells h
JOIN admin a
  ON ST_Contains(a.geom, ST_Centroid(h.geom));

If you have multiple admin levels, add a filter:

CREATE OR REPLACE TABLE h3_admin_lookup AS
SELECT
  h.h3_index,
  a.admin_id,
  a.name,
  a.level
FROM h3_cells h
JOIN admin a
  ON a.level = 'province'
 AND ST_Contains(a.geom, ST_Centroid(h.geom));

Runtime query becomes trivial

“Average SPI_3 per province for a month”:

SELECT
  l.admin_id,
  l.name,
  AVG(m.spi_3) AS spi3_avg
FROM h3_metrics m
JOIN h3_admin_lookup l
  ON m.h3_index = l.h3_index
WHERE m.date = DATE '2025-07-01'
GROUP BY l.admin_id, l.name
ORDER BY spi3_avg ASC;

No geometry involved. Just joins and aggregates.

Lookup option B: area-weighted mapping (accurate along borders)

Centroid assignment can misclassify border cells, especially at coarser resolutions or for thin regions/coastal boundaries.

For accuracy, compute intersection area fractions once:

w_cell_fraction = intersection_area / cell_area

Then use weights in aggregation.

Build the weighted lookup table

INSTALL spatial;
LOAD spatial;

CREATE OR REPLACE TABLE h3_admin_weights AS
SELECT
  h.h3_index,
  a.admin_id,
  a.name,
  ST_Area(ST_Intersection(h.geom, a.geom)) AS inter_area,
  ST_Area(h.geom) AS cell_area,
  ST_Area(ST_Intersection(h.geom, a.geom)) / NULLIF(ST_Area(h.geom), 0) AS w_cell_fraction
FROM h3_cells h
JOIN admin a
  ON ST_Intersects(h.geom, a.geom)
WHERE NOT ST_IsEmpty(ST_Intersection(h.geom, a.geom));

Notes:

This is heavier to build (one-time cost).
Result size is larger (a border cell can map to multiple admin units).
Runtime summaries become accurate without geometry.

Runtime query: area-weighted mean

SELECT
  w.admin_id,
  w.name,
  SUM(m.spi_3 * w.w_cell_fraction) / NULLIF(SUM(w.w_cell_fraction), 0) AS spi3_weighted
FROM h3_metrics m
JOIN h3_admin_weights w
  ON m.h3_index = w.h3_index
WHERE m.date = DATE '2025-07-01'
GROUP BY w.admin_id, w.name
ORDER BY spi3_weighted ASC;

This gives border-aware, area-weighted province metrics.

“Percent of region under drought” (classic dashboard metric)

Let’s define:

severe drought = spi_3 <= -1.5

With a weighted lookup, you can compute percent area under threshold:

WITH agg AS (
  SELECT
    w.admin_id,
    w.name,
    SUM(w.w_cell_fraction) AS total_w,
    SUM(CASE WHEN m.spi_3 <= -1.5 THEN w.w_cell_fraction ELSE 0 END) AS severe_w
  FROM h3_metrics m
  JOIN h3_admin_weights w
    ON m.h3_index = w.h3_index
  WHERE m.date = DATE '2025-07-01'
  GROUP BY w.admin_id, w.name
)
SELECT
  admin_id,
  name,
  (severe_w / NULLIF(total_w, 0)) * 100.0 AS severe_pct
FROM agg
ORDER BY severe_pct DESC;

This is fast and scales well because it avoids geometry entirely at runtime.

How to make it fast in practice (build-time and runtime)

Build-time tips

Ensure admin boundaries are in a suitable CRS for area calculations (meters-based CRS is better for accurate areas).
Simplify admin boundaries if they’re extremely detailed (but keep a high-fidelity source).
Consider limiting to relevant admin levels (country/province) to reduce join complexity.

Runtime tips

Keep H3 metrics partitioned by time:
- WHERE date = ... should prune to a small set of files
Keep lookup tables small and cached:
- store h3_admin_lookup and h3_admin_weights as Parquet
Avoid geometry queries in the hot path:
- no ST_Intersects in API endpoints if you can avoid it

Where H3 resolution matters

Border complexity grows at coarse resolutions:

fewer cells → each cell covers more area → more border straddling
weighted lookup becomes more important

At very fine resolutions:

centroid mapping is often sufficient (cells are small)
lookup tables get larger, but metrics become more precise

The best approach is to choose a resolution that matches your product:

global patterns: coarser resolution + weighted lookups
local drilldown: finer resolution + centroid mapping often acceptable

Publishing the lookup tables as a data product

Store your lookup tables as Parquet so they can be reused everywhere:

DuckDB local
server-side DuckDB
DuckDB WASM in-browser
Python/Polars/Arrow engines

Example exports:

COPY h3_admin_lookup
TO 'output/lookups/h3_admin_lookup.parquet'
(FORMAT PARQUET);

COPY h3_admin_weights
TO 'output/lookups/h3_admin_weights.parquet'
(FORMAT PARQUET);

Version them alongside your metrics:

output/
  v2026-01-27/
    metrics/
      res=6/year=2025/month=07/part-0000.parquet
    lookups/
      h3_admin_lookup.parquet
      h3_admin_weights.parquet
    admin/
      provinces.parquet

This makes your system reproducible and cache-friendly.

Common pitfalls

Pitfall 1: CRS confusion during area weighting

Area depends on CRS. Doing ST_Area in EPSG:4326 can be misleading. If accuracy matters, project to a meter-based CRS during the build step.

Pitfall 2: Coastal and island boundaries

Some H3 cells may include ocean areas, which can affect weighting if your cell geometry covers sea. Decide whether you want land-only fractions (requires land mask) or accept the simplification.

Pitfall 3: Admin boundary changes

Admin boundaries can change over time. If your product spans decades, keep boundary versions or document the boundary snapshot date.

Pitfall 4: Multi-level summaries

If you want both country and province, build lookups per level:

h3_country_lookup
h3_province_lookup
or a single table with a level column

Summary

If you want fast province/country summaries over H3 data:

don’t run spatial joins on every request
build lookup tables once:
- centroid membership for speed
- intersection-area weights for accuracy
keep metrics partitioned by time and resolution
publish lookups as Parquet “data products” alongside metrics

This architecture turns expensive geometry work into a one-time build cost, and makes your dashboards feel instant.