H3 + Admin Boundaries: Fast Province/Country Summaries with Precomputed Lookup Tables


H3 + Admin Boundaries: Fast “province/country” Summaries Using Precomputed Lookup Tables

If your data is stored on H3 hexagons and users ask questions like:

  • “Average SPI for Turkey this month?”
  • “Which provinces are under severe drought?”
  • “Show country ranking by drought exposure”

…then you’re doing summaries by administrative boundaries.

The naive approach is to run a spatial join every time:

  • load H3 cell geometries
  • load admin boundary polygons
  • ST_Intersects / ST_Intersection
  • area-weight, group, aggregate
  • repeat for every request, every time window

That works for a prototype, but it collapses at scale.

The fix is simple and powerful:

Build a precomputed lookup table that maps:

  • H3 cell → admin region (province/country)
  • optionally with weights for accurate boundary-crossing cells

Then every summary query becomes:

  • a fast join on IDs + a group by

This post shows the patterns.

All code blocks are indented to avoid Markdown-breaking nested fences.


Why spatial joins are the wrong place to spend your time

Spatial joins are expensive because:

  • geometry operations are heavy (especially polygons)
  • boundaries don’t change often (so recomputing is wasteful)
  • dashboard workloads repeat the same join patterns constantly

If you do the join once during a build step, the “runtime” system becomes:

  • numeric joins
  • aggregations
  • partition pruning by time

That is exactly what DuckDB is great at.


The model: separate three datasets

A scalable “H3 + admin summaries” system typically has:

1) H3 metrics (no geometry)

This is your time-series data product.

Columns:

  • h3_index (at a chosen resolution)
  • date (or month)
  • indicators like spi_3, spei_3, ndvi, etc.

Stored as Parquet, partitioned by time (and optionally resolution).

2) Admin boundaries (geometry)

A boundary dataset (country/province polygons).

Columns:

  • admin_id, name, level (country/province), etc.
  • geom polygon geometry (usually EPSG:4326)
3) Lookup table (the key)

Maps H3 cells to admin units.

Two variants:

  • membership lookup (fast, simpler)
  • weighted lookup (accurate at borders)

Lookup option A: membership mapping (fast and usually good enough)

This assigns each H3 cell to a single admin unit, often using the cell centroid.

This works well when:

  • your H3 resolution is not extremely fine
  • border precision is not critical
  • you want maximum speed
Build the lookup table (centroid-based)

Assume you have a table h3_cells with:

  • h3_index
  • geom (polygon boundary for the cell)

And admin with:

  • admin_id
  • name
  • geom

Build:

INSTALL spatial;
LOAD spatial;

CREATE OR REPLACE TABLE h3_admin_lookup AS
SELECT
  h.h3_index,
  a.admin_id,
  a.name
FROM h3_cells h
JOIN admin a
  ON ST_Contains(a.geom, ST_Centroid(h.geom));

If you have multiple admin levels, add a filter:

CREATE OR REPLACE TABLE h3_admin_lookup AS
SELECT
  h.h3_index,
  a.admin_id,
  a.name,
  a.level
FROM h3_cells h
JOIN admin a
  ON a.level = 'province'
 AND ST_Contains(a.geom, ST_Centroid(h.geom));
Runtime query becomes trivial

“Average SPI_3 per province for a month”:

SELECT
  l.admin_id,
  l.name,
  AVG(m.spi_3) AS spi3_avg
FROM h3_metrics m
JOIN h3_admin_lookup l
  ON m.h3_index = l.h3_index
WHERE m.date = DATE '2025-07-01'
GROUP BY l.admin_id, l.name
ORDER BY spi3_avg ASC;

No geometry involved. Just joins and aggregates.


Lookup option B: area-weighted mapping (accurate along borders)

Centroid assignment can misclassify border cells, especially at coarser resolutions or for thin regions/coastal boundaries.

For accuracy, compute intersection area fractions once:

  • w_cell_fraction = intersection_area / cell_area

Then use weights in aggregation.

Build the weighted lookup table
INSTALL spatial;
LOAD spatial;

CREATE OR REPLACE TABLE h3_admin_weights AS
SELECT
  h.h3_index,
  a.admin_id,
  a.name,
  ST_Area(ST_Intersection(h.geom, a.geom)) AS inter_area,
  ST_Area(h.geom) AS cell_area,
  ST_Area(ST_Intersection(h.geom, a.geom)) / NULLIF(ST_Area(h.geom), 0) AS w_cell_fraction
FROM h3_cells h
JOIN admin a
  ON ST_Intersects(h.geom, a.geom)
WHERE NOT ST_IsEmpty(ST_Intersection(h.geom, a.geom));

Notes:

  • This is heavier to build (one-time cost).
  • Result size is larger (a border cell can map to multiple admin units).
  • Runtime summaries become accurate without geometry.
Runtime query: area-weighted mean
SELECT
  w.admin_id,
  w.name,
  SUM(m.spi_3 * w.w_cell_fraction) / NULLIF(SUM(w.w_cell_fraction), 0) AS spi3_weighted
FROM h3_metrics m
JOIN h3_admin_weights w
  ON m.h3_index = w.h3_index
WHERE m.date = DATE '2025-07-01'
GROUP BY w.admin_id, w.name
ORDER BY spi3_weighted ASC;

This gives border-aware, area-weighted province metrics.


“Percent of region under drought” (classic dashboard metric)

Let’s define:

  • severe drought = spi_3 <= -1.5

With a weighted lookup, you can compute percent area under threshold:

WITH agg AS (
  SELECT
    w.admin_id,
    w.name,
    SUM(w.w_cell_fraction) AS total_w,
    SUM(CASE WHEN m.spi_3 <= -1.5 THEN w.w_cell_fraction ELSE 0 END) AS severe_w
  FROM h3_metrics m
  JOIN h3_admin_weights w
    ON m.h3_index = w.h3_index
  WHERE m.date = DATE '2025-07-01'
  GROUP BY w.admin_id, w.name
)
SELECT
  admin_id,
  name,
  (severe_w / NULLIF(total_w, 0)) * 100.0 AS severe_pct
FROM agg
ORDER BY severe_pct DESC;

This is fast and scales well because it avoids geometry entirely at runtime.


How to make it fast in practice (build-time and runtime)

Build-time tips
  • Ensure admin boundaries are in a suitable CRS for area calculations (meters-based CRS is better for accurate areas).
  • Simplify admin boundaries if they’re extremely detailed (but keep a high-fidelity source).
  • Consider limiting to relevant admin levels (country/province) to reduce join complexity.
Runtime tips
  • Keep H3 metrics partitioned by time:
    • WHERE date = ... should prune to a small set of files
  • Keep lookup tables small and cached:
    • store h3_admin_lookup and h3_admin_weights as Parquet
  • Avoid geometry queries in the hot path:
    • no ST_Intersects in API endpoints if you can avoid it

Where H3 resolution matters

Border complexity grows at coarse resolutions:

  • fewer cells → each cell covers more area → more border straddling
  • weighted lookup becomes more important

At very fine resolutions:

  • centroid mapping is often sufficient (cells are small)
  • lookup tables get larger, but metrics become more precise

The best approach is to choose a resolution that matches your product:

  • global patterns: coarser resolution + weighted lookups
  • local drilldown: finer resolution + centroid mapping often acceptable

Publishing the lookup tables as a data product

Store your lookup tables as Parquet so they can be reused everywhere:

  • DuckDB local
  • server-side DuckDB
  • DuckDB WASM in-browser
  • Python/Polars/Arrow engines

Example exports:

COPY h3_admin_lookup
TO 'output/lookups/h3_admin_lookup.parquet'
(FORMAT PARQUET);

COPY h3_admin_weights
TO 'output/lookups/h3_admin_weights.parquet'
(FORMAT PARQUET);

Version them alongside your metrics:

output/
  v2026-01-27/
    metrics/
      res=6/year=2025/month=07/part-0000.parquet
    lookups/
      h3_admin_lookup.parquet
      h3_admin_weights.parquet
    admin/
      provinces.parquet

This makes your system reproducible and cache-friendly.


Common pitfalls

Pitfall 1: CRS confusion during area weighting

Area depends on CRS. Doing ST_Area in EPSG:4326 can be misleading. If accuracy matters, project to a meter-based CRS during the build step.

Pitfall 2: Coastal and island boundaries

Some H3 cells may include ocean areas, which can affect weighting if your cell geometry covers sea. Decide whether you want land-only fractions (requires land mask) or accept the simplification.

Pitfall 3: Admin boundary changes

Admin boundaries can change over time. If your product spans decades, keep boundary versions or document the boundary snapshot date.

Pitfall 4: Multi-level summaries

If you want both country and province, build lookups per level:

  • h3_country_lookup
  • h3_province_lookup
  • or a single table with a level column

Summary

If you want fast province/country summaries over H3 data:

  • don’t run spatial joins on every request
  • build lookup tables once:
    • centroid membership for speed
    • intersection-area weights for accuracy
  • keep metrics partitioned by time and resolution
  • publish lookups as Parquet “data products” alongside metrics

This architecture turns expensive geometry work into a one-time build cost, and makes your dashboards feel instant.