Global Drought Map is Live: Serverless, Cloud-Native Drought Analytics with H3 + Parquet + DuckDB WASM
Today I’m excited to share that Global Drought Map is live.
This started as a side project, but it’s rooted in real work: drought consulting projects I’ve been involved in across Madagascar, Somalia, and Turkey. Those projects always raise the same practical questions:
- How do we make drought indicators accessible beyond GIS specialists?
- How do we ship a map that stays cheap to operate, even if usage spikes?
- How do we keep the data pipeline reproducible and the outputs portable?
So I used Global Drought Map as a playground to test a modern “cloud-native + local-first” approach: preprocessed drought indices distributed globally on Uber’s H3 grid, stored as Parquet on Cloudflare R2, and queried directly in the browser via DuckDB WASM — without a traditional backend service.
The app may not be super fast yet, but prioritising sustainability, simplicity, and reproducibility over a complex always-on backend was the point.
What the app shows
The app visualises drought conditions using two widely used indices:
- SPI (Standardized Precipitation Index)
- SPEI (Standardized Precipitation Evapotranspiration Index)
Both are computed from ERA5 and then distributed across H3 hexagons, spanning 1980 to today.
In practical terms, that means you can:
- pick a date (and, depending on the UI, a time scale / accumulation window),
- view global drought patterns on a consistent hex grid,
- query and aggregate drought values interactively, in-browser.
Why H3 for global drought?
Global datasets are awkward at human scales:
- pixels are too granular for exploration (and heavy for web maps),
- admin boundaries are politically defined and inconsistent in coverage/resolution,
- point stations have uneven spatial density.
H3 is a good “analysis grid” compromise:
- global, consistent indexing (cell IDs are stable and hierarchical),
- multi-resolution (zoom-level ↔ resolution mapping),
- fast aggregation (group by H3 index prefixes / parent-child relationships),
- web-friendly (hexagons look intuitive and aggregate well).
Even more importantly, H3 makes data distribution and retrieval easier:
you can store one row per (h3_index, date) (or (h3_index, month) etc.), partition it, and retrieve only what’s needed.
Data architecture: ERA5 → indices → H3 → Parquet
At a high level, the pipeline looks like this:
- ERA5 ingestion
- Compute SPI/SPEI
- Map results onto an H3 grid
- Write cloud-optimised Parquet
- Host on Cloudflare R2
- Query in browser with DuckDB WASM
Step 1 — ERA5 ingestion
ERA5 is attractive for global drought monitoring because it offers:
- global coverage
- consistent methodology
- long historical record (useful for standardisation)
The key is being disciplined about time coverage and baseline windows when computing standardised indices. If you change the baseline, you change the “meaning” of SPI/SPEI values.
Step 2 — SPI and SPEI computation (what “standardised” really means)
Both SPI and SPEI are designed to map climate anomalies to a standard normal distribution, so results are comparable across space:
SPI = z-score of the precipitation anomaly under a fitted distribution
SPEI = z-score of the (P − PET) balance anomaly under a fitted distribution
The important detail: this is not a naive (x − mean) / std on raw precipitation.
SPI and SPEI typically involve:
- selecting an accumulation window (e.g., 1, 3, 6, 12 months),
- fitting a probability distribution per location for that timescale,
- transforming the cumulative probability to a normal deviate (z-score).
That’s what makes “-2” mean “extreme drought” in a statistically consistent sense.
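The probability-to-z-score step can be sketched in a few lines of Python. This is a deliberately simplified illustration: it uses an empirical (plotting-position) CDF rather than fitting a gamma (SPI) or log-logistic (SPEI) distribution as the operational indices do, and all names are my own.

```python
from statistics import NormalDist

def spi_like(accumulated: list[float]) -> list[float]:
    """Map accumulated precipitation values to standard-normal deviates.

    Simplified: uses an empirical Weibull plotting-position CDF
    (rank / (n + 1)) instead of a fitted parametric distribution.
    """
    n = len(accumulated)
    # Order indices from driest to wettest value.
    order = sorted(range(n), key=lambda i: accumulated[i])
    z = [0.0] * n
    nd = NormalDist()  # standard normal
    for rank, i in enumerate(order, start=1):
        p = rank / (n + 1)    # cumulative probability in (0, 1)
        z[i] = nd.inv_cdf(p)  # probability -> z-score
    return z

values = [95, 110, 40, 120, 100, 30, 105, 90, 115, 85]
scores = spi_like(values)
# The driest value (30, at index 5) gets the most negative score.
print(scores[5] == min(scores))
```

The real pipeline replaces the empirical CDF with a per-location fitted distribution, but the final step — pushing a cumulative probability through the inverse normal CDF — is the same.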
If you’re designing a web product, the take-away is:
- precompute as much as possible,
- keep the index definition stable (baseline and fitting assumptions),
- expose the interpretation in the UI (e.g., mild/moderate/severe/extreme).
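Exposing the interpretation can be as simple as a threshold function like the one below. The cut-offs follow common SPI classification conventions; the exact break-points and labels are a product decision, not a standard.

```python
def drought_category(spi: float) -> str:
    """Classify an SPI/SPEI value into a UI-facing drought category.

    Thresholds follow common SPI classification conventions;
    the exact cut-offs and labels are a product decision.
    """
    if spi <= -2.0:
        return "Extreme"
    if spi <= -1.5:
        return "Severe"
    if spi <= -1.0:
        return "Moderate"
    if spi <= -0.5:
        return "Mild"
    return "Normal/Wet"

print(drought_category(-2.3))  # Extreme
print(drought_category(0.4))   # Normal/Wet
```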
Step 3 — distributing onto H3
Once you have a gridded field (or values you can sample/interpolate), the job is to map each value to an H3 cell at a chosen resolution.
You can do this in different ways depending on the source representation:
- point-sample each H3 centroid from the gridded field,
- area-weighted aggregation from pixels into hexes,
- build a consistent H3 lookup table and compute values aligned with it.
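As a sketch of the first option, assume you already have each cell's centroid (e.g., from the h3 library); the snippet below just snaps a centroid to the nearest node of a regular lat/lon grid. The function and layout assumptions are illustrative, not the app's actual code.

```python
def sample_grid(value_grid, lat0, lon0, step, lat, lon):
    """Nearest-neighbour sample of a regular lat/lon grid.

    value_grid is a 2-D list indexed as [row][col], with row 0 at
    (lat0, lon0), rows advancing south by `step` degrees and columns
    advancing east by `step` degrees (a typical north-to-south raster).
    """
    row = round((lat0 - lat) / step)
    col = round((lon - lon0) / step)
    return value_grid[row][col]

# A 3x3 toy grid starting at (10N, 20E) with 0.25-degree spacing.
grid = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
]
# A hypothetical H3 centroid at (9.74N, 20.26E) snaps to row 1, col 1.
print(sample_grid(grid, 10.0, 20.0, 0.25, 9.74, 20.26))  # 5.0
```

Area-weighted aggregation is more faithful near coarse-grid boundaries, but point sampling is often good enough when the H3 resolution is finer than the source grid.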
This step is where you decide the product’s spatial resolution. Higher H3 resolutions mean:
- more rows,
- bigger Parquet datasets,
- heavier client-side queries,
…but also better local detail.
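To make the trade-off concrete, here is a back-of-the-envelope row-count estimate. The 7× factor per resolution level is approximate (pentagons make the true H3 cell counts slightly lower), and the year range is just an example.

```python
def approx_h3_cells(res: int) -> int:
    """Approximate global H3 cell count: 122 base cells, ~7x per level."""
    return 122 * 7 ** res

def approx_rows(res: int, years: int, steps_per_year: int = 12) -> int:
    """Rows for one (h3_index, date) record per cell per time step."""
    return approx_h3_cells(res) * years * steps_per_year

# Monthly data over a 45-year record (e.g., 1980-2024):
for res in (3, 4, 5):
    print(res, f"{approx_rows(res, 45):,}")
```

Going from resolution 3 to 5 multiplies the dataset by roughly 49× — which is exactly why resolution choice is a product decision, not just a technical one.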
Storage format: why Parquet (and not GeoJSON, tiles, or a DB)?
Parquet is the key to making this backendless approach feasible.
Compared to GeoJSON:
- Parquet is columnar (faster scans for analytics)
- it compresses well
- it’s designed for “read subsets of columns” workloads
- it plays nicely with analytical engines like DuckDB
Compared to a database:
- there’s no server to maintain
- scaling is pushed to object storage + client compute
- the “database” becomes immutable, versionable files
Compared to vector tiles:
- tiles are great for rendering, but awkward for ad-hoc analytics
- Parquet lets you filter, aggregate, compute percentiles, etc.
The result is a data lake style workflow, but served directly to the browser.
Hosting: Cloudflare R2 as the “data origin”
Cloudflare R2 is used here as object storage for the Parquet files. The important operational properties (for this kind of architecture) are:
- “dumb” storage: files in, files out
- cheap to operate compared to always-on services
- integrates well with modern static hosting / edge delivery patterns
The app can be deployed as a static site, and the data can be served as objects. No API server needed.
Query engine: DuckDB WASM in the browser
This is the most fun part.
DuckDB-WASM brings a real analytical database engine into the client runtime. That means your browser can:
- download Parquet data,
- run SQL locally,
- aggregate and filter interactively,
- return only the results required for the current view.
What “no backend” actually means here
“No traditional backend service” doesn’t mean “no infrastructure”. It means the system is shaped like this:
- Static app (HTML/JS/CSS)
- Object storage (Parquet files)
- Client-side SQL engine (DuckDB WASM)
- Optional CDN/edge behaviours (caching, range requests, etc.)
There’s no always-on server process doing queries. The query happens where the user is.
Example: selecting drought values for a date window
Below is an illustrative DuckDB-style query pattern you might run (exact schema may differ):
-- Load Parquet partition(s) relevant to the requested time window
SELECT
    h3_index,
    avg(spi_3) AS spi3
FROM read_parquet('https://<r2-bucket>/<path>/spi_spei/*.parquet')
WHERE date BETWEEN DATE '2020-01-01' AND DATE '2020-01-31'
GROUP BY h3_index;
Example: compute drought category counts (good for legends & summaries)
WITH d AS (
    SELECT h3_index, spi_3
    FROM read_parquet('https://<r2-bucket>/<path>/spi_spei/*.parquet')
    WHERE date = DATE '2022-08-01'
)
SELECT
    CASE
        WHEN spi_3 <= -2.0 THEN 'Extreme'
        WHEN spi_3 <= -1.5 THEN 'Severe'
        WHEN spi_3 <= -1.0 THEN 'Moderate'
        WHEN spi_3 <= -0.5 THEN 'Mild'
        ELSE 'Normal/Wet'
    END AS category,
    count(*) AS cells
FROM d
GROUP BY category
ORDER BY cells DESC;
Example: roll up from a finer H3 resolution to a coarser one
If your dataset is stored at a high H3 resolution but you want faster rendering at zoomed-out views, you can roll up:
SELECT
    h3_to_parent(h3_index, 3) AS h3_parent,
    avg(spi_3) AS spi3
FROM drought
WHERE date = DATE '2021-07-01'
GROUP BY h3_parent;
(Where h3_to_parent is provided by an H3 extension / function set in your environment.)
Performance reality: why it’s not “super fast” yet
A browser doing real analytics is powerful, but performance depends on a few things:
1) File layout and partitioning
Parquet works best when you avoid forcing the client to scan “everything”.
Typical strategies:
- partition by year/month folders,
- keep row groups aligned to common filters (time is the most common here),
- store only the columns you need for the visualisation.
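Time partitioning pays off because the client can derive exactly which objects a query needs from its date range. A minimal sketch, assuming a Hive-style year/month path scheme (illustrative, not necessarily the app's actual layout):

```python
def partition_urls(base: str, start: tuple, end: tuple) -> list:
    """List year=YYYY/month=MM partition URLs covering [start, end].

    start and end are (year, month) tuples, inclusive. The path
    scheme is illustrative, not the app's actual layout.
    """
    urls = []
    y, m = start
    while (y, m) <= end:
        urls.append(f"{base}/year={y}/month={m:02d}/data.parquet")
        m += 1
        if m > 12:
            y, m = y + 1, 1
    return urls

# A four-month window touches exactly four partition files.
urls = partition_urls("https://example-bucket/spi_spei", (2019, 11), (2020, 2))
print(len(urls))  # 4
print(urls[0])
```

With a layout like this, a one-month query downloads one small object instead of scanning the whole archive.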
2) Balancing spatial resolution vs. interactivity
Global H3 at high resolution × monthly history over decades gets big quickly.
A practical pattern is:
- store one “analysis” resolution for deep dives,
- precompute coarser summary datasets for global views,
- choose resolution based on zoom level.
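The zoom-to-resolution choice can be as simple as a clamped lookup. The break-points below are made up for illustration — real values depend on which resolutions you actually store and how dense the rendering should feel.

```python
def h3_res_for_zoom(zoom: float, min_res: int = 2, max_res: int = 5) -> int:
    """Pick an H3 resolution for a map zoom level.

    Rough heuristic: step up one H3 resolution about every 1.5 zoom
    levels, clamped to the resolutions actually stored. Break-points
    are illustrative, not tuned values.
    """
    res = min_res + int(zoom / 1.5)
    return max(min_res, min(max_res, res))

print(h3_res_for_zoom(0))  # 2  (global view -> coarsest stored res)
print(h3_res_for_zoom(6))  # 5  (clamped at the finest stored res)
```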
3) Client device variability
Some users have fast laptops; others have limited memory / CPU. With client-side SQL, your “compute fleet” is the user base — which is great for scaling costs, but variable in speed.
Reliability and correctness: “trust your index”
When you publish drought indices, correctness and transparency matter.
Things I treat as non-negotiable:
- consistent baseline definition
- clear index interpretation in the UI
- stable versioning of datasets (so results don’t silently change)
- the ability to reproduce outputs from source data and code
A “serverless” architecture doesn’t remove the need for data discipline — it increases it, because users may cache data locally and compare outputs across time.
Why I built it this way (personal + practical)
This project was inspired by drought work I’ve done in Madagascar, Somalia, and Turkey, where sustainability matters:
- budgets are limited,
- infrastructure can be fragile,
- operational simplicity is a feature.
A static site + object storage + in-browser analytics is not the answer to every problem, but it’s a compelling architecture when you want:
- low operational burden,
- elastic usage,
- and a reproducible, portable data product.
Feedback welcome (and yes, there may be bugs)
This is live, and there might still be rough edges.
If you try it and notice:
- slow interactions,
- confusing legends or thresholds,
- unexpected gaps in time or coverage,
- rendering glitches at certain zoom levels,
…please send feedback. Real-world usage reports are the fastest way to make it better.
What’s next
Some obvious next steps (without over-promising):
- better partitioning + caching strategies for “first load” speed
- multi-resolution datasets for smoother zooming
- richer summaries (area affected, population exposure overlays, etc.)
- clearer explainers around SPI/SPEI interpretation in the UI