Methodology

How AERSI Works

A complete reference for the formula, components, weights, and design decisions behind the Air Exposure Severity Index.

Master Formula

AERSI is a multiplicative composite index. Each component amplifies the others — a station with heavy pollution, frequent exceedances, and volatile swings does not score the sum of those problems, but their product.

AERSI = PL × EPF × VSF

PL: Pollution Load, range 0 → ∞
EPF: Exposure Persistence, range 1.0 → 2.0
VSF: Variability Severity, range 1.0 → 2.0

Pollution Load (PL)

How polluted is the air today, across all pollutants combined?

Each pollutant is normalized against its WHO guideline limit, then combined with epidemiologically-derived weights.

Step 1 — Normalize

N = Observed Concentration / WHO Limit

A normalized value of 1.0 means the concentration is exactly at the WHO guideline limit. A value of 4.0 means it is four times that limit.

Step 2 — Weighted combination

PL = 0.35·N_PM2.5 + 0.25·N_PM10 + 0.15·N_NO2 + 0.15·N_Ozone + 0.10·N_SO2

Pollutant weights & WHO limits

| Pollutant | WHO limit | Weight |
|-----------|-----------|--------|
| PM2.5     | 15 µg/m³  | 0.35   |
| PM10      | 45 µg/m³  | 0.25   |
| NO2       | 25 µg/m³  | 0.15   |
| Ozone     | 100 µg/m³ | 0.15   |
| SO2       | 40 µg/m³  | 0.10   |

PM2.5 carries the highest weight (0.35) because fine particles penetrate deepest into the lungs and bloodstream and are most strongly associated with long-term mortality risk in epidemiological literature.
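
The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline's actual code: the function name `pollution_load`, the dictionary layout, and the sample readings are hypothetical, and the sketch assumes all five pollutants are reported (the handling of missing pollutants is not described here).

```python
# WHO guideline limits (µg/m³) and weights, copied from the table above.
WHO_LIMITS = {"PM2.5": 15.0, "PM10": 45.0, "NO2": 25.0, "Ozone": 100.0, "SO2": 40.0}
WEIGHTS = {"PM2.5": 0.35, "PM10": 0.25, "NO2": 0.15, "Ozone": 0.15, "SO2": 0.10}


def pollution_load(concentrations):
    """Return PL for a dict of observed concentrations in µg/m³.

    Assumes every pollutant in WEIGHTS is present in `concentrations`.
    """
    # Step 1: normalize each pollutant against its WHO limit.
    # Step 2: combine the normalized values with the fixed weights.
    return sum(WEIGHTS[p] * concentrations[p] / WHO_LIMITS[p] for p in WEIGHTS)


# Hypothetical readings: PM2.5 at 4x its limit, PM10 at 2x, NO2 at 1x,
# Ozone and SO2 at half their limits.
sample = {"PM2.5": 60.0, "PM10": 90.0, "NO2": 25.0, "Ozone": 50.0, "SO2": 20.0}
pl = pollution_load(sample)  # 1.4 + 0.5 + 0.15 + 0.075 + 0.05 = 2.175
```

Because the weights sum to 1.0, a station sitting exactly at every WHO limit gets PL = 1.0.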

Exposure Persistence Factor (EPF)

How often is this station's air unsafe, and how confident are we?

EPF counts how many days in the rolling window the AQI exceeded 100 — the CPCB threshold between satisfactory and moderate — then applies a confidence weight based on how much data is available.

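EPF = 1 + (D_exceed / W) × confidence
confidence = min(W / 30, 1.0)

Here D_exceed is the number of days in the window on which AQI exceeded 100, and W is the number of days of data collected so far (at most 30).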

With only 7 days of data, EPF is shrunk toward neutral (1.0) — we can't yet be confident a pattern has emerged. At 30 days it reaches full strength.

| Days collected | Confidence | EPF if 50% exceeded |
|----------------|------------|---------------------|
| 7 days         | 0.23       | 1.117               |
| 15 days        | 0.50       | 1.250               |
| 30 days        | 1.00       | 1.500               |
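
A short sketch of the EPF calculation, for readers who want to reproduce the table. The function name `exposure_persistence` and its parameter names are hypothetical, and returning the neutral value 1.0 when no data has been collected is an assumption of this sketch rather than documented behavior.

```python
def exposure_persistence(exceed_days, window_days, full_window=30):
    """Return EPF = 1 + (D_exceed / W) * confidence.

    exceed_days: days in the window with AQI above 100 (D_exceed)
    window_days: days of data collected so far (W)
    full_window: days needed for full confidence (30 in the text above)
    """
    if window_days <= 0:
        # Assumption: with no data at all, fall back to the neutral value.
        return 1.0
    confidence = min(window_days / full_window, 1.0)
    return 1.0 + (exceed_days / window_days) * confidence


# Reproduce the "50% exceeded" column of the table.
for days in (7, 15, 30):
    print(days, round(exposure_persistence(0.5 * days, days), 3))
```

While W is below 30, the two W terms cancel and the formula reduces to 1 + D_exceed / 30, so each exceedance day adds the same fixed amount regardless of how few days have been collected.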

Variability Severity Factor (VSF)

How wildly does the air quality swing day to day?

Volatile air is dangerous in a specific way — people cannot predict or adapt to swings they cannot anticipate. Sudden acute spikes cause health events that consistent (even bad) air does not.

VSF = 1 + tanh(σ / 100)

Here σ is the standard deviation of the station's AQI readings over the window. The tanh function maps any non-negative number into the range 0 to 1: it grows fast initially, then flattens. This means VSF always stays between 1.0 and 2.0, however extreme the volatility gets. It also depends only on the absolute swing in AQI, not on the mean, so a heavily polluted city with stable readings correctly scores VSF ≈ 1.0.

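| Std Dev (σ) | tanh(σ/100) | VSF  | Interpretation                   |
|-------------|-------------|------|----------------------------------|
| 0           | 0.00        | 1.00 | Perfectly stable, no penalty     |
| 30          | 0.29        | 1.29 | Mild swings                      |
| 60          | 0.54        | 1.54 | Moderate variation               |
| 100         | 0.76        | 1.76 | Large, significant swings        |
| 200         | 0.96        | 1.96 | Extreme volatility, near ceiling |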
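
The VSF formula is easy to check numerically. The sketch below uses the hypothetical name `variability_severity` and makes two assumptions not stated in the text: σ is the population standard deviation of the daily AQI values, and a window with fewer than two readings is treated as having no volatility.

```python
import math
import statistics


def variability_severity(daily_aqi):
    """Return VSF = 1 + tanh(sigma / 100) for a list of daily AQI values."""
    if len(daily_aqi) < 2:
        # Assumption: too few readings to measure a swing, so no penalty.
        return 1.0
    # Assumption: population standard deviation; the text does not say
    # whether the population or sample form is used.
    sigma = statistics.pstdev(daily_aqi)
    return 1.0 + math.tanh(sigma / 100.0)


# A heavily polluted but perfectly stable station gets no volatility penalty.
stable = variability_severity([350] * 10)

# Two readings of 100 and 300 have sigma = 100, matching the table row for 100.
swingy = variability_severity([100, 300])
```

With these inputs `stable` is 1.0 and `swingy` is 1 + tanh(1), which rounds to the 1.76 shown in the table.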

Baseline & Score Interpretation

The reference baseline is built into the formula itself. A station sitting exactly at WHO limits for all pollutants, with zero exceedances and zero volatility, scores:

AERSI = 1.0 × 1.0 × 1.0 = 1.0

AERSI = 1.0 is the WHO safety threshold. Every point above it represents compounding exposure beyond safe limits.
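
To see how the three components compound, the sketch below combines them in one function. The name `aersi` and its signature are hypothetical; it takes an already-computed PL, the exceedance count, the window length, and σ, and applies the formulas from the sections above.

```python
import math


def aersi(pl, exceed_days, window_days, sigma, full_window=30):
    """Return AERSI = PL * EPF * VSF from the component inputs."""
    if window_days > 0:
        confidence = min(window_days / full_window, 1.0)
        epf = 1.0 + (exceed_days / window_days) * confidence
    else:
        # Assumption: with no data, EPF stays at its neutral value.
        epf = 1.0
    vsf = 1.0 + math.tanh(sigma / 100.0)
    return pl * epf * vsf


# Baseline: at WHO limits, no exceedances, no volatility.
baseline = aersi(pl=1.0, exceed_days=0, window_days=30, sigma=0.0)

# Hypothetical station: twice the WHO-weighted load, unsafe on half of
# 30 days, with moderate swings (sigma = 60).
example = aersi(pl=2.0, exceed_days=15, window_days=30, sigma=60.0)
```

Here `baseline` is exactly 1.0, while `example` is 2.0 × 1.5 × 1.54, roughly 4.61: each factor multiplies the others rather than adding to them.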

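| AERSI     | Category | Meaning                                |
|-----------|----------|----------------------------------------|
| < 0.8     | Very Low | Cleaner than WHO guidelines            |
| 0.8 – 1.2 | Low      | Near the safety threshold              |
| 1.2 – 2.0 | Moderate | Concerning cumulative exposure         |
| 2.0 – 3.0 | High     | Significant exposure risk              |
| > 3.0     | Extreme  | Persistent, intense, volatile pollution |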
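
A score can be mapped to its category with a simple lookup. The function name `aersi_category` is hypothetical, and because adjacent ranges in the table share their endpoints, this sketch assumes each upper bound is inclusive (so 1.2 is Low and 3.0 is High).

```python
def aersi_category(score):
    """Map an AERSI score to the category names in the table above.

    Assumption: upper bounds are inclusive, since the table does not say
    which category a boundary value such as 1.2 or 2.0 belongs to.
    """
    if score < 0.8:
        return "Very Low"
    if score <= 1.2:
        return "Low"
    if score <= 2.0:
        return "Moderate"
    if score <= 3.0:
        return "High"
    return "Extreme"


labels = [aersi_category(s) for s in (0.5, 1.0, 1.6, 2.5, 4.6)]
```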

Design Principles

Each life matters equally

Population density does not modify AERSI. A remote station with extreme scores is flagged with the same severity as a dense urban one. AERSI measures exposure severity, not public health burden — those are different questions.

Honest under sparse data

The EPF confidence weight prevents overconfident conclusions in the early days of data collection. As snapshots accumulate, scores stabilize and gain full statistical weight automatically.

Bounded and interpretable

Both EPF and VSF are bounded between 1.0 and 2.0. AERSI has a natural baseline of 1.0. There is no arbitrary normalization step — the meaning is built into the math.

Improves over time

The index is designed to be usable while data is still sparse. With 30 or more days of consistent data, EPF and VSF reach full strength. The pipeline runs daily and commits new data automatically.

Data Sources

Air quality data is sourced from the Central Pollution Control Board (CPCB) via the Government of India's open data platform, data.gov.in.

Resource ID: 3b01bcb8-0b14-4abf-b6f2-c1bfd384ba69
Update frequency: Daily snapshots at 10:30 AM IST
Coverage: 547 monitoring stations across India
Window: Rolling 30-day dataset

WHO guideline limits are from the WHO Global Air Quality Guidelines (2021). Pollutant weights are informed by the Global Burden of Disease study and related epidemiological literature.