A complete reference for the AERSI formula — covering pollution load, persistence, variability, and data confidence. Designed to be scientifically honest about what the data can and cannot tell us.
AERSI uses a geometric weighted mean of three components — pollution burden, persistence, and variability — each raised to a calibrated exponent. A station that performs badly across multiple dimensions scores more severely than one that is bad in only one.
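The combination step can be sketched as follows. This is a minimal illustration, assuming the three components are combined as `PBI**a * EPF**b * VSF**c`. The function name `aersi` and the exponent values are hypothetical placeholders, since the calibrated exponents are not listed in this reference.

```python
import math


def aersi(pbi: float, epf: float, vsf: float, exponents: tuple) -> float:
    """Weighted geometric mean of the three components: PBI**a * EPF**b * VSF**c."""
    a, b, c = exponents
    if min(pbi, epf, vsf) <= 0:
        raise ValueError("components must be positive")
    # A weighted geometric mean is the exponential of a weighted sum of logs.
    return math.exp(a * math.log(pbi) + b * math.log(epf) + c * math.log(vsf))


# Placeholder exponents for illustration only (NOT the calibrated values).
PLACEHOLDER_EXPONENTS = (0.5, 0.3, 0.2)

baseline = aersi(1.0, 1.0, 1.0, PLACEHOLDER_EXPONENTS)  # every component at its baseline
one_bad = aersi(2.0, 1.0, 1.0, PLACEHOLDER_EXPONENTS)   # bad on one dimension only
all_bad = aersi(2.0, 1.8, 1.7, PLACEHOLDER_EXPONENTS)   # bad on all three dimensions
```

Whatever exponents are used, a station with all three components at 1.0 scores exactly 1.0, and a station that is bad on every dimension scores higher than one that is bad on a single dimension.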
How polluted is the air, across all pollutants combined?
Each pollutant present is normalized against its WHO 2021 guideline limit, soft-saturated to prevent single extreme readings from dominating, then combined using weights renormalized over only the pollutants actually available.
N_p = 1.0 means the concentration is exactly at the WHO guideline limit. N_p = 4.0 means four times that limit.
The 0.6 exponent approximates the concave concentration-response relationship documented for PM2.5 in global epidemiological cohort studies, where marginal health risk decreases at higher concentrations (Pope & Dockery, 2006; GBD Integrated Exposure-Response framework). This value is informed by published sublinear dose-response shapes but is treated as heuristic in this version. Sensitivity analyses across exponents 0.4 to 0.8 confirm that station rankings and severity category assignments are robust to this choice.
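If some pollutants are missing, weights are renormalized across only the pollutants present: each available pollutant's weight is divided by the sum of the weights of the available pollutants, so the weights in use always sum to 1.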
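The normalization, soft-saturation, and renormalization steps might look like the sketch below. It rests on several assumptions: the limits are the WHO 2021 short-term guideline values, the saturated ratios are combined as a weighted sum, and the weights in `EXAMPLE_WEIGHTS` are hypothetical placeholders rather than the calibrated ones. The function and dictionary names are illustrative.

```python
# Assumed WHO 2021 short-term guideline values in µg/m³
# (24-hour means, except ozone, which is an 8-hour mean).
WHO_LIMITS = {"pm25": 15.0, "pm10": 45.0, "no2": 25.0, "so2": 40.0, "o3": 100.0}

# Hypothetical weights for illustration only (NOT the calibrated values).
EXAMPLE_WEIGHTS = {"pm25": 0.4, "pm10": 0.2, "no2": 0.15, "so2": 0.1, "o3": 0.15}

SATURATION_EXPONENT = 0.6  # soft-saturation exponent stated in the text


def renormalized_weights(present, weights=EXAMPLE_WEIGHTS):
    """Rescale the weights of the pollutants present so that they sum to 1."""
    total = sum(weights[p] for p in present)
    return {p: weights[p] / total for p in present}


def pollution_burden(concentrations, limits=WHO_LIMITS, weights=EXAMPLE_WEIGHTS):
    """Weighted sum of soft-saturated ratios N_p ** 0.6 over the available pollutants."""
    present = [p for p, c in concentrations.items() if c is not None]
    if not present:
        raise ValueError("no pollutant data available")
    w = renormalized_weights(present, weights)
    return sum(
        w[p] * (concentrations[p] / limits[p]) ** SATURATION_EXPONENT
        for p in present
    )


at_limit = pollution_burden(dict(WHO_LIMITS))                            # every N_p = 1
partial = pollution_burden({"pm25": 60.0, "pm10": None, "no2": 25.0})   # PM10 missing
```

In the `partial` case only PM2.5 and NO2 contribute, and their two weights are rescaled to sum to 1 before the saturated ratios are combined.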
How often has this station's air been unsafe?
EPF counts how many days in the rolling 30-day window had AQI above 100, the CPCB threshold between satisfactory and moderate. A single dampening term scales the persistence by the square root of data coverage (days observed divided by 30), preventing overconfident scores from sparse stations.
Using square-root dampening means a station with 15 of 30 days of data gets weight √(15/30) ≈ 0.707, rather than the 0.5 a linear term or the 0.25 a quadratic term would produce. Proportional, not punishing.
The primary EPF threshold is AQI > 100, the regulatory boundary between satisfactory and moderate air quality under the CPCB framework, and the same threshold used in national public health advisories. Sensitivity analyses using WHO PM2.5 guideline (15 µg/m³) and interim target (35 µg/m³) as concentration-based thresholds produce consistent station rankings.
| Days observed (of 30) | data_weight | EPF if 80% of days exceeded | EPF if 30% of days exceeded |
|---|---|---|---|
| 7 days | 0.483 | 1.386 | 1.145 |
| 15 days | 0.707 | 1.566 | 1.212 |
| 22 days | 0.856 | 1.685 | 1.257 |
| 30 days | 1.000 | 1.800 | 1.300 |
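
The table values are consistent with EPF = 1 + (share of observed days above the threshold) × √(days observed / 30). The sketch below implements that inferred form. The function name `epf`, the handling of missing days as `None`, and the neutral value returned for an empty window are assumptions made for illustration.

```python
import math

WINDOW_DAYS = 30
AQI_THRESHOLD = 100  # CPCB boundary between "satisfactory" and "moderate"


def epf(daily_aqi):
    """EPF = 1 + exceedance share * sqrt(coverage), using the last 30 entries.

    Missing days are represented as None and reduce coverage.
    """
    observed = [a for a in daily_aqi[-WINDOW_DAYS:] if a is not None]
    if not observed:
        return 1.0  # assumption: no data yields the neutral value
    exceed_share = sum(a > AQI_THRESHOLD for a in observed) / len(observed)
    data_weight = math.sqrt(len(observed) / WINDOW_DAYS)
    return 1.0 + exceed_share * data_weight


full = epf([150] * 24 + [80] * 6)  # 30 days observed, 80% above threshold
half = epf([150] * 12 + [80] * 3)  # 15 days observed, 80% above threshold
```

Here `full` reproduces the 30-day row of the table and `half` the 15-day row. The same exceedance share counts for less when only half the window is observed.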
How unpredictably does air quality swing day to day?
Volatile air is dangerous in a specific way — acute spikes cause cardiovascular events, and people cannot adapt to swings they cannot predict. VSF uses the median absolute day-to-day AQI change instead of standard deviation, which makes it robust to single sensor spikes or one unusual event.
Median is used instead of mean to ensure robustness against single-day sensor anomalies or recording errors — a common occurrence in real-world monitoring networks. Epidemiological time-series and case-crossover studies demonstrate associations between short-term concentration fluctuations and acute cardiovascular and respiratory events, providing the health basis for including a volatility dimension.
The tanh function keeps VSF bounded between 1.0 and 2.0 regardless of how extreme the swings become. The constant 45 means a median daily swing of 45 AQI points gives tanh(1) ≈ 0.76, placing VSF at 1.76.
| Median Daily Swing (S) | tanh(S/45) | VSF | Interpretation |
|---|---|---|---|
| 0 | 0.00 | 1.00 | Perfectly stable |
| 15 | 0.32 | 1.32 | Mild day-to-day change |
| 30 | 0.58 | 1.58 | Moderate swings |
| 45 | 0.76 | 1.76 | Large, significant swings |
| 80 | 0.94 | 1.94 | Extreme daily volatility |
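
A minimal sketch of the volatility factor, assuming VSF = 1 + tanh(S / 45), where S is the median absolute day-to-day AQI change, as the table implies. The function name `vsf` is illustrative. Dropping missing days before differencing is a simplification, because it treats the days on either side of a gap as consecutive.

```python
import math
from statistics import median

SCALE = 45.0  # AQI points; a median swing of 45 gives tanh(1)


def vsf(daily_aqi):
    """VSF = 1 + tanh(median |day-to-day AQI change| / 45)."""
    values = [a for a in daily_aqi if a is not None]
    if len(values) < 2:
        return 1.0  # assumption: too little data yields the neutral value
    swings = [abs(b - a) for a, b in zip(values, values[1:])]
    return 1.0 + math.tanh(median(swings) / SCALE)


stable = vsf([120] * 10)                            # no day-to-day change
spiky = vsf([100, 145, 100, 145, 100])              # every swing is 45 points
one_spike = vsf([100, 102, 101, 400, 103, 101])     # a single anomalous reading
```

The `one_spike` series shows why the median is used: the one 400-point reading produces two large swings, but the median swing stays at 2 points, so the factor barely moves from 1.0.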
How much can we trust this station's score?
A station with incomplete pollutant reporting or short history should not be presented with the same confidence as a fully observed one. CF_data is computed for every station and used to generate a confidence label shown alongside the score — it does not multiply into the AERSI score itself.
CF_data is stored alongside every station score and mapped to one of four confidence labels. This makes data quality visible to the reader without suppressing the score for stations in regions with limited sensor infrastructure.
The reference baseline is built into the formula. A station at exactly WHO limits on all five pollutants, with zero exceedances and zero volatility, scores AERSI = 1.0.
| AERSI | Category | Meaning |
|---|---|---|
| < 0.6 | Very Low | Cleaner than WHO guidelines across all dimensions |
| 0.6 – 1.0 | Low | Near the safety threshold, mostly acceptable |
| 1.0 – 1.5 | Moderate | Regular exceedance — concerning for sensitive groups |
| 1.5 – 2.0 | High | Persistent exposure risk for general population |
| > 2.0 | Extreme | Severe, persistent, volatile — among the worst globally |
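
The category lookup can be sketched as a simple threshold chain. The table leaves the shared edges (1.0 and 1.5) ambiguous. This sketch assumes each range includes its upper edge, so a station exactly at the 1.0 baseline is labeled Low. The function name `category` is illustrative.

```python
def category(score: float) -> str:
    """Map an AERSI score to its severity category.

    Assumption: ranges are upper-inclusive, except "Very Low" (strictly < 0.6).
    """
    if score < 0.6:
        return "Very Low"
    if score <= 1.0:
        return "Low"
    if score <= 1.5:
        return "Moderate"
    if score <= 2.0:
        return "High"
    return "Extreme"


labels = [category(s) for s in (0.4, 1.0, 1.2, 2.0, 2.6)]
```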
Incomplete data is flagged rather than silently ignored. A station with two pollutants and 15 days of history receives a lower confidence label than a fully observed one — making data quality visible to the reader without suppressing the score itself.
Indian monitoring data regularly has sensor downtime, missing pollutants, and station outages. Every design decision — weight renormalization, sqrt dampening, median absolute change — was chosen to degrade gracefully under these conditions rather than catastrophically.
Population density does not modify AERSI. A remote industrial station with extreme scores is flagged identically to a dense urban one. AERSI measures exposure severity per person — not total public health burden.
EPF is bounded 1.0–2.0. VSF is bounded 1.0–2.0. AERSI has a natural baseline of 1.0. No arbitrary normalization step distorts the output — the meaning is built into the mathematics.
As data accumulates, EPF reaches full confidence and VSF stabilizes on a longer sequence of day-to-day changes. The pipeline runs daily and scores become more trustworthy automatically.
(1) Pollutant weights for NO2, SO2, and PM10 are estimated from global comparative-risk literature rather than India-specific GBD attributable burden figures, which are not yet separately published for these pollutants at the national level. (2) The 0.6 saturation exponent is heuristic and not fitted to local health outcome data. (3) EPF and VSF are operationalised using AQI rather than raw concentration series, introducing a partial dependency on AQI's structural choices. (4) CF_quality is set to 1.0 pending sensor metadata availability. These limitations are explicitly acknowledged and will be addressed in subsequent versions through empirical calibration against health outcome data.
AERSI is designed as a complement to standard AQI, not a replacement. AQI answers: how bad is the air today? AERSI answers: how severe, persistent, and volatile has the exposure been over the past 30 days?
AERSI uses AQI as input for EPF and VSF because AQI is the standardised national multi-pollutant summary — this makes AERSI directly comparable with existing public health alert frameworks while adding the persistence and volatility dimensions that AQI structurally cannot capture in a single daily reading.
Air quality data is sourced from the Central Pollution Control Board (CPCB) via the Government of India's open data platform, data.gov.in.
WHO guideline limits are from the WHO Global Air Quality Guidelines (2021). Pollutant weights for PM2.5 and Ozone are derived from India-specific attributable DALYs in the Global Burden of Disease Study 2019 (India State-Level Disease Burden Initiative, Lancet Planetary Health, 2021). Weights for PM10, NO2, and SO2 are estimated from global comparative-risk exposure-response literature and renormalized. The soft-saturation exponent of 0.6 is consistent with sublinear PM2.5 exposure-response relationships documented in peer-reviewed health impact assessments (Pope & Dockery, 2006).