
Machine-learning-based evidence and attribution mapping of 100,000 climate impact studies


Increasing evidence suggests that climate change impacts are already observed around the world. Global environmental assessments face challenges in appraising the growing literature. Here we use the language model BERT to identify and classify studies on observed climate impacts, producing a comprehensive machine-learning-assisted evidence map. We estimate that 102,160 (64,958–164,274) publications document a broad range of observed impacts. By combining our spatially resolved database with grid-cell-level human-attributable changes in temperature and precipitation, we infer that attributable anthropogenic impacts may be occurring across 80% of the world’s land area, where 85% of the population reside. Our results reveal a substantial ‘attribution gap’, as robust levels of evidence for potentially attributable impacts are twice as prevalent in high-income as in low-income countries. While gaps remain in confidently attributing climate impacts at the regional and sectoral level, this database illustrates the potential current impact of anthropogenic climate change across the globe.

Fig. 1: Results of the machine-assisted literature review.
Fig. 2: Potential attribution of impact studies to regional anthropogenic temperature and precipitation trends.
Fig. 3: A global density map of climate impact evidence.

Data availability

The results of this study are made available in a public repository (ref. 58).

Code availability

The code used to produce these results is made available in a public repository (ref. 50).


M.C. is supported by a PhD stipend from the Heinrich Böll Stiftung. J.C.M. acknowledges funding from the ERC-2020-SyG GENIE (grant ID 951542). S.N. and Q.L. acknowledge funding from the German Federal Ministry of Education and Research (BMBF) and the German Aerospace Center (DLR) via the LAMACLIMA project as part of AXIS, an ERANET initiated by JPI Climate (last access: 26 August 2021; grant no. 01LS1905A), with co-funding from the European Union (grant no. 776608). M.R. acknowledges support by the ERC-SyG USMILE (grant ID 85518). R.J.B. acknowledges support from the EU Horizon2020 Marie-Curie Fellowship Program H2020-MSCA-IF-2018 (proposal no. 838667 -INTERACTION). We thank F. Zeng for providing preliminary temperature and precipitation trend assessment results for our project. We acknowledge the World Climate Research Programme, which, through its Working Group on Coupled Modelling, coordinated and promoted CMIP6. We thank the climate modelling groups for producing and making available their model output, the Earth System Grid Federation (ESGF) for archiving the data and providing access, and the multiple funding agencies who support CMIP6 and ESGF.

M.C., J.C.M. and C.-F.S. designed the research. M.C. developed the coding platform and machine-learning pipeline to identify studies, with advice from M.R. M.C., C.-F.S., G.H., Q.L. and E.T. developed the codebook and coordinated screening and coding. M.C., Q.L., S.N. and C.-F.S. conceptualized the link to detection and attribution data. S.N. performed the univariate detection and attribution analysis of temperature and precipitation trends and assessment of internal variability, in consultation with T.R.K., who designed the methodology for these calculations. M.C. and S.N. designed and implemented the matching of studies with detection and attribution data. M.C., C.-F.S., S.N., Q.L., G.H., E.T., M.A., R.J.B., M.H., C.J., K.L., A.L., N.v.M., I.M., P.P. and B.Y. contributed to screening and coding studies. M.C., C.-F.S., J.C.M., Q.L. and S.N. wrote the manuscript with contributions from all authors.

Correspondence to Max Callaghan.

Extended Data Fig. 1 A visual representation of the workflow of our machine learning assisted attribution map.

Squares represent documents (not to scale), boxes represent the steps taken. Documents are screened by hand, and those labels are used to generate predictions and machine label documents. These machine-labelled documents are matched by location with information from observations and climate models on the detection and attribution of trends in temperature and precipitation.

Extended Data Fig. 2 Nested cross validation (CV) procedure for the binary relevance classifier.

Models are fit using training documents and evaluated on validation/test documents. The inner CV loop is used to search for optimal hyperparameter settings, which are then evaluated on the outer test sets.
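The nested procedure described in this caption can be sketched in a few lines. The example below is a minimal illustration only, not the authors' pipeline: the one-dimensional toy data, the threshold classifier, the `margin` hyperparameter and the fold counts are all assumptions chosen to keep the sketch self-contained.

```python
import random
from statistics import mean

def k_folds(indices, k):
    """Split a list of indices into k roughly equal folds."""
    return [indices[i::k] for i in range(k)]

def fit_threshold_classifier(xs, ys, margin):
    """Toy model (hypothetical): predict 1 when x exceeds the midpoint
    of the two class means, shifted by the hyperparameter `margin`."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    cut = (mean(pos) + mean(neg)) / 2 + margin
    return lambda x: int(x > cut)

def accuracy(model, xs, ys):
    return mean(int(model(x) == y) for x, y in zip(xs, ys))

def nested_cv(xs, ys, margins, outer_k=5, inner_k=3, seed=0):
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    results = []
    for test_idx in k_folds(idx, outer_k):
        held_out = set(test_idx)
        train_idx = [i for i in idx if i not in held_out]

        def inner_score(margin):
            # Inner loop: mean validation accuracy for one hyperparameter value.
            scores = []
            for val_idx in k_folds(train_idx, inner_k):
                val = set(val_idx)
                fit_idx = [i for i in train_idx if i not in val]
                model = fit_threshold_classifier(
                    [xs[i] for i in fit_idx], [ys[i] for i in fit_idx], margin)
                scores.append(accuracy(
                    model, [xs[i] for i in val_idx], [ys[i] for i in val_idx]))
            return mean(scores)

        best = max(margins, key=inner_score)
        # Outer loop: refit with the selected setting, score on the unseen test fold.
        model = fit_threshold_classifier(
            [xs[i] for i in train_idx], [ys[i] for i in train_idx], best)
        results.append((best, accuracy(
            model, [xs[i] for i in test_idx], [ys[i] for i in test_idx])))
    return results

# Synthetic, well-separated toy data: 30 negatives and 30 positives.
rng = random.Random(1)
xs = [rng.gauss(0, 1) for _ in range(30)] + [rng.gauss(2, 1) for _ in range(30)]
ys = [0] * 30 + [1] * 30
results = nested_cv(xs, ys, margins=[-0.5, 0.0, 0.5])
for best_margin, test_accuracy in results:
    print(best_margin, round(test_accuracy, 2))
```

Because the hyperparameter is chosen using only the inner folds, each outer test score is an estimate of performance on documents the selection step never saw.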

Extended Data Fig. 3 Performance metrics for the binary inclusion/exclusion classifier.

Each pair of dots represents the scores for a distinct cross-validation fold. Horizontal lines show the mean score across folds.

Extended Data Fig. 4 Receiver operating characteristic area under the curve (ROC AUC) scores and F1 scores for the classification of impact categories.

Each pair of dots represents the scores for a distinct cross-validation fold. Horizontal lines show the mean score across folds.
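For readers unfamiliar with the two metrics, the sketch below computes ROC AUC and F1 from first principles for a single fold. It is illustrative only: the four labels and scores and the 0.5 decision threshold are made-up assumptions, not values from the study.

```python
def roc_auc(y_true, scores):
    """Probability that a randomly chosen positive is scored above a
    randomly chosen negative (ties count as half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Hypothetical fold: true labels and classifier scores.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [int(s >= 0.5) for s in scores]  # assumed threshold of 0.5

auc = roc_auc(y_true, scores)   # 0.75
f1 = f1_score(y_true, y_pred)   # 2/3
print(auc, f1)
```

ROC AUC depends only on how the scores rank the documents, whereas F1 depends on the chosen threshold, which is one reason to report both.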

Extended Data Fig. 5 Receiver operating characteristic area under the curve (ROC AUC) scores and F1 scores for the classification of drivers.

Each pair of dots represents the scores for a distinct cross-validation fold. Horizontal lines show the mean score across folds.

Extended Data Fig. 6 Geographical distribution of surface trends.

Temperature trends from 1951 to 2018 (left) and precipitation trends from 1951 to 2016 (right) in (a),(b) observations and (c),(d) CMIP6 10-model ensemble mean all-forcing runs. Bottom panels (e),(f) show observations categorised into attribution categories, following refs. 8,7, respectively. Observed cooling/warming or drying/wetting trends that, after accounting for internal climate variability, are inconsistent with the simulated response to natural forcings but consistent with the simulated response to both natural and anthropogenic forcings are indicated by categories -/+2. This is the clearest case of changes that are at least partially attributable to anthropogenic forcing, according to the CMIP6 ensemble. Categories -/+1 have detectable observed changes, but are not assessed as attributable to anthropogenic forcing because the observed changes are significantly less than those simulated in the average all-forcing runs. Categories -/+3 have detectable changes and are assessed as at least partly attributable to anthropogenic forcing, although the observed changes are inconsistent with the all-forcing runs: they are in the same direction as, but significantly stronger than, the mean of the all-forcing runs. Categories -/+4 represent cooling/warming or drying/wetting trends that are inconsistent with the simulated response to natural forcings but whose sign is opposite to that of the average simulated all-forcing response; category 0 represents trends that are not distinguishable from natural variability alone. Categories -/+4 and 0 are considered to be examples of non-detectable trends.
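The category scheme in this caption can be read as a small decision rule. The sketch below is a simplified interpretation, not the methodology of refs. 8,7: it assumes that "consistent with" can be approximated by the observed trend falling inside an interval, and the function name, the interval inputs `nat_range` and `all_range`, and the example numbers are all hypothetical.

```python
def attribution_category(obs, nat_range, all_range, all_mean):
    """Assign a signed category (0, -/+1 ... -/+4) to one grid cell.

    obs       -- observed trend
    nat_range -- (low, high) interval spanned by natural-forcing-only runs
    all_range -- (low, high) interval spanned by all-forcing runs
    all_mean  -- mean trend of the all-forcing runs
    """
    nat_lo, nat_hi = nat_range
    all_lo, all_hi = all_range
    if nat_lo <= obs <= nat_hi:
        return 0                      # not distinguishable from natural variability
    sign = 1 if obs > 0 else -1       # + warming/wetting, - cooling/drying
    if obs * all_mean < 0:
        return 4 * sign               # opposite in sign to the all-forcing response
    if all_lo <= obs <= all_hi:
        return 2 * sign               # consistent with the all-forcing runs
    if abs(obs) < abs(all_mean):
        return 1 * sign               # weaker than the all-forcing runs
    return 3 * sign                   # stronger than the all-forcing runs

# Hypothetical trends (arbitrary units) for five grid cells.
nat, forced, forced_mean = (-0.1, 0.1), (0.8, 1.2), 1.0
categories = [attribution_category(t, nat, forced, forced_mean)
              for t in (0.05, 0.5, 1.0, 1.5, -0.5)]
print(categories)  # [0, 1, 2, 3, -4]
```

Under the caption's definitions, only the cells returned as -/+2 or -/+3 would count as at least partly attributable to anthropogenic forcing.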

Extended Data Fig. 7 Fractional difference between average CMIP6 modeled low-frequency standard deviation of annual mean precipitation vs observed precipitation.

To estimate the internal low-frequency variability, the observed time series were detrended and low-pass filtered with a 7-year running mean before computing the standard deviations. For the models, we used the full available control runs (also filtered with a 7-year running mean) to estimate the internal low-frequency variability of each model. The top panel shows the multi-model ensemble standard deviation comparison, while the ten individual panels below it show the comparison for each individual CMIP6 model used in the study. The fractional difference was computed as: [(Model st. dev. - Observed st. dev.) / (Observed st. dev.)].
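The calculation described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions: the caption does not say how the detrending was done or which standard deviation estimator was used, so the least-squares linear detrend, the sample standard deviation and the synthetic series are all assumptions.

```python
import random
from statistics import mean, stdev

def detrend(series):
    """Remove a least-squares linear trend (assumed detrending method)."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = mean(series)
    slope = (sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
             / sum((t - t_mean) ** 2 for t in range(n)))
    return [y - (y_mean + slope * (t - t_mean)) for t, y in enumerate(series)]

def running_mean(series, window=7):
    """Low-pass filter: mean over each full window of `window` years."""
    return [mean(series[i:i + window]) for i in range(len(series) - window + 1)]

def low_frequency_std(series, window=7, remove_trend=True):
    """Standard deviation of the (optionally detrended) filtered series."""
    if remove_trend:
        series = detrend(series)
    return stdev(running_mean(series, window))

def fractional_difference(model_std, observed_std):
    """(Model st. dev. - Observed st. dev.) / (Observed st. dev.)"""
    return (model_std - observed_std) / observed_std

# Synthetic annual series: observations with a trend, a trend-free control run.
rng = random.Random(0)
observed = [0.02 * year + rng.gauss(0, 1.0) for year in range(66)]
control = [rng.gauss(0, 1.2) for _ in range(500)]

obs_std = low_frequency_std(observed)                      # detrended, then filtered
model_std = low_frequency_std(control, remove_trend=False) # filtered only
print(fractional_difference(model_std, obs_std))
```

A positive value means the model's low-frequency variability exceeds the observed estimate, and a negative value means it falls short.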

Extended Data Fig. 8 Difference between average CMIP6 modeled low-frequency standard deviation (°C) of annual mean surface air temperature vs observed surface temperature.

To estimate the internal low-frequency variability, the observed time series were detrended and low-pass filtered with a 7-year running mean before computing the standard deviations. For the models, we used the full available control runs (also filtered with a 7-year running mean) to estimate the internal low-frequency variability of each model. The top panel shows the multi-model ensemble standard deviation comparison, while the ten individual panels below it show the comparison for each individual CMIP6 model used in the study.

Extended Data Fig. 9 An illustration of the spatial resolution and weighting methodology.

a. Detection and attribution categories for temperature in East Africa; b. the number of grid cells of each type in Sudan; c. weighted studies for each grid cell in Sudan; d. the number of studies referring to each extracted geographical location in Sudan.

A list of the categories used to code relevant documents. Each category could be used as an impact or a driver. To make the classification problem tractable, the categories were merged into ‘broad categories’, resembling those used in IPCC AR5, and ‘aggregated categories’, which distinguish between different types of impacts to a greater extent.

Callaghan, M., Schleussner, CF., Nath, S. et al. Machine-learning-based evidence and attribution mapping of 100,000 climate impact studies. Nat. Clim. Chang. (2021).

