IPCC:AR6/WGI/Chapter-3
==== 3.8.2.1 Integrative Measures of Model Performance ====
<div id="h3-26-siblings" class="h3-siblings"></div>
The purpose of this section is to use multivariate analyses to address how well models simulate present-day and historical climate. For every diagnostic field considered, model performance is compared to one or multiple observational references, and the quality of the simulation is expressed as a single number, for example a correlation coefficient or a root mean square difference versus the observational reference. By simultaneously assessing different performance indices, model improvements can be quantified, similarities in behaviour between different models become apparent, and dependencies between various indices become evident ([[#Gleckler--2008|Gleckler et al., 2008]]; [[#Waugh--2008|Waugh and Eyring, 2008]]).

AR5 found significant differences between models in the simulation of mean climate in the CMIP5 ensemble when measured against meteorological reanalyses and observations ([[#Flato--2013|Flato et al., 2013]]), see also [[#Stouffer--2017|Stouffer et al. (2017)]]. AR5 determined that for the diagnostic fields analysed, the models usually compared similarly against two different reference datasets, suggesting that model errors were generally larger than observational uncertainties or other differences between the observational references. In agreement with previous assessments, the CMIP5 multi-model mean generally performed better than individual models ([[#Annan--2011|Annan and Hargreaves, 2011]]; [[#Rougier--2016|Rougier, 2016]]). AR5 considered 13 atmospheric fields in its assessment for the instrumental period but did not assess multivariate model performance in other climate domains (e.g., ocean, land, and sea ice).
The AR5 found only modest improvement regarding the simulation of climate for two periods of the Earth’s history (the Last Glacial Maximum and the mid-Holocene) between CMIP5 and previous paleoclimate simulations. Similarly, for the modern period only modest, incremental progress was found between CMIP3 and CMIP5 regarding the simulation of precipitation and radiation. The representation of clouds also showed improvement, but remained a key challenge in climate modelling ( [[#Flato--2013|Flato et al., 2013]] ). The type of multi-variate analysis of models presented in AR5 remains critical to building confidence for example in projections of climate change. It is expanded here to the previous-generation CMIP3 and present-generation CMIP6 models and also to more variables and more climate domains, covering land and ocean as well as sea ice. The multi-variate evaluation of these three generations of models is performed relative to the observational datasets listed in Annex I, Table AI.1. For many of these datasets, a rigorous characterization of the observational uncertainty is not available, see discussion in Chapter 2. Here, as much as possible, multiple independent observational datasets are used. Disagreements among them would cause differences in model scoring, indicating that observational uncertainties may be substantial compared to model errors. Conversely, similar scores against different observational datasets would suggest model biases may be larger than the observational uncertainty. An analysis of a basket of 16 atmospheric variables (Figure 3.42a) assessed across CMIP3, CMIP5, and CMIP6 models but excluding high-resolution models participating in HighResMIP, reveals the progress made between these three generations of models ( [[#Bock--2020|Bock et al., 2020]] ). Progress is evidenced by the increasing prevalence of blue colours (indicating a performance better than the median) for the more recent model versions. 
Additionally, a few CMIP6 models outperform the best-performing CMIP5 models. Progress is evident across all 16 variables. As noted in AR5, the models typically score similarly against both observational reference datasets, indicating that uncertainties in these reference datasets are indeed smaller than model biases. Several models and model families perform better than the median against observational references across a majority of the climate variables assessed, and conversely some other models or model families compare more poorly against these reference datasets. Such a good correspondence across a range of diagnostic fields probing different aspects of climate enhances confidence that the improved performances reflect progress in the physical realism of these simulations. An alternative explanation, that progress is due to a cancellation of errors achieved by model tuning, appears improbable given the large number of diagnostic fields involved here. However, several instances of poor model performance (red colours in Figure 3.42) still exist in the CMIP6 ensemble. Family relationships (i.e., various degrees of shared formulations; [[#Knutti--2013|Knutti et al., 2013]]) between the models are apparent: for example, the GISS, GFDL, CESM, CNRM, and HadGEM/UKESM1/ACCESS families score similarly across all atmospheric variables, both for the CMIP5 and CMIP6 generations. In the cases of CESM2/CESM2(WACCM), CNRM-CM6-1/CNRM-ESM2-1, NorCPM1/NorESM2-LM, and HadGEM3-GC31-LL/UKESM1-0-LL, the high-complexity model scores as well as or better than its lower-complexity counterpart, indicating that increasing complexity by adding Earth system features, which by removing constraints could be expected to degrade a model’s performance, does not necessarily do so. Several high climate-sensitivity models (Section 7.5; [[#Meehl--2020|Meehl et al., 2020]]), in particular CanESM5, CESM2, CESM2-WACCM, HadGEM3-GC31-LL, and UKESM1-0-LL, score well against the benchmarks.
In accordance with AR5 and earlier assessments, the multi-model mean, with some notable exceptions, is better than any individual model ([[#Annan--2011|Annan and Hargreaves, 2011]]; [[#Rougier--2016|Rougier, 2016]]).

<div id="_idContainer094" class="Basic-Text-Frame"></div>
[[File:a39746d3d4c62039c839f67687ac547f IPCC_AR6_WGI_Figure_3_42.png]]
Figure 3.42 | '''Relative space–time root-mean-square deviation (RMSD) calculated from the climatological seasonal cycle of the CMIP simulations (1980–1999) compared to observational datasets. (a)''' CMIP3, CMIP5, and CMIP6 for 16 atmospheric variables '''(b)''' CMIP5 and CMIP6 for 10 land variables and four ocean/sea-ice variables. A relative performance measure is displayed, with blue shading indicating better and red shading indicating worse performance than the median of all model results. A diagonal split of a grid square shows the relative error with respect to the reference data set (lower right triangle) and an additional data set (upper left triangle). Reference/additional datasets are from top to bottom in (a): ERA5/NCEP, GPCP-SG/GHCN, CERES-EBAF, CERES-EBAF, CERES-EBAF, CERES-EBAF, JRA-55/ERA5, ESACCI-SST/HadISST, ERA5/NCEP, ERA5/NCEP, ERA5/NCEP, ERA5/NCEP, ERA5/NCEP, ERA5/NCEP, AIRS/ERA5, ERA5/NCEP and in (b): CERES-EBAF, CERES-EBAF, CERES-EBAF, CERES-EBAF, LandFlux-EVAL, Landschuetzer2016/JMA-TRANSCOM, MTE/FLUXCOM, LAI3g, JMA-TRANSCOM, ESACCI-SOILMOISTURE, HadISST/ATSR, HadISST, HadISST, ERA-Interim. White boxes are used when data are not available for a given model and variable. Figure is updated and expanded from [[#Bock--2020|Bock et al. (2020)]], their Figure 5 CC BY 4.0 [https://dx.doi.org/10.1002/rog.20022 https://creativecommons.org/licenses/by/4.0/]. Further details on data sources and processing are available in the chapter data table (Table 3.SM.1).
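The relative performance measure displayed in Figure 3.42 can be illustrated with a minimal Python sketch. It assumes the normalization of Gleckler et al. (2008), in which each model's RMSD is expressed as a fractional departure from the median RMSD across models. The caption does not spell out the formula, so this normalization, the cos(latitude) area weighting, the function names and the synthetic data are all assumptions made for illustration, not the processing actually used for the figure.

```python
import numpy as np

def space_time_rmsd(model, obs, lat_deg):
    """RMSD over a (month, lat, lon) climatological seasonal cycle.

    Assumes cos(latitude) area weights on a regular latitude-longitude grid.
    """
    weights = np.cos(np.deg2rad(lat_deg))[None, :, None]
    weights = np.broadcast_to(weights, model.shape)
    return float(np.sqrt(np.sum(weights * (model - obs) ** 2) / np.sum(weights)))

def relative_rmsd(rmsd_by_model):
    """Fractional departure of each model's RMSD from the multi-model median."""
    rmsd = np.asarray(rmsd_by_model, dtype=float)
    median = np.median(rmsd)
    return (rmsd - median) / median

# Synthetic example: three hypothetical "models" that differ from the
# reference only by a constant offset, so each RMSD equals that offset.
rng = np.random.default_rng(0)
lat = np.arange(-87.5, 90.0, 5.0)          # 36 latitude bands (illustrative)
obs = rng.normal(size=(12, lat.size, 8))   # 12 months, 36 lats, 8 lons
offsets = [0.5, 1.0, 2.0]
rmsds = [space_time_rmsd(obs + b, obs, lat) for b in offsets]
rel = relative_rmsd(rmsds)
```

Under this convention, negative values of `rel` correspond to the blue shading (better than the median model) and positive values to the red shading (worse than the median model).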
Regarding model performance for the ocean and the cryosphere (Figure 3.42b), it is apparent that for many models there are substantial differences between the scores for Arctic and Antarctic sea ice concentration. This might suggest that it is not sea ice physics directly that is driving such differences in performance but rather other influences, such as differences in geography, the role of large ice shelves (which are absent in the Arctic), or large-scale ocean dynamics. As for atmospheric variables, progress is evident also across all four ocean and ten land variables from CMIP5 to CMIP6. In summary, CMIP6 models perform generally better for a basket of variables covering mean historical climate across the atmosphere, ocean, and land domains than previous-generation and older models ( ''high confidence'' ). Earth System models characterized by additional biogeochemical feedbacks often perform at least as well as related more constrained, lower-complexity models lacking these feedbacks ( ''medium confidence'' ). In many cases, the models score similarly against both observational references, indicating that model errors are usually larger than observational uncertainties ( ''high confidence'' ). Moreover, synthesizing across Sections 3.3–3.7, we assess that the CMIP6 multi-model mean captures most aspects of observed climate change well ( ''high confidence'' ). Using centred pattern correlations (quantifying pattern similarity on a scale of –1 to 1, with 1 expressing perfect similarity and 0 no relationship) for selected fields, AR5 documented improvements between CMIP3 and CMIP5 in surface air temperature, outgoing longwave radiation, and precipitation (Figure 9.6 of [[#Flato--2013|Flato et al., 2013]] ). Little further progress between CMIP3 and CMIP5 was found for fields that were already quite well simulated in CMIP3 (such as surface air temperature and outgoing longwave radiation). 
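The centred pattern correlation described above can be sketched in Python as follows. This is a minimal illustration, assuming cos(latitude) area weights on a regular grid. The function name, the grid and the synthetic fields are hypothetical, and no regridding or observational data handling is reproduced here.

```python
import numpy as np

def centred_pattern_correlation(model, obs, lat_deg):
    """Area-weighted correlation of two (lat, lon) fields after removing
    each field's area-weighted mean ("centring").

    Assumes cos(latitude) weights on a regular latitude-longitude grid.
    """
    weights = np.cos(np.deg2rad(lat_deg))[:, None] * np.ones_like(model)
    weights = weights / weights.sum()
    model_anom = model - np.sum(weights * model)
    obs_anom = obs - np.sum(weights * obs)
    covariance = np.sum(weights * model_anom * obs_anom)
    norm = np.sqrt(np.sum(weights * model_anom ** 2) * np.sum(weights * obs_anom ** 2))
    return float(covariance / norm)

# Synthetic example on a coarse regular grid (illustrative only).
lat = np.arange(-87.5, 90.0, 5.0)
lon = np.arange(0.0, 360.0, 4.0)
obs = np.cos(np.deg2rad(lat))[:, None] + 0.1 * np.sin(np.deg2rad(lon))[None, :]

rng = np.random.default_rng(1)
r_scaled = centred_pattern_correlation(2.0 * obs + 3.0, obs, lat)   # same pattern
r_flipped = centred_pattern_correlation(-obs, obs, lat)             # opposite pattern
r_noisy = centred_pattern_correlation(obs + rng.normal(scale=0.5, size=obs.shape), obs, lat)
```

Because the mean is removed and the result is normalized, a model field that differs from the reference only by an offset and a positive scaling still scores 1. The metric therefore measures pattern similarity, not bias or amplitude.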
For precipitation, the spread reduced because the worst-performing models improved. The shortwave cloud radiative effect remained relatively poorly simulated with significant inter-model spread (e.g., [[#Calisto--2014|Calisto et al., 2014]]). This comparison of centred pattern correlations is designed to help determine the quality of simulation of different diagnostics relative to each other, and also to examine progress between generations of models. Figure 3.43 shows the centred pattern correlations for 16 variables for CMIP3, CMIP5 and CMIP6 models. In the ensemble averages, CMIP6 performs better than CMIP5 and CMIP3 for near-surface temperature, precipitation, mean sea-level pressure, and many other variables. For the variables shown, the uncertainties in observational datasets, in particular for precipitation and northward wind at 850 hPa, remain substantial relative to mean model errors (see grey dots in Figure 3.43).

<div id="_idContainer096" class="•-2-columns"></div>
[[File:010a3587f1a6573bfacb264de3bea8fd IPCC_AR6_WGI_Figure_3_43.png]]
Figure 3.43 | '''Centred pattern correlations between models and observations for the annual mean climatology over the period 1980–1999.''' Results are shown for individual CMIP3 (green), CMIP5 (blue) and CMIP6 (red) models (one ensemble member from each model is used) as short lines, along with the corresponding multi-model ensemble averages (long lines). Correlations are shown between the models and the primary reference observational dataset (from left to right: ERA5, GPCP-SG, CERES-EBAF, CERES-EBAF, CERES-EBAF, CERES-EBAF, JRA-55, ESACCI-SST, ERA5, ERA5, ERA5, ERA5, ERA5, ERA5, AIRS, ERA5). In addition, the correlations between the primary reference and additional observational datasets (from left to right: NCEP, GHCN, -, -, -, -, ERA5, HadISST, NCEP, NCEP, NCEP, NCEP, NCEP, NCEP, ERA5, NCEP) are shown (solid grey circles) if available.
To ensure a fair comparison across a range of model resolutions, the pattern correlations are computed after regridding all datasets to a resolution of 4° in longitude and 5° in latitude. Figure is updated and expanded from [[#Bock--2020|Bock et al. (2020)]] , their Figure 7 CC BY 4.0 [https://dx.doi.org/10.1038/nature19082 https://creativecommons.org/licenses/by/4.0/] . Further details on data sources and processing are available in the chapter data table (Table 3.SM.1). In addition to the multivariate assessments of simulations of the recent historical period, simulations of selected periods of the Earth’s more distant history can be used to benchmark climate models by exposing them to climate forcings that are radically different from the present and recent past ( [[#Harrison--2015|Harrison et al., 2015]] , 2016; [[#Kageyama--2018|Kageyama et al., 2018]] ; [[#Tierney--2020a|Tierney et al., 2020a]] ). These time periods provide an out-of-sample test of models because they are not in general used in the process of model development. They encompass a range of climate drivers, such as volcanic and solar forcing for the Last Millennium, orbital forcing for the mid-Holocene and Last Interglacial, and changes in greenhouse gases and ice sheets for the LGM, mid-Pliocene Warm Period, and early Eocene (Sections 2.2 and 2.3). These drivers led to climate changes, including in surface temperature ( [[IPCC:Wg1:Chapter:Chapter-2#2.3.1.1|Section 2.3.1.1]] ) and the hydrological cycle ( [[IPCC:Wg1:Chapter:Chapter-2#2.3.1.3.1|Section 2.3.1.3.1]] ), which are described by paleoclimate proxies that have been synthesized to support evaluations of models on a global and regional scale. However, the more sparse, indirect, and regionally incomplete climate information available from paleo-archives motivates a different form of the multivariate analysis of simulations covering these periods versus the equivalent for the historical period, as described below. 
AR5 found that reconstructions and simulations of past climates both show similar responses in terms of large-scale patterns of climate change, such as polar amplification ( [[#Flato--2013|Flato et al., 2013]] ; [[#Masson-Delmotte--2013|Masson-Delmotte et al., 2013]] ). However, for several regional signals (e.g., the north–south temperature gradient in Europe and regional precipitation changes), the magnitude of change seen in the proxies relative to the pre-industrial period was underestimated by the models. When benchmarking CMIP5/PMIP3 models against reconstructions of the mid-Holocene and LGM, AR5 found only a slight improvement compared with earlier model versions across a range of variables. For the Last Interglacial, it was noted that the magnitude of observed annual mean warming in the Northern Hemisphere was only reached in summer in the models. For the mid-Pliocene Warm Period, it was noted that both proxies and models showed a polar amplification of temperature compared with the pre-industrial period, but a formal model evaluation was not carried out. Since AR5, new simulation protocols have been developed in PMIP4 ( [[#Kageyama--2018|Kageyama et al., 2018]] ), which are further described for the mid-Holocene and the Last Interglacial by [[#Otto-Bliesner--2017|Otto-Bliesner et al. (2017)]] , for the LGM by [[#Kageyama--2017|Kageyama et al. (2017)]] , for the Pliocene by [[#Haywood--2016|Haywood et al. (2016)]] , and for the early Eocene by [[#Lunt--2017|Lunt et al. (2017)]] . These have resulted in new model simulations for these time periods ( [[#Brierley--2020|Brierley et al., 2020]] ; [[#Haywood--2020|Haywood et al., 2020]] ; [[#Kageyama--2021a|Kageyama et al., 2021a]] ; [[#Lunt--2021|Lunt et al., 2021]] ; [[#Otto-Bliesner--2021|Otto-Bliesner et al., 2021]] ). 
These time periods span an assessed temperature range of 20°C ( [[IPCC:Wg1:Chapter:Chapter-2#2.3.1.1|Section 2.3.1.1]] ), and for all periods the PMIP4 multi-model ensemble mean is within 0.5°C of the assessed range of GSAT (Figure 3.44a). Those time periods for which the multi-model ensemble mean is outside the assessed range of GSAT, the mid-Holocene and the Last Interglacial, are primarily forced by orbital changes not greenhouse gas forcing, and as a result the forcing as well as the assessed and modelled response are relatively close to zero in the global annual mean. During these periods, climate change therefore is a consequence of more poorly understood Earth System feedbacks acting on the response to orbital differences versus the present-day, affecting the seasonality of insolation. <div id="_idContainer098" class="•-2-columns"></div> [[File:a8dd352c0ca568ef2392a9996863ccd2 IPCC_AR6_WGI_Figure_3_44.png]] Figure 3.44 | '''Multivariate synopsis of paleoclimate model results compared to observational references.''' Data-model comparisons for (a) GSAT anomalies for five PMIP4 periods and for regional features for the '''(b)''' mid-Holocene and '''(c)''' LGM periods, for PMIP3 and PMIP4 models. The results from CMIP6 models are shown as coloured dots. In (a) the light orange shading shows the ''very likely'' assessed ranges presented in [[IPCC:Wg1:Chapter:Chapter-2#2.3.1.1|Section 2.3.1.1]] . In (b) and (c), the regions and variables are defined as follows: North America (20°N–50°N, 140°W–60°W), Western Europe (35°N–70°N, 10°W–30°E) and West Africa (0°–30°N, 10°W–30°E); mean temperature of the coldest month (MTCO; °C), mean temperature of the warmest month (MTWA; °C), mean annual precipitation (MAP; mm yr <sup>–1</sup> ). In (b) and (c) the ranges shown for the reconstructions ( [[#Bartlein--2011|Bartlein et al. (2011)]] for mid-Holocene and [[#Cleator--2020|Cleator et al. 
(2020)]] for LGM) are based on the standard error given at each site: the average and associated standard deviation over each area is obtained by computing 1000 times the average of randomly drawn values from the Gaussian distributions defined at each site by the reconstruction mean and standard error; the light orange colour shows the ±1 standard deviation of these 1000 estimates. The dots on (b) and (c) show the average of the model output for grid points for which there are reconstructions. The ranges for the model results are based on an ensemble of 1000 averages over 50 years randomly picked in the model output time series for each region and each variable: the mean ± one standard deviation is plotted for each model. Figure is adapted from [[#Brierley--2020|Brierley et al. (2020)]] , their Figure S3 for the mid-Holocene; and from [[#Kageyama--2021b|Kageyama et al. (2021b)]] , their Figure 12 for the LGM. Further details on data sources and processing are available in the chapter data table (Table 3.SM.1). Polar amplification in the LGM, mid-Pliocene Warm Period, and Early Eocene Climatic Optimum (EECO) simulations is assessed in Section 7.4.4.1.2. Here we focus on the mid-Holocene and the LGM, which have been a part of AMIP or CMIP through several assessment cycles, and as such serve as a reference to quantify regional model-data agreement from one IPCC assessment to another. We compare the results from 15 CMIP6 models using the PMIP4 protocol (CMIP6-PMIP4), with non-CMIP6 models using the PMIP4 protocol, with PMIP3 models, and with regional temperature and precipitation changes from proxies for the mid-Holocene (Figure 3.44b). For six out of seven variables shown, the CMIP6 multi-model mean captures the correct sign of the change. For five out of seven of them the CMIP6 ensemble mean is within the reconstructed range. 
For the other two variables (changes in the mean temperature of the warmest month over North America and in the mean annual precipitation over West Africa) nearly all PMIP4 and PMIP3 models are outside the reconstructed range. CMIP6 models show regional patterns of temperature changes similar to the PMIP3 ensemble ([[#Brierley--2020|Brierley et al., 2020]]), but the slight mid-Holocene cooling in PMIP4 compared with PMIP3, probably associated with lower imposed mid-Holocene carbon dioxide concentrations ([[#Otto-Bliesner--2017|Otto-Bliesner et al., 2017]]), improves the regional model performance for summer and winter temperatures (Figure 3.44b). However, this cooling also results in a CMIP6 mid-Holocene GSAT that lies further from the assessed range (Figure 3.44a). All models show an expansion of the monsoon areas from the pre-industrial to the mid-Holocene simulations in the Northern Hemisphere, but this expansion in some cases is only large enough to cancel out the bias in the pre-industrial control simulations ([[#3.3.3.2|Section 3.3.3.2]]; [[#Brierley--2020|Brierley et al., 2020]]). There is a slight improvement in representing the northward expansion of the West African monsoon region in PMIP4 compared with PMIP3 (Figures 3.11 and 3.44b).

Fourteen simulations of the LGM climate have been produced following the CMIP6-PMIP4 protocol using 11 models, five of which are from the latest CMIP6 generation. The multi-model-mean global cooling simulated by these models is close to that simulated by the CMIP5-PMIP3 ensemble, but the range of results is larger. The increase in the range is largely due to the inclusion of CESM2, which simulates a much larger cooling than the other PMIP4 models (Figure 3.44a). This is consistent with its larger climate sensitivity (see also [[#3.3.1.1|Section 3.3.1.1]]; [[#Zhu--2021|Zhu et al., 2021]]).
The other models on average also simulate slightly larger cooling in PMIP4 versus PMIP3 ([[#Kageyama--2017|Kageyama et al., 2017]], 2021a). The PMIP4 multi-model mean is within the range of reconstructed regional averages for four out of seven regional variables; the count is unchanged from PMIP3, although the variables concerned differ (Figure 3.44c). For all fields, the results of many individual models are outside the reconstructed range. For two variables out of seven (changes in the mean temperature of the warmest month and mean annual precipitation over Western Europe) no model is within the range of the reconstructions. This analysis is strengthened compared with the equivalent analysis in AR5 because it is based on larger and improved reconstructions ([[#Cleator--2020|Cleator et al., 2020]]). Most CMIP6-PMIP4 models simulate a slightly stronger AMOC in the LGM, but no strong deepening of the AMOC ([[#Kageyama--2021a|Kageyama et al., 2021a]]), while most other PMIP4 models simulate a strengthening and strong deepening of the AMOC, as was the case for the PMIP3 models ([[#Muglia--2015|Muglia and Schmittner, 2015]]; [[#Sherriff-Tadano--2018|Sherriff-Tadano et al., 2018]]). Only one model (CESM1.2) shows a shoaling of the LGM AMOC, which is consistent with reconstructions ([[#Marzocchi--2017|Marzocchi and Jansen, 2017]]; [[#Sherriff-Tadano--2018|Sherriff-Tadano et al., 2018]]).

Seventeen PMIP4 models completed Last Interglacial simulations ([[#Otto-Bliesner--2021|Otto-Bliesner et al., 2021]]). The comparison to reconstructions is generally good, except for some discrepancies, such as for upwelling systems in the South East Atlantic or discrepancies which may result from local melting of remnant ice sheets absent in the Last Interglacial simulation protocol.
All models simulate a decrease in Arctic sea ice in summer, commensurate with increased summer insolation, while some models even simulate a large or complete loss ([[#Guarino--2020|Guarino et al., 2020]]; [[#Kageyama--2021b|Kageyama et al., 2021b]]). Sea ice reconstructions for the central Arctic are, however, too uncertain to evaluate this behaviour. The Last Interglacial simulations indicate a clear relationship between simulated sea ice loss and model responses to increased greenhouse gas forcing ([[#Kageyama--2021b|Kageyama et al., 2021b]]; [[#Otto-Bliesner--2021|Otto-Bliesner et al., 2021]]).

Overall, the PMIP multi-model means agree very well (within 0.5°C of the assessed range) with GSAT reconstructed from proxies across multiple time periods, spanning a range from 6°C colder than pre-industrial (Last Glacial Maximum) to 14°C warmer than pre-industrial (Early Eocene Climatic Optimum) (''high confidence''). During the orbitally-forced mid-Holocene, the CMIP6 multi-model mean captures the sign of the regional changes in temperature and precipitation in most regions assessed, and there have been some regional improvements compared to AR5 (''medium confidence''). The limited number of CMIP6 simulations of the LGM hinders model evaluation of the multi-model mean, but for both LGM and mid-Holocene, models tend to underestimate the magnitude of large changes (''high confidence''). Some long-standing model-data discrepancies, such as a dry bias in North Africa in the mid-Holocene, have not improved in CMIP6 compared with PMIP3 (''high confidence'').

<div id="3.8.2.2" class="h3-container"></div> <span id="process-representation-in-different-classes-of-models"></span>