
===== 3.3.1.1.1 Model evaluation =====

<div id="h4-1-siblings" class="h4-siblings"></div>

To be fit for detecting and attributing human influence on globally averaged surface temperatures, climate models need to represent, based on physical principles, both the response of surface temperature to external forcings and the internal variability in surface temperature over various time scales. This section assesses the performance of the latest-generation CMIP6 climate models in these two respects. See [[#3.8|Section 3.8]] for evaluation at continental scales, [[IPCC:Wg1:Chapter:Chapter-10|Chapter 10]] for model evaluation in the context of regional climate information, and the [[IPCC:Wg1:Chapter:Atlas|Atlas]] for region-by-region assessments of model performance.

Reconstructions of past temperature from paleoclimate proxies ([[IPCC:Wg1:Chapter:Chapter-2#2.3.1.1|Section 2.3.1.1]] and Cross-Chapter Box 2.1) have been used to evaluate modelled patterns of past temperature change. The AR5 found that CMIP5 ([[#Taylor--2012|Taylor et al., 2012]]) models were able to reproduce the large-scale patterns of temperature during the Last Glacial Maximum (LGM) ([[#Flato--2013|Flato et al., 2013]]) and simulated a polar amplification broadly consistent with reconstructions for warm (Pliocene and Eocene) and cold (LGM) periods ([[#Masson-Delmotte--2013|Masson-Delmotte et al., 2013]]). Since AR5, a better understanding of temperature proxies and their uncertainties, and in some cases of the forcing applied to model simulations, has led to better agreement between models and reconstructions over a wide range of past climates. For the Pliocene and Eocene warm periods, understanding of uncertainties in temperature proxies ([[#Hollis--2019|Hollis et al., 2019]] ; [[#McClymont--2020|McClymont et al., 2020]]) and of the boundary conditions used in climate simulations ([[#Haywood--2016|Haywood et al., 2016]] ; [[#Lunt--2017|Lunt et al., 2017]]) has improved, and some models now agree better with temperature proxies for these time periods than the models assessed in AR5 did (Sections 7.4.4.1.2, 7.4.4.2.2 and Cross-Chapter Box 2.4; [[#Zhu--2019|Zhu et al., 2019]] ; [[#Haywood--2020|Haywood et al., 2020]] ; [[#Lunt--2021|Lunt et al., 2021]]). For the Last Interglacial (LIG), improved temporal resolution of temperature proxies ([[#Capron--2017|Capron et al., 2017]]) and better appreciation of the importance of freshwater forcing ([[#Stone--2016|Stone et al., 2016]]) have clarified the reasons behind apparent model-data inconsistencies. 
Regional LIG temperature responses simulated by CMIP6 are within the uncertainty ranges of reconstructed temperature responses, except in regions where unresolved changes in regional ocean circulation, meltwater, or vegetation changes may cause model mismatches ([[#Otto-Bliesner--2021|Otto-Bliesner et al., 2021]]). For the LGM, the CMIP5 and CMIP6 ensembles compare similarly to new sea surface temperature (SST) and surface air temperature (SAT) proxy reconstructions (Figure 3.2a; [[#Cleator--2020|Cleator et al., 2020]] ; [[#Tierney--2020b|Tierney et al., 2020b]]). The very cold CMIP6 LGM simulation by the Community Earth System Model Version 2.1 (CESM2.1) is an exception related to the high equilibrium climate sensitivity (ECS) of that model (Section 7.5.6; [[#Kageyama--2021a|Kageyama et al., 2021a]] ; [[#Zhu--2021|Zhu et al., 2021]]). Figure 3.2a illustrates the wide range of simulated global LGM temperature responses in both ensembles. CMIP6 models tend to underestimate the cooling over land, but agree better with oceanic reconstructions. For the mid-Holocene, the regional biases found in CMIP5 simulations are similar to those in pre-industrial and historical simulations ([[#Harrison--2015|Harrison et al., 2015]] ; [[#Ackerley--2017|Ackerley et al., 2017]]), suggesting common causes. CMIP5 models underestimate Arctic warming in the mid-Holocene ([[#Yoshimori--2019|Yoshimori and Suzuki, 2019]]). CMIP6 models simulate a mid-latitude, subtropical, and tropical cooling compared to the pre-industrial period, whereas temperature proxies indicate a warming (see [[IPCC:Wg1:Chapter:Chapter-2#2.3.1.1.2|Section 2.3.1.1.2]] ; [[#Brierley--2020|Brierley et al., 2020]] ; [[#Kaufman--2020|Kaufman et al., 2020]]), although accounting for seasonal effects in the proxies may reduce the discrepancy ([[#Bova--2021|Bova et al., 2021]]). 
Over the past millennium, reconstructed and simulated temperature anomalies, internal variability, and forced response agree well over Northern Hemisphere continents, but those statistics disagree strongly in the Southern Hemisphere, where models seem to overestimate the response ([[#PAGES%202k-PMIP3%20group--2015|PAGES 2k-PMIP3 group, 2015]]). That disagreement is partly explained by the lower quality of the reconstructions in the Southern Hemisphere, but model and/or forcing errors may also contribute ([[#Neukom--2018|Neukom et al., 2018]]). Figure 3.2b shows that land/sea warming contrast behaves coherently in model simulations across multiple periods, with a slight non-linearity in land warming due to a smaller contribution of snow cover to temperature response in warmer climates. A multivariate assessment of paleoclimate model simulations is carried out in [[#3.8.2|Section 3.8.2]] .

<div id="_idContainer009" class="_idGenObjectStyleOverride-1"></div>

[[File:1ce2530ae198a3957bfa3dca50d72667 IPCC_AR6_WGI_Figure_3_2.png]]

Figure 3.2 | Changes in surface temperature for different paleoclimates. '''(a)''' Comparison of reconstructed and modelled surface temperature anomalies for the Last Glacial Maximum over land and ocean in the Tropics (30°N–30°S). Land-based reconstructions are from [[#Cleator--2020|Cleator et al. (2020)]] . Ocean-based reconstructions are from [[#Tierney--2020b|Tierney et al. (2020b)]] . Model anomalies are calculated as the difference between Last Glacial Maximum and pre-industrial control simulations of the PMIP3 and PMIP4 ensembles, sampled at the reconstruction data points. '''(b)''' Land–sea contrast in global mean surface temperature change for different paleoclimates. Small symbols show individual model simulations from the CMIP5 and CMIP6 ensembles. Large symbols show ensemble means and assessed values. '''(c)''' Upper panel shows time series of volcanic radiative forcing, in W m <sup>−2</sup> , as used in the CMIP5 ([[#Gao--2008|Gao et al., 2008]] ; [[#Crowley--2013|Crowley and Unterman, 2013]] ; see also [[#Schmidt--2011|Schmidt et al., 2011]]) and CMIP6 (850 CE to 1900 CE from [[#Toohey--2017|Toohey and Sigl (2017)]] , 1850–2015 from [[#Luo--2018|Luo (2018)]]) simulations. The forcing was calculated from the stratospheric aerosol optical depth at 550 nm shown in Figure 2.2. Lower panel shows time series of global mean surface temperature anomalies, in °C, with respect to 1850–1900 for the CMIP5 and CMIP6 past1000 simulations and their historical continuation simulations. Simulations are coloured according to the volcanic radiative forcing dataset they used. The median reconstruction of temperature from [[#PAGES%202k%20Consortium--2019|PAGES 2k Consortium (2019)]] is shown in black, the 5–95% confidence interval is shown by grey lines, and the grey envelopes show the 1st, 5th, 15th, 25th, 35th, 45th, 55th, 65th, 75th, 85th, 95th, and 99th percentiles. All data in both panels are filtered so that only variability at periods longer than 20 years is retained. 
Further details on data sources and processing are available in the chapter data table (Table 3.SM.1).
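
The two processing steps named for panel (c), expressing each series as anomalies relative to 1850–1900 and keeping only time scales longer than 20 years, can be illustrated with a minimal Python sketch. The caption does not state which filter was used, so this sketch assumes an ideal Fourier filter that zeroes every harmonic with a period of 20 years or less. The function names <code>anomalies</code> and <code>lowpass</code> are hypothetical, and the code is not the chapter's processing chain (see Table 3.SM.1 for that).

```python
import cmath
import math


def anomalies(years, values, base_start=1850, base_end=1900):
    """Express a series as departures from its mean over base_start..base_end (inclusive)."""
    base = [v for y, v in zip(years, values) if base_start <= y <= base_end]
    if not base:
        raise ValueError("series does not cover the reference period")
    ref = sum(base) / len(base)
    return [v - ref for v in values]


def lowpass(values, cutoff_years=20.0, dt=1.0):
    """Ideal Fourier low-pass filter (an assumption): keep harmonics with period > cutoff_years.

    Uses a plain O(n^2) discrete Fourier transform, adequate for a few
    hundred annual values and free of third-party dependencies.
    """
    n = len(values)
    coeffs = [sum(values[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
              for k in range(n)]
    for k in range(1, n):
        # Harmonics k and n - k share one frequency; zeroing them together
        # keeps the filtered series real-valued.
        freq = min(k, n - k) / (n * dt)  # cycles per year
        if freq >= 1.0 / cutoff_years:  # period of cutoff_years or shorter: remove
            coeffs[k] = 0
    return [(sum(coeffs[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]
```

An ideal filter treats the series as periodic and can ring near abrupt changes such as large eruptions; smoother filter designs are a common alternative, and the choice affects how sharply post-eruption cooling appears in filtered curves.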

For the historical period, AR5 assessed with ''very high confidence'' that CMIP5 models reproduced observed large-scale mean surface temperature patterns, although errors of several degrees appear in elevated regions, like the Himalayas and Antarctica, near the edge of the sea ice in the North Atlantic, and in upwelling regions. This assessment is updated here for the CMIP6 simulations. Figure 3.3 shows the annual mean surface air temperature at 2 m for the CMIP5 and CMIP6 multi-model means, both compared to the fifth generation European Centre for Medium-Range Weather Forecasts (ECMWF) atmospheric reanalysis (ERA5; [[IPCC:Wg1:Chapter:Chapter-1#1.5.2|Section 1.5.2]]) for the period 1995–2014. The distribution of biases is similar in CMIP5 and CMIP6 models, as already noted by several studies ([[#Crueger--2018|Crueger et al., 2018]] ; [[#Găinuşă-Bogdan--2018|Găinuşă-Bogdan et al., 2018]] ; [[#Kuhlbrodt--2018|Kuhlbrodt et al., 2018]] ; [[#Lauer--2018|Lauer et al., 2018]]). Arctic temperature biases seem more widespread in both ensembles than assessed at the time of AR5. The fundamental causes of temperature biases remain elusive, with errors in clouds ([[#Lauer--2018|Lauer et al., 2018]]), ocean circulation ([[#Kuhlbrodt--2018|Kuhlbrodt et al., 2018]]), winds ([[#Lauer--2018|Lauer et al., 2018]]), and surface energy budget ([[#Hourdin--2015|Hourdin et al., 2015]] ; [[#Séférian--2016|Séférian et al., 2016]] ; [[#Găinuşă-Bogdan--2018|Găinuşă-Bogdan et al., 2018]]) being frequently cited candidates. Increasing horizontal resolution shows promise for decreasing long-standing biases in surface temperature over large regions ([[#Bock--2020|Bock et al., 2020]]). 
Panels e and f of Figure 3.3 show that biases in the multi-model mean of the High-Resolution Model Intercomparison Project (HighResMIP, [[#Haarsma--2016|Haarsma et al., 2016]]) models (see also Table AII.6) are smaller than those in the mean of the corresponding lower-resolution versions of the same models simulating the same period (see also [[#3.8.2.2|Section 3.8.2.2]]). However, the bias reduction is modest ([[#Palmer--2019|Palmer and Stevens, 2019]]). In addition, the biases of the limited number of models participating in HighResMIP are not entirely representative of overall CMIP6 biases, especially in the Southern Ocean, as indicated by comparing panels b and f of Figure 3.3.

<div id="_idContainer012" class="Basic-Text-Frame"></div>

[[File:9ae9d81ddeca5734e2e7187cf097a96b IPCC_AR6_WGI_Figure_3_3.png]]

Figure 3.3 '''|''' '''Annual mean near-surface (2 m) air temperature (°C) for the period 1995–2014. (a)''' Multi-model (ensemble) mean constructed with one realization of the CMIP6 historical experiment from each model. '''(b)''' Multi-model mean bias, defined as the difference between the CMIP6 multi-model mean and the climatology of the fifth generation European Centre for Medium-Range Weather Forecasts (ECMWF) atmospheric reanalysis of the global climate (ERA5). '''(c)''' Multi-model mean of the root mean square error calculated over all months separately and averaged, with respect to the climatology from ERA5. '''(d)''' Multi-model mean bias defined as the difference between the CMIP5 multi-model mean and the climatology from ERA5. The difference between the multi-model mean of '''(e)''' high-resolution and '''(f)''' low-resolution simulations of four HighResMIP models and the climatology from ERA5 is also shown. Uncertainty is represented using the advanced approach: no overlay indicates regions with robust signal, where ≥66% of models show change greater than the variability threshold and ≥80% of all models agree on sign of change; diagonal lines indicate regions with no change or no robust signal, where &lt;66% of models show a change greater than the variability threshold; crossed lines indicate regions with conflicting signal, where ≥66% of models show change greater than the variability threshold and &lt;80% of all models agree on sign of change. For more information on the advanced approach, please refer to Cross-Chapter Box Atlas.1. Dots in panel (e) mark areas where the bias in high resolution versions of the HighResMIP models is not lower in at least three out of four models than in the corresponding low-resolution versions. Further details on data sources and processing are available in the chapter data table (Table 3.SM.1).
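
The three overlay categories in the caption amount to a decision rule applied to each grid cell. A minimal Python sketch, assuming the per-model variability thresholds are supplied by the caller (their derivation is described in Cross-Chapter Box Atlas.1, not here) and that sign agreement is measured against the majority sign across models; <code>classify_cell</code> is a hypothetical name.

```python
def classify_cell(changes, thresholds, frac_exceed=0.66, frac_sign=0.80):
    """Classify one grid cell following the overlay rules quoted in the caption.

    changes    -- per-model change (or bias) at this grid cell
    thresholds -- per-model variability threshold at this grid cell (assumed given)
    Returns "robust" (no overlay), "no_robust_signal" (diagonal lines)
    or "conflicting" (crossed lines).
    """
    n = len(changes)
    if n == 0 or n != len(thresholds):
        raise ValueError("need one threshold per model")
    # Fraction of models whose change exceeds their variability threshold.
    exceed = sum(abs(c) > t for c, t in zip(changes, thresholds)) / n
    if exceed < frac_exceed:
        return "no_robust_signal"
    positive = sum(c > 0 for c in changes) / n
    negative = sum(c < 0 for c in changes) / n
    # Assumption: agreement is measured against the majority sign.
    agreement = max(positive, negative)
    return "robust" if agreement >= frac_sign else "conflicting"
```

For example, five models with biases of +1.0, -1.0, +1.0, -1.0 and +1.0 °C against a threshold of 0.5 °C all exceed the threshold, but only 60% share a sign, so under this sketch the cell would be marked with crossed lines.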

The AR5 assessed with ''very high confidence'' that models reproduce the general history of the increase in global-scale annual mean surface temperature since the year 1850, although AR5 also reported that an observed reduction in the rate of warming over the period 1998–2012 was not reproduced by the models (Cross-Chapter Box 3.1; [[#Flato--2013|Flato et al., 2013]]). Figure 3.2c and Figure 3.4 show time series of anomalies in annually and globally averaged surface temperature simulated by CMIP5 and CMIP6 models for the past millennium and the period 1850 to 2020, respectively, with the baseline set to 1850–1900 (see [[IPCC:Wg1:Chapter:Chapter-1#1.4.1|Section 1.4.1]]). As also indicated by Figure 3.4, the spread in simulated absolute temperatures is large ([[#Palmer--2019|Palmer and Stevens, 2019]]). However, the discussion is based on temperature anomaly time series instead of absolute temperatures because our focus is on evaluation of the simulation of climate change in these models, and also because anomalies are more uniformly distributed and are more easily deseasonalized to isolate long-term trends (see [[IPCC:Wg1:Chapter:Chapter-1#1.4.1|Section 1.4.1]]). CMIP6 models broadly reproduce surface temperature variations over the past millennium, including the cooling that follows periods of intense volcanism (''medium confidence'') (Figure 3.2c). Simulated GMST anomalies are well within the uncertainty range of temperature reconstructions (''medium confidence'') since about the year 1300, except for some short periods immediately following large volcanic eruptions, for which simulations driven by different forcing datasets disagree (Figure 3.2c). 
Before the year 1300, larger disagreements between models and temperature reconstructions are expected because forcing and temperature reconstructions are increasingly uncertain further back in time, but specific causes have not been identified conclusively ([[#Ljungqvist--2019|Ljungqvist et al., 2019]] ; [[#PAGES%202k%20Consortium--2019|PAGES 2k Consortium, 2019]]) (''medium confidence''). For the historical period, results for CMIP6 shown in Figure 3.4 suggest that the qualitative history of surface temperature increase is well reproduced, including the increase in warming rates beginning in the 1960s and the temporary cooling that follows large volcanic eruptions.

<div id="_idContainer014" class="Basic-Text-Frame"></div>

[[File:72fcbc98740a23baf81355209bdf9ca3 IPCC_AR6_WGI_Figure_3_4.png]]

Figure 3.4 | '''Observed and simulated time series of the anomalies in annual and global mean surface air temperature (GSAT).''' All anomalies are differences from the 1850–1900 time-mean of each individual time series. The reference period 1850–1900 is indicated by grey shading. '''(a)''' Single simulations from CMIP6 models (thin lines) and the multi-model mean (thick red line). Observational data (thick black lines) are from the Met Office Hadley Centre/Climatic Research Unit dataset (HadCRUT5), and are blended surface temperature (2 m air temperature over land and sea surface temperature over the ocean). All models have been subsampled using the HadCRUT5 observational data mask. Vertical lines indicate large historical volcanic eruptions. CMIP6 models marked with an asterisk were tuned to reproduce observed warming, either directly or indirectly by tuning equilibrium climate sensitivity. Inset: GSAT for each model over the reference period, not masked to any observations. '''(b)''' Multi-model means of CMIP5 (blue line) and CMIP6 (red line) ensembles and associated 5th to 95th percentile ranges (shaded regions). Observational data are HadCRUT5, Berkeley Earth, National Oceanic and Atmospheric Administration NOAAGlobalTemp-Interim and [[#Kadow--2020|Kadow et al. (2020)]] . Masking was done as in (a). CMIP6 historical simulations were extended with SSP2-4.5 simulations for the period 2015–2020 and CMIP5 simulations were extended with RCP4.5 simulations for the period 2006–2020. All available ensemble members were used (see [[#3.2|Section 3.2]]). The multi-model means and percentiles were calculated solely from simulations available for the whole time span (1850–2020). The figure is updated from [[#Bock--2020|Bock et al. (2020)]] , their Figures 1 and 2. CC BY 4.0 [https://creativecommons.org/licenses/by/4.0/ https://creativecommons.org/licenses/by/4.0/] . 
Further details on data sources and processing are available in the chapter data table (Table 3.SM.1).
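
The ensemble statistics in panel (b) combine three steps described in the caption: anomalies relative to each series' own 1850–1900 mean, restriction to simulations covering the whole time span, and a multi-model mean with 5th to 95th percentiles. A minimal Python sketch, assuming annual values stored per simulation in a dictionary keyed by year and linear-interpolation percentiles (the percentile convention is not stated in the caption); masking to HadCRUT5 coverage is omitted, and <code>ensemble_envelope</code> is a hypothetical name.

```python
import math


def percentile(sorted_vals, q):
    """Percentile q (0-100) of a sorted list, with linear interpolation between ranks."""
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def ensemble_envelope(runs, years, base=(1850, 1900)):
    """runs maps a simulation name to {year: annual mean value}.

    Returns the names used, the multi-model mean anomaly and the 5th and 95th
    percentile anomalies for each year. Simulations missing any year are dropped.
    """
    complete = {name: r for name, r in runs.items() if all(y in r for y in years)}
    ref_years = [y for y in years if base[0] <= y <= base[1]]
    if not complete or not ref_years:
        raise ValueError("no complete simulation, or years miss the reference period")
    anoms = []
    for r in complete.values():
        ref = sum(r[y] for y in ref_years) / len(ref_years)  # each run's own baseline
        anoms.append([r[y] - ref for y in years])
    mean, p5, p95 = [], [], []
    for i in range(len(years)):
        column = sorted(a[i] for a in anoms)
        mean.append(sum(column) / len(column))
        p5.append(percentile(column, 5))
        p95.append(percentile(column, 95))
    return sorted(complete), mean, p5, p95
```

Dropping incomplete simulations keeps ensemble membership fixed through time, so changes in the mean and envelope do not arise from models entering or leaving the ensemble partway through the record.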

Although virtually all CMIP6 modelling groups report improvements in their model’s ability to simulate current climate compared to the CMIP5 version ([[#Gettelman--2019|Gettelman et al., 2019]] ; [[#Golaz--2019|Golaz et al., 2019]] ; [[#Mauritsen--2019|Mauritsen et al., 2019]] ; [[#Swart--2019|Swart et al., 2019]] ; [[#Voldoire--2019b|Voldoire et al., 2019b]] ; [[#Wu--2019|T. Wu et al., 2019b]] ; [[#Bock--2020|Bock et al., 2020]] ; [[#Boucher--2020|Boucher et al., 2020]] ; [[#Dunne--2020|Dunne et al., 2020]]), it does not necessarily follow that the simulation of temperature trends is also improved ([[#Bock--2020|Bock et al., 2020]] ; [[#Fasullo--2020|Fasullo et al., 2020]]). The CMIP6 multi-model ensemble encompasses observed warming and the multi-model mean tracks those observations within 0.2°C over most of the historical period. Figure 3.4 confirms the findings of [[#Papalexiou--2020|Papalexiou et al. (2020)]] , who, based on 29 CMIP6 models, highlighted that most models replicate the period of slow warming between 1942 and 1975 and the late twentieth century warming (1975–2014). The CMIP6 multi-model mean is cooler over the period 1980–2000 than both observations and CMIP5 (Figure 3.4; [[#Bock--2020|Bock et al., 2020]] ; [[#Flynn--2020|Flynn and Mauritsen, 2020]] ; [[#Gillett--2021|Gillett et al., 2021]]). Biases of several tenths of a degree in some CMIP6 models over that period may be due to an overestimate of aerosol radiative forcing (Sections 6.3.5 and 7.3.3, and Figure 6.8; [[#Andrews--2020|Andrews et al., 2020]] ; [[#Dittus--2020|Dittus et al., 2020]] ; [[#Flynn--2020|Flynn and Mauritsen, 2020]]). [[#Papalexiou--2020|Papalexiou et al. (2020)]] , [[#Tokarska--2020|Tokarska et al. (2020)]] and [[#Stolpe--2021|Stolpe et al. (2021)]] all report that CMIP6 models on average overestimate warming from the 1970s or 1980s to the 2010s, although quantitative conclusions depend on the observational dataset used for comparison (see also Table 2.4). 
However, Figure 3.4, which includes a larger number of models than were available to those studies, indicates that the CMIP6 multi-model mean tracks observed warming better than the CMIP5 multi-model mean after the year 2000. The CMIP6 multi-model mean GSAT warming between 1850–1900 and 2010–2019, with its associated 5–95% range, is 1.09 [0.66 to 1.64] °C. Cross-Chapter Box 2.3 assessed GSAT warming over the same period at 1.06 [0.88 to 1.21] °C. Some CMIP6 models therefore simulate a warming smaller than the assessed observed range, while others simulate a larger warming. The overestimated warming in the latter may be an early symptom of overestimated ECS in some CMIP6 models (Section 7.5.6; [[#Meehl--2020|Meehl et al., 2020]] ; [[#Schlund--2020|Schlund et al., 2020]]), and has implications for projections of GSAT changes (Chapter 4; [[#Liang--2020|Liang et al., 2020]] ; [[#Nijsse--2020|Nijsse et al., 2020]] ; [[#Tokarska--2020|Tokarska et al., 2020]] ; [[#Ribes--2021|Ribes et al., 2021]]). In some models, a large ECS and a strong aerosol forcing lead to an overly strong mid-20th-century cooling followed by overestimated warming rates in the late 20th century as aerosol emissions decrease ([[#Golaz--2019|Golaz et al., 2019]] ; [[#Flynn--2020|Flynn and Mauritsen, 2020]]). Temperature biases are driven by both model physics and prescribed forcing, and disentangling the two is a challenge for model development.
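
The comparison above rests on one metric, the change in mean GSAT from 1850–1900 to 2010–2019, and on placing each model's value relative to the assessed range of 0.88 °C to 1.21 °C. Because the CMIP6 5–95% range (0.66 °C to 1.64 °C) extends beyond both ends of the assessed range, some models fall below it and others above. A minimal Python sketch of that bookkeeping follows; the function names are hypothetical, and any model names or values passed to it are illustrative, not CMIP6 results.

```python
def period_mean(series, start, end):
    """Mean of a {year: value} series over start..end inclusive."""
    vals = [v for y, v in series.items() if start <= y <= end]
    if not vals:
        raise ValueError("no data in the requested period")
    return sum(vals) / len(vals)


def warming(series, base=(1850, 1900), recent=(2010, 2019)):
    """GSAT change between the two reference periods, in the units of the input."""
    return period_mean(series, *recent) - period_mean(series, *base)


def compare_to_range(model_warming, assessed=(0.88, 1.21)):
    """Split models into those below, within and above the assessed range (degC)."""
    below = sorted(m for m, w in model_warming.items() if w < assessed[0])
    above = sorted(m for m, w in model_warming.items() if w > assessed[1])
    within = sorted(m for m, w in model_warming.items() if assessed[0] <= w <= assessed[1])
    return below, within, above
```

With hypothetical values of 0.70, 1.05 and 1.50 °C, <code>compare_to_range</code> places one model in each group.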

[[#Chylek--2020|Chylek et al. (2020)]] argue that CMIP5 models overestimate the temperature response to volcanic eruptions. [[#Lehner--2016|Lehner et al. (2016)]] , [[#Rypdal--2018|Rypdal (2018)]] and [[#Stolpe--2021|Stolpe et al. (2021)]] point instead to missed compensating effects on surface temperature change associated with internal variability in the El Niño–Southern Oscillation (ENSO) or the Atlantic Multi-decadal Oscillation (AMO). An alternative view sees those ENSO and AMO responses as expressions of changes in climate feedbacks driven by the geographical pattern of SST changes ([[#Andrews--2018|Andrews et al., 2018]]). At least one model is able to reproduce such pattern effects ([[#Gregory--2016|Gregory and Andrews, 2016]]). Errors in the volcanic forcing prescribed in simulations, including for CMIP6 ([[#Rieger--2020|Rieger et al., 2020]]), also introduce differences from the observed temperature response, independently of the quality of the model physics. In addition, the modelled temperature response to large eruptions over the past millennium agrees much better with temperature reconstructions based on tree rings ([[#Lücke--2019|Lücke et al., 2019]] ; F. [[#Zhu--2020|Zhu et al., 2020]]) than with the annual, multi-proxy temperature reconstructions shown in Figure 3.2c. These considerations, and Figures 3.2c and 3.4, suggest that CMIP6 models do not systematically overestimate the cooling that follows large volcanic eruptions (see also Cross-Chapter Box 4.1).

When interpreting model simulations of historical temperature change, it is important to keep in mind that some models are tuned towards representing the observed trend in global mean surface temperature over the historical period ([[#Hourdin--2017|Hourdin et al., 2017]]). In Figure 3.4, the CMIP6 models that are documented to have been tuned to reproduce observed warming, typically by tuning aerosol forcing or factors that influence the model’s ECS, are marked with an asterisk. Such tuning of a model can strongly impact its temperature projections ([[#Mauritsen--2020|Mauritsen and Roeckner, 2020]]). However, [[#Bock--2020|Bock et al. (2020)]] reported no statistically significant difference in multi-model mean GSAT between models tuned to observed warming and those that were not. Moreover, only two of thirteen models used for the Detection and Attribution Model Intercomparison Project (DAMIP) simulations on which CMIP6 attribution studies are based were tuned towards historical warming ([[#Bock--2020|Bock et al., 2020]] ; [[#Gillett--2021|Gillett et al., 2021]]). Further, tuning is done on globally averaged quantities, so it does not substantially change the spatio-temporal pattern of response on which many regression-based attribution studies are based ([[#Bock--2020|Bock et al., 2020]]). Therefore, we assess with ''high confidence'' that the tuning of a small number of CMIP6 models to observed warming has not substantially influenced attribution results assessed in this chapter.

The reliance of detection and attribution studies on climate models (see [[#3.2|Section 3.2]]) requires that those models simulate realistic statistics of internal variability on multi-decadal time scales. An incorrect estimate of variability in models would affect confidence in the conclusions from detection and attribution. The AR5 found that CMIP5 models simulate realistic variability in global-mean surface temperature on decadal time scales, with variability on multi-decadal time scales being more difficult to evaluate because of the short observational record ([[#Flato--2013|Flato et al., 2013]]). Since AR5, new work has characterized the contributions of variability in different ocean areas to SST variability, with tropical modes of variability like ENSO dominant on time scales of five to ten years, while longer time scales see the variance maxima move poleward to the North Atlantic, North Pacific, and Southern oceans ([[#Monselesan--2015|Monselesan et al., 2015]]). There may, however, be sizeable, two-way interdependencies between ENSO and sea surface temperature variability in different basins ([[#Kumar--2014|Kumar et al., 2014]] ; [[#Cai--2019|Cai et al., 2019]]), and ENSO’s influence on global surface temperature variability may not be confined only to decadal time scales ([[#Triacca--2014|Triacca et al., 2014]]). Studies based on large ensembles of 20th and 21st century climate change simulations confirm that internal variability has a substantial influence on global warming trends over periods shorter than 30–40 years ([[#Kay--2015|Kay et al., 2015]] ; [[#Dai--2019|Dai and Bloecker, 2019]]). Although the equatorial Pacific seems to be the main source of internal variability on decadal time scales, [[#Brown--2016a|Brown et al. (2016a)]] linked diversity in modelled oceanic convection, sea ice, and energy budget in high-latitude regions to overall diversity in modelled internal variability.

Interest in internal variability since the publication of AR5 stems in part from its importance in understanding the slower global surface warming over the early 21st century (see Cross-Chapter Box 3.1). Evidence, mostly from paleo studies, is mixed on whether CMIP5 models underestimate decadal and multi-decadal variability in global mean temperature. [[#Schurer--2013|Schurer et al. (2013)]] found good agreement between modelled variability and internal variability derived from paleo reconstructions (estimated as the fraction of variance not explained by forced responses), although the subset of CMIP5 models they used may have been associated with larger variability than the full CMIP5 ensemble. [[#PAGES%202k%20Consortium--2019|PAGES 2k Consortium (2019)]] found that the largest 51-year trends in both reconstructions of global mean temperature and fully forced climate simulations over the period 850 to 1850 were almost identical. [[#Zhu--2019|Zhu et al. (2019)]] showed agreement in the modelled and reconstructed temporal spectrum of global surface temperatures on annual to multi-millennial time scales. However, they suggest that decadal to centennial variability is partly forced by slow orbital changes that predate the last millennium. This is consistent with [[#Gebbie--2019|Gebbie and Huybers (2019)]] , who showed that the deep ocean has been out of equilibrium over that period. [[#Laepple--2014|Laepple and Huybers (2014)]] found good agreement between modelled and proxy-derived decadal ocean temperature variability, but found that models underestimate variance by at least a factor of ten at centennial time scales because they underestimate the difference between the warm and cold periods of the last millennium. [[#Parsons--2020|Parsons et al. (2020)]] found that some CMIP6 models exhibit much higher multi-decadal variability in GSAT than CMIP5 models, with indications that variability in these models is also higher than that from proxy reconstructions. CMIP6 models may not share the CMIP5 models' underestimation of decadal to multi-decadal modes of variability, such as Pacific Decadal Variability ([[#3.7.6|Section 3.7.6]] ; [[#England--2014|England et al., 2014]] ; [[#Thompson--2014|Thompson et al., 2014]] ; [[#Schurer--2015|Schurer et al., 2015]]) and Atlantic Multi-decadal Variability (AMV; which may be partly forced, see [[#3.7.7|Section 3.7.7]]), but this assessment is limited by the small number of available studies. For the Southern Hemisphere, [[#Hegerl--2018|Hegerl et al. (2018)]] found an instance of internal variability in the early 20th century larger than that modelled, but indicated that this could be an observational issue. [[#Friedman--2020|Friedman et al. (2020)]] found biases in interhemispheric SST contrast in some models that may be consistent with underestimated cooling after early-20th-century eruptions or underestimated Pacific Decadal Variability, but could also be due to an imperfect separation between internal variability and forced signal in the observations. Figure 3.2c, updated from [[#PAGES%202k%20Consortium--2019|PAGES 2k Consortium (2019)]] , compares modelled temperatures to reconstructions over the last millennium. It indicates that models reproduce the reconstructed variability well, at least for the time scales between 20 and 50 years that paleo reconstructions typically resolve and that the figure represents. In summary, decadal GMST variability simulated in CMIP6 models spans the range of residual decadal variability in large-scale reconstructions (''medium evidence'' , ''low agreement'').
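
The PAGES 2k comparison of the largest 51-year trends can be made concrete as a sliding-window calculation. A minimal Python sketch, assuming annual values with no gaps, trends estimated by ordinary least squares, and "largest" meaning the most positive slope (the study's exact definitions are not given here); the function names are hypothetical.

```python
def ols_slope(values):
    """Ordinary least-squares slope of equally spaced values (units per time step)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def max_window_trend(values, window=51):
    """Largest OLS slope over all windows of `window` consecutive values."""
    if len(values) < window:
        raise ValueError("series shorter than the trend window")
    return max(ols_slope(values[s:s + window]) for s in range(len(values) - window + 1))
```

Applying the same function to a reconstruction and to each forced simulation over 850–1850 yields directly comparable numbers; for annual data, multiplying a slope by 10 converts it to degrees per decade.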

In addition, new literature suggests that anthropogenic forcing itself may locally increase or decrease variability in surface temperatures ([[#Screen--2014|Screen et al., 2014]] ; [[#Qian--2015|Qian and Zhang, 2015]] ; [[#Brown--2017|Brown et al., 2017]] ; [[#Park--2018|Park et al., 2018]] ; [[#Santer--2018|Santer et al., 2018]] ; [[#Weller--2020|Weller et al., 2020]]). These studies imply limitations in the use of pre-industrial control simulations to quantify the role of unforced variability over the historical period. Some recent attribution studies ([[#Gillett--2021|Gillett et al., 2021]] ; [[#Ribes--2021|Ribes et al., 2021]]) have estimated variability from ensembles of forced simulations instead, which would be expected to resolve any such changes in variability.

Figure 3.5 shows the standard deviation of zonal-mean surface temperature in CMIP6 pre-industrial control simulations and observed temperature datasets. Results are consistent with those based on CMIP5 models, which showed the largest model spread where variability is also large, in the tropics and mid- to high latitudes ([[#Flato--2013|Flato et al., 2013]]). Modelled variability is within a factor of two of observed variability over most of the globe. The apparent overestimation of high-latitude variability in models compared to observations may be due to interpolation and infilling over data-sparse high-latitude regions in the observational products shown here ([[#Jones--2016|Jones, 2016]]).

<div id="_idContainer016" class="Basic-Text-Frame"></div>

[[File:89c130003ac057ad307bb43aaf0ac219 IPCC_AR6_WGI_Figure_3_5.png]]

Figure 3.5 | '''The standard deviation of annually averaged zonal-mean near-surface air temperature.''' This is shown for four detrended observed temperature datasets (HadCRUT5, Berkeley Earth, NOAAGlobalTemp-Interim and [[#Kadow--2020|Kadow et al. (2020)]] , for the years 1995–2014) and 59 CMIP6 pre-industrial control simulations (one ensemble member per model, 65 years) (after [[#Jones--2013|Jones et al., 2013]]). For line colours see the legend of Figure 3.4. Additionally, the multi-model mean (red) and standard deviation (grey shading) are shown. Observational and model datasets were detrended by removing the least-squares quadratic trend. Further details on data sources and processing are available in the chapter data table (Table 3.SM.1).
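
The caption specifies removing a least-squares quadratic trend before computing the standard deviation of each zonal-mean series. A minimal Python sketch for annual series, assuming no missing values and the sample standard deviation (the caption does not state the estimator); <code>quadratic_detrend</code> and <code>zonal_std</code> are hypothetical names.

```python
import statistics


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def quadratic_detrend(values):
    """Residuals after a least-squares fit of a + b*t + c*t**2 (t centred on the series)."""
    n = len(values)
    if n < 3:
        raise ValueError("need at least three values to fit a quadratic")
    t = [i - (n - 1) / 2 for i in range(n)]  # centring improves numerical conditioning
    s = [sum(ti ** k for ti in t) for k in range(5)]
    normal = [[s[0], s[1], s[2]], [s[1], s[2], s[3]], [s[2], s[3], s[4]]]
    rhs = [sum(v * ti ** k for v, ti in zip(values, t)) for k in range(3)]
    d = _det3(normal)
    coef = []
    for j in range(3):  # Cramer's rule on the 3x3 normal equations
        m = [row[:] for row in normal]
        for i in range(3):
            m[i][j] = rhs[i]
        coef.append(_det3(m) / d)
    return [v - (coef[0] + coef[1] * ti + coef[2] * ti ** 2) for v, ti in zip(values, t)]


def zonal_std(series_by_latitude):
    """Map each latitude band to the sample standard deviation of its detrended series."""
    return {lat: statistics.stdev(quadratic_detrend(vals))
            for lat, vals in series_by_latitude.items()}
```

Centring the time axis keeps the normal equations well conditioned for series of a few decades; for much longer series, a dedicated least-squares routine would be the more robust choice.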

The previous paragraph took an ensemble-mean view of model performance, but individual models disagree on unforced variability. Figure 3.6 illustrates the large differences in GSAT variability in unforced CMIP6 pre-industrial control simulations, following the method of [[#Parsons--2020|Parsons et al. (2020)]] . Surface temperatures in pre-industrial conditions are especially variable in the ten models highlighted in Figure 3.6a, and some models substantially exceed the variability seen in CMIP5 models ([[#Parsons--2020|Parsons et al., 2020]]). Figure 3.6b shows that the distribution of warming simulated by CMIP6 models in historical simulations is clearly distinct from that simulated in unforced pre-industrial control simulations. Still, the unforced variability of the five most variable models approaches half the warming observed over the historical period under anthropogenically forced conditions (Figure 3.6c; [[#Parsons--2020|Parsons et al., 2020]] ; [[#Ribes--2021|Ribes et al., 2021]]). For the Centre National de la Recherche Météorologique (CNRM) models, which are among the most variable, the large, low-frequency variability is attributed to strong simulated Atlantic Multi-decadal Variability ([[#Séférian--2019|Séférian et al., 2019]] ; [[#Voldoire--2019b|Voldoire et al., 2019b]]), which is difficult to rule out because of the short observational record ([[#3.7.7|Section 3.7.7]] ; [[#Cassou--2018|Cassou et al., 2018]]). Importantly, however, the patterns of temperature variability simulated by even the most variable models differ from the pattern of forced temperature change ([[#Parsons--2020|Parsons et al., 2020]]). Taken together, this discussion and Figures 3.2, 3.5 and 3.6 indicate that the statistics of internal variability in models compare well in most cases with observational estimates and temperature proxy reconstructions, though some CMIP6 models appear to have higher multi-decadal variability than CMIP5 models or proxy reconstructions. 
When used in attribution studies, models with overestimated variability would increase estimated uncertainties and make results statistically conservative.

<div id="_idContainer018" class="Basic-Text-Frame"></div>

[[File:ff55ffd4ceaf6c8cdb4215341f9645a0 IPCC_AR6_WGI_Figure_3_6.png]]

'''Figure 3.6 | Simulated internal variability of global surface air temperature (GSAT) versus observed changes. (a)''' Time series of five-year running mean GSAT anomalies in 45 CMIP6 pre-industrial control (unforced) simulations. The 10 most variable models in terms of five-year running mean GSAT are coloured according to the legend of Figure 3.4. '''(b)''' Histograms of GSAT changes: pink shading shows changes in CMIP6 historical simulations (extended with SSP2-4.5 simulations) from 1850–1900 to 2010–2019, and blue shading shows GSAT changes between the average of the first 51 years and the average of the last 10 years of overlapping 170-year segments of the pre-industrial control simulations shown in (a). GMST changes in observational datasets for the same period are indicated by black vertical lines. '''(c)''' Observed GMST anomaly time series relative to the 1850–1900 average. Black lines represent the five-year running means while grey lines show unfiltered annual time series. Further details on data sources and processing are available in the chapter data table (Table 3.SM.1).
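
The control-run distribution in panel (b) comes from sliding a fixed-length segment along each unforced simulation. Because 170 years matches 1850–2019, the first 51 years of a segment play the role of 1850–1900 and its final 10 years the role of 2010–2019; the length of each period is kept as a parameter. A minimal Python sketch, assuming annual GSAT values and a one-year step between overlapping segments (the step is not stated); the function names are hypothetical.

```python
def control_changes(gsat, seg_len=170, base_len=51, recent_len=10, step=1):
    """Mean of the last recent_len years minus mean of the first base_len years,
    for every segment of seg_len consecutive years (segments may overlap)."""
    if len(gsat) < seg_len:
        raise ValueError("control run shorter than one segment")
    changes = []
    for start in range(0, len(gsat) - seg_len + 1, step):
        seg = gsat[start:start + seg_len]
        changes.append(sum(seg[-recent_len:]) / recent_len - sum(seg[:base_len]) / base_len)
    return changes


def exceedance_fraction(samples, observed):
    """Fraction of unforced segment changes at least as large as an observed change."""
    if not samples:
        raise ValueError("no samples")
    return sum(x >= observed for x in samples) / len(samples)
```

A small exceedance fraction for the observed change means unforced variability alone rarely produces a warming of that size. A model with exaggerated variability widens this distribution and raises the fraction, which is why such models make attribution results more conservative rather than less.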

In summary, there is ''high confidence'' that CMIP6 models reproduce observed large-scale mean surface temperature patterns and internal variability as well as their CMIP5 predecessors, but with little evidence for reduced biases. CMIP6 models also reproduce historical GSAT changes similarly to their CMIP5 counterparts (''medium confidence''). However, in spite of model imperfections, there is ''very high confidence'' that biases in surface temperature trends and variability simulated by the CMIP5 and CMIP6 ensembles are small enough to support detection and attribution of human-induced warming.

<div id="3.3.1.1.2" class="h4-container"></div>

<span id="detection-and-attribution"></span>