
=== 1.5.4 Modelling Techniques, Comparisons and Performance Assessments ===

<div id="h2-31-siblings" class="h2-siblings"></div>

Numerical models, however complex, cannot be a perfect representation of the real world. Results from climate modelling simulations constitute a key line of evidence for the present Report, which requires considering the limitations of each model simulation. This section presents recent developments in techniques and approaches to robustly extract, quantify and compare results from multiple, independent climate models, and how their performance can be assessed and validated.

<div id="1.5.4.1" class="h3-container"></div>

<span id="model-fitness-for-purpose"></span>
==== 1.5.4.1 Model ‘Fitness-for-Purpose’ ====

<div id="h3-33-siblings" class="h3-siblings"></div>

A key issue addressed in this Report is whether climate models are adequate or ‘fit’ for purposes of interest, that is, whether they can be used to successfully answer particular research questions, especially about the causes of recent climate change and the future evolution of climate (e.g., [[#Parker--2009|Parker, 2009]] ; [[#Notz--2015|Notz, 2015]] ; [[#Knutti--2018|Knutti, 2018]] ; [[#Winsberg--2018|Winsberg, 2018]] ). Assessment of a model’s fitness-for-purpose can be informed both by how the model represents relevant physical processes and by relevant performance metrics ( [[#Baumberger--2017|Baumberger et al., 2017]] ; [[#Parker--2020|Parker, 2020]] ). The processes and metrics that are most relevant can vary with the question of interest. For example, a question about changes in deep-ocean circulation calls for different processes and metrics than a question about changes in regional precipitation ( [[#Notz--2015|Notz, 2015]] ; [[#Gramelsberger--2020|Gramelsberger et al., 2020]] ). New model-evaluation tools ( [[#1.5.4.5|Section 1.5.4.5]] ) and emergent constraint methodologies ( [[#1.5.4.7|Section 1.5.4.7]] ) can also aid the assessment of fitness-for-purpose, especially in conjunction with process understanding ( [[#Klein--2015|Klein and Hall, 2015]] ; [[#Knutti--2018|Knutti, 2018]] ). The broader availability of large model ensembles may allow for novel tests of fitness that better account for natural climate variability ( [[#1.5.4.2|Section 1.5.4.2]] ). Fitness-for-purpose of models used in this Report is discussed in [[IPCC:Wg1:Chapter:Chapter-3|Chapter 3]] ( [[IPCC:Wg1:Chapter:Chapter-3#3.8.4|Section 3.8.4]] ) for the global scale, in [https://www.ipcc.ch/report/ar6/wg1/chapter/chapter-10 Chapter 10] (Section 10.3) for regional climate, and in the other chapters for the process level.

Typical strategies for enhancing the fitness-for-purpose of a model include increasing resolution in order to explicitly simulate key processes, improving relevant parameterizations, and careful tuning. Changes to a model that enhance its fitness for one purpose can sometimes decrease its fitness for others, by upsetting a pre-existing balance of approximations. When it is unclear whether a model is fit for a purpose of interest, there is often a closely related purpose for which the evidence of fitness is clearer. For example, it might be unclear whether a model is fit for providing highly accurate projections of precipitation changes in a region, but reasonable to think that the model is fit for providing projections of precipitation changes that cannot yet be ruled out ( [[#Parker--2009|Parker, 2009]] ). Such information about plausible or credible changes can be useful to inform adaptation. Note that challenges associated with assessing models’ fitness-for-purpose need not prevent reaching conclusions with high confidence if there are multiple other lines of evidence supporting those same conclusions.

<div id="1.5.4.2" class="h3-container"></div>

<span id="ensemble-modelling-techniques"></span>
==== 1.5.4.2 Ensemble Modelling Techniques ====

<div id="h3-34-siblings" class="h3-siblings"></div>

A key approach in climate science is the comparison of results from multiple model simulations with each other and against observations. These simulations have typically been performed by separate models with consistent boundary conditions and prescribed emissions or radiative forcings, as in the Coupled Model Intercomparison Project phases (CMIP, [[#Meehl--2000|Meehl et al., 2000]] , 2007a; [[#Taylor--2012|Taylor et al., 2012]] ; [[#Eyring--2016|Eyring et al., 2016]] ). Such multi-model ensembles (MMEs) have proven highly useful in sampling and quantifying model uncertainty, within and between generations of climate models. They also reduce the influence on projections of the particular sets of parametrizations and physical components simulated by individual models. The primary usage of MMEs is to provide a well-quantified model range, but when used carefully they can also increase confidence in projections ( [[#Knutti--2010|Knutti et al., 2010]] ). Presently, however, many models also share provenance ( [[#Masson--2011|Masson and Knutti, 2011]] ) and may have common biases that should be acknowledged when presenting and building on MME-derived conclusions ( [[#1.5.4.6|Section 1.5.4.6]] ; [[#Boé--2018|Boé, 2018]] ; [[#Abramowitz--2019|Abramowitz et al., 2019]] ).

Since AR5, an increase in computing power has made it possible to investigate simulated internal variability and to provide robust estimates of forced model responses, using large initial condition ensembles (ICEs), also referred to as single model initial condition large ensembles (SMILEs). Examples using GCMs or ESMs that support assessments in AR6 include the CESM Large Ensemble ( [[#Kay--2015|Kay et al., 2015]] ), the MPI Grand Ensemble ( [[#Maher--2019|Maher et al., 2019]] ), and the CanESM2 large ensembles ( [[#Kirchmeier-Young--2017|Kirchmeier-Young et al., 2017]] ). Such ensembles employ a single GCM or ESM in a fixed configuration, but starting from a variety of different initial states. In some experiments, these initial states only differ slightly. As the climate system is chaotic, such tiny changes in initial conditions lead to different evolutions for the individual realizations of the system as a whole. Other experiments start from a set of well-separated ocean initial conditions to sample the uncertainty in the circulation state of the ocean and its role in longer-time scale variations. These two types of ICEs have been referred to as ‘micro’ and ‘macro’ perturbation ensembles respectively ( [[#Hawkins--2016|Hawkins et al., 2016]] ). In support of this Report, most models contributing to CMIP6 have produced ensembles of multiple realizations of their historical and scenario simulations (Chapters 3 and 4).
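
The separation of forced response and internal variability that an ICE permits can be sketched in a few lines. The following is a minimal illustration on synthetic data, not a description of how any of the ensembles cited above were analysed: the ensemble mean is taken as an estimate of the forced response, and each member's departure from it as that member's realization of internal variability. The function names, the linear trend and the white-noise variability are all assumptions made for the example; real internal variability is autocorrelated and would need more careful treatment.

```python
import numpy as np


def split_forced_and_internal(ensemble):
    """Split a single-model ensemble into forced and internal parts.

    ensemble: array of shape (n_members, n_times), all members from ONE model
    in ONE fixed configuration, differing only in their initial conditions.
    """
    ensemble = np.asarray(ensemble, dtype=float)
    forced = ensemble.mean(axis=0)         # ensemble mean: estimate of the forced response
    internal = ensemble - forced           # per-member departures: internal variability
    spread = ensemble.std(axis=0, ddof=1)  # across-member spread at each time step
    return forced, internal, spread


def make_synthetic_ice(n_members=50, n_years=100, trend=0.02, noise_sd=0.15, seed=0):
    """Hypothetical test data: a linear forced signal plus white noise per member."""
    rng = np.random.default_rng(seed)
    years = np.arange(n_years)
    signal = trend * years
    noise = rng.normal(0.0, noise_sd, size=(n_members, n_years))
    return years, signal, signal + noise


years, signal, ens = make_synthetic_ice()
forced, internal, spread = split_forced_and_internal(ens)

print("max error of forced estimate:", np.abs(forced - signal).max())
print("mean across-member spread:", spread.mean())
```

If the members are independent, the noise in the ensemble-mean estimate is smaller than that of a single realization by a factor of roughly the square root of the ensemble size, which is one reason large ensembles are attractive for estimating forced responses.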

Recently, the ICE technique has been extended to atmosphere-only simulations ( [[#Mizuta--2017|Mizuta et al., 2017]] ), single-forcer influences such as volcanic eruptions ( [[#Bethke--2017|Bethke et al., 2017]] ), regional modelling ( [[#Mote--2015|Mote et al., 2015]] ; [[#Fyfe--2017|Fyfe et al., 2017]] ; [[#Schaller--2018|Schaller et al., 2018]] ; [[#Leduc--2019|Leduc et al., 2019]] ), and to attribution of extreme weather events using crowdsourced computing ( [http://climateprediction.net climateprediction.net] ; [[#Massey--2015|Massey et al., 2015]] ).

ICEs can also be used to evaluate climate model parameterizations, if models are initialized appropriately ( [[#Phillips--2004|Phillips et al., 2004]] ; [[#Williams--2013|Williams et al., 2013]] ), mostly within the framework of seamless weather and climate predictions (e.g., [[#Palmer--2008|Palmer et al., 2008]] ; [[#Hurrell--2009|Hurrell et al., 2009]] ; [[#Brown--2012|Brown et al., 2012]] ). Initializing an atmospheric model in hindcast mode and observing the biases as they develop permits testing of the parameterized processes, by starting from a known state rather than one dominated by quasi-random short-term variability ( [[#Williams--2013|Williams et al., 2013]] ; [[#Ma--2014|Ma et al., 2014]] ; [[#Vannière--2014|Vannière et al., 2014]] ). However, single-model initial-conditions ensembles cannot cover the same degrees of freedom as a multi-model ensemble, because model characteristics substantially affect model behaviour ( [[#Flato--2013|Flato et al., 2013]] ).

A third common modelling technique is the perturbed parameter ensemble (PPE; note that the abbreviation also sometimes refers to the sub-category ‘perturbed physics ensemble’). These methods are used to assess uncertainty based on a single model, with individual parameters perturbed to reflect the full range of their uncertainty ( [[#Murphy--2004|Murphy et al., 2004]] ; [[#Knutti--2010|Knutti et al., 2010]] ; [[#Lee--2011|Lee et al., 2011]] ; [[#Shiogama--2014|Shiogama et al., 2014]] ). Statistical methods can then be used to detect which parameters are the main causes of uncertainty across the ensemble. PPEs have been used frequently in simpler models, such as EMICs, and are being applied to more complex models. A caveat of PPEs is that the estimated uncertainty will depend on the specific parameterizations of the underlying model and may well be an underestimation of the ‘true’ uncertainty. It is also challenging to disentangle forced responses from internal variability using a PPE alone.

Together, the three ensemble methods (MMEs, ICEs, PPEs) allow investigation of climate model uncertainty arising from internal variability, initial and internal boundary conditions, model formulations and parameterizations ( [[#Parker--2013|Parker, 2013]] ). Figure 1.21 illustrates the different ensemble types. Recent studies have also started combining multiple ensemble types or using ensembles in combination with statistical analytical techniques. For example, [[#Murphy--2018|Murphy et al. (2018)]] combine MMEs and PPEs to give a fuller assessment of modelling uncertainty. [[#Wagman--2018|Wagman and Jackson (2018)]] use PPEs to evaluate the robustness of MME-based emergent constraints. [[#Sexton--2019|Sexton et al. (2019)]] study the robustness of ICE approaches by identifying parameters and processes responsible for model errors at the two different time scales.

<div id="_idContainer061" class="_idGenObjectStyleOverride-1"></div>

<!-- START IMG -->
<!-- IMG FILE -->
[[File:94c89175c1897131f36d3a673881b375 IPCC_AR6_WGI_Figure_1_21.png]]
<!-- IMG TITLE + CAPTION -->

'''Figure 1.21 |''' '''Illustration of common types of model ensemble, simulating the time evolution of a quantity Q (such as global mean surface temperature).''' '''(a)''' Multi-model ensemble, where each model has its own realization of the processes affecting Q, and its own internal variability around the baseline value (dashed line). The multi-model mean (black) is commonly taken as the ensemble average. '''(b)''' Initial condition ensemble, where several realizations from a single model are compared. These differ only by minute (‘micro’) perturbations to the initial conditions of the simulation, such that over time, internal variability will progress differently in each ensemble member. '''(c)''' Perturbed physics ensemble, which also compares realizations from a single model, but where one or more internal parameters that may affect the simulations of Q are systematically changed to allow for a quantification of the impact of those quantities on the model results. Additionally, each parameter set may be taken as the starting point for an initial condition ensemble. In this figure, each set has three ensemble members.
<!-- END IMG -->

Overall, we assess that increases in computing power and the broader availability of larger and more varied ensembles of model simulations have contributed to better estimations of uncertainty in projections of future change ( ''high confidence'' ). Note, however, that despite its widespread use in climate science today, the ensemble approach has been questioned on account of its cost in human and computational resources and the challenges associated with the interpretation of multi-model ensembles ( [[#Palmer--2019|Palmer and Stevens, 2019]] ; [[#Touzé-Peiffer--2020|Touzé-Peiffer et al., 2020]] ).

<div id="1.5.4.3" class="h3-container"></div>

<span id="the-sixth-phase-of-the-coupled-model-intercomparison-project-cmip6"></span>
==== 1.5.4.3 The Sixth Phase of the Coupled Model Intercomparison Project (CMIP6) ====

<div id="h3-35-siblings" class="h3-siblings"></div>

The Coupled Model Intercomparison Project (CMIP) provides a framework to compare the results of different GCMs or ESMs performing similar experiments. Since its creation in the mid-1990s, it has evolved in different phases, involving all major climate modelling centres in the world (Figure 1.20). The results of these phases have played a key role in previous IPCC reports, and the present Report assesses a range of results from CMIP5 that were not published until after the AR5, as well as the first results of the 6th phase of CMIP (CMIP6; [[#Eyring--2016|Eyring et al., 2016]] ). The CMIP6 experiment design is somewhat different from previous phases. It now consists of a limited set of DECK (Diagnostic, Evaluation and Characterization of Klima) simulations and an historical simulation that must be performed by all participating models, as well as a wide range of CMIP6-Endorsed model intercomparison projects (MIPs) covering specialized topics (Figure 1.22; [[#Eyring--2016|Eyring et al., 2016]] ). Each MIP activity consists of a series of model experiments, documented in the literature (Table 1.3) and in an online database ( [http://es-doc.org es-doc.org] ; Annex II; [[#Pascoe--2020|Pascoe et al., 2020]] ).

<div id="_idContainer063" class="_idGenObjectStyleOverride-1"></div>

<!-- START IMG -->
<!-- IMG FILE -->
[[File:794fa33a3216f334817859f0cf2ff96b IPCC_AR6_WGI_Figure_1_22.png]]
<!-- IMG TITLE + CAPTION -->

'''Figure 1.22 |''' '''Structure of CMIP6, the 6th phase of the Coupled Model Intercomparison Project''' . The centre shows the common DECK (Diagnostic, Evaluation and Characterization of Klima) and historical experiments that all participating models must perform. The outer circles show the topics covered by the endorsed (red) and other MIPs (orange). See Table 1.3 for explanation of the MIP acronyms. Figure is adapted from [[#Eyring--2016|Eyring et al. (2016)]] .
<!-- END IMG -->

<!-- START IMG -->
<!-- TABLE IMG -->
<!-- IMG TITLE + CAPTION -->
'''Table 1.3 | CMIP6-Endorsed MIPs, their key references, and where they are used or referenced throughout this Report.'''

[[File:80356ae2da8a03e6aeb0d297efbb29ab IPCC_AR6_WGI_Chapter_1_Table_1_3.png]]
<!-- END IMG -->
The CMIP DECK simulations form the basis for a range of assessments and projections in the following chapters. As in CMIP5, they consist of: a ‘pre-industrial’ control simulation (piControl, where ‘pre-industrial’ is taken as fixed 1850 conditions in these experiments); an idealized, abrupt quadrupling of CO<sub>2</sub> concentrations relative to piControl (to estimate equilibrium climate sensitivity); a 1% per year increase in CO<sub>2</sub> concentrations relative to piControl (to estimate the transient climate response); and a transient simulation with prescribed sea-surface temperatures for the period 1979–2014 (termed ‘AMIP’ for historical reasons). In addition, all participating models perform a historical simulation for the period 1850–2014. For the latter, common CMIP6 forcings are prescribed (Cross-Chapter Box 1.4, Table 2). Depending on the model setup, these include emissions and concentrations of short-lived species ( [[#Hoesly--2018|Hoesly et al., 2018]] ; [[#Gidden--2019|Gidden et al., 2019]] ), long-lived GHGs ( [[#Meinshausen--2017|Meinshausen et al., 2017]] ), biomass burning emissions ( [[#van%20Marle--2017|van Marle et al., 2017]] ), global gridded land-use forcing data ( [[#Ma--2020|Ma et al., 2020]] ), solar forcing ( [[#Matthes--2017|Matthes et al., 2017]] ), and stratospheric aerosol data from volcanoes ( [[#Zanchettin--2016|Zanchettin et al., 2016]] ). The methods for generating gridded datasets are described in [[#Feng--2020|Feng et al. (2020)]] . For AMIP simulations, common sea surface temperatures (SSTs) and sea ice concentrations (SICs) are prescribed. For simulations with prescribed aerosol abundances (i.e., not calculated from emissions), optical properties and fractional changes in cloud droplet effective radius are generally prescribed in order to provide a more consistent representation of aerosol forcing relative to earlier CMIP phases ( [[#Fiedler--2017|Fiedler et al., 2017]] ; [[#Stevens--2017|Stevens et al., 2017]] ). For models without ozone chemistry, time-varying gridded ozone concentrations and nitrogen deposition are also provided ( [[#Checa-Garcia--2018|Checa-Garcia et al., 2018]] ).
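
The idealized 1% per year experiment follows simple compound growth, which makes its timing easy to work out. The sketch below computes the concentration path and the number of years needed to double and to quadruple the starting concentration. The starting value of 284.3 ppm is an illustrative assumption (the text above only specifies fixed 1850 conditions), and the helper names are hypothetical.

```python
import math


def one_percent_path(c0_ppm, n_years, rate=0.01):
    """CO2 concentration for years 0..n_years under compound growth at `rate` per year."""
    return [c0_ppm * (1.0 + rate) ** year for year in range(n_years + 1)]


def years_to_multiply(factor, rate=0.01):
    """Years of compound growth at `rate` needed to multiply the concentration by `factor`."""
    return math.log(factor) / math.log(1.0 + rate)


C0 = 284.3  # illustrative starting concentration in ppm (assumption, not from the text)

path = one_percent_path(C0, 140)
t2 = years_to_multiply(2.0)  # about 69.7 years
t4 = years_to_multiply(4.0)  # about 139.3 years

print(f"doubling after {t2:.1f} years, quadrupling after {t4:.1f} years")
print(f"concentration in year 70: {path[70]:.1f} ppm")
```

Doubling is reached after about 70 years, which is why the transient climate response is conventionally diagnosed around year 70 of this experiment. Quadrupling, the level imposed instantaneously in the abrupt experiment, is reached after about 140 years.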

Beyond the DECK and the historical simulations, the CMIP6-Endorsed MIPs aim to investigate how models respond to specific forcings, their potential systematic biases, their variability, and their responses to detailed future scenarios such as the Shared Socio-economic Pathways (SSPs; [[#1.6|Section 1.6]] ). Table 1.3 lists the 23 CMIP6-Endorsed MIPs and key references. Results from a range of these MIPs, and many others outside of the most recent CMIP6 cycle, will be assessed in the following chapters (also shown in Table 1.3). References to all the CMIP6 datasets used in the report are found in Annex II, Table AII.10.

<div id="1.5.4.4" class="h3-container"></div>

<span id="coordinated-regional-downscaling-experiment-cordex"></span>
==== 1.5.4.4 Coordinated Regional Downscaling Experiment (CORDEX) ====

<div id="h3-36-siblings" class="h3-siblings"></div>

The Coordinated Regional Downscaling Experiment (CORDEX; [[#Gutowski%20Jr.--2016|Gutowski Jr. et al., 2016]] ) is an intercomparison project for regional models and statistical downscaling techniques, coordinating simulations on common domains and under common experimental conditions in a similar way to the CMIP effort. Dynamical and statistical downscaling techniques can provide higher-resolution climate information than is available directly from global climate models (Section 10.3). These techniques require evaluation and quantification of their performance before they can be considered appropriate as usable regional climate information or be used in support of climate services. CORDEX simulations have been provided by a range of regional downscaling models for 14 regions, together covering much of the globe (Figure Atlas.7), and they are used extensively in the AR6 WGI [[IPCC:Wg1:Chapter:Atlas|Atlas]] (Atlas.1.4 and Annex II).

In support of AR6, CORDEX has undertaken a new experiment (CORDEX-CORE) in which regional climate models downscale a common set of global model simulations, performed at a coarser resolution, to a spatial resolution spanning from 12–25 km over most of the CORDEX domains (Box Atlas.1). CORDEX-CORE represents an improved level of coordinated intercomparison of downscaling models ( [[#Remedio--2019|Remedio et al., 2019]] ).

<div id="1.5.4.5" class="h3-container"></div>

<span id="model-evaluation-tools"></span>
==== 1.5.4.5 Model Evaluation Tools ====

<div id="h3-37-siblings" class="h3-siblings"></div>

For the first time in CMIP, a range of comprehensive evaluation tools are now available that can run alongside the commonly used distributed data platform – Earth System Grid Federation (ESGF; see Annex II) – to produce comprehensive results as soon as the model output is published to the CMIP archive.

For instance, the Earth System Model Evaluation Tool (ESMValTool; [[#Eyring--2020|Eyring et al., 2020]] ; [[#Lauer--2020|Lauer et al., 2020]] ; [[#Righi--2020|Righi et al., 2020]] ) is used by a number of chapters. It is an open-source community software tool that includes a large variety of diagnostics and performance metrics relevant for coupled Earth system processes, such as for the mean, variability and trends, and it can also examine emergent constraints ( [[#1.5.4.7|Section 1.5.4.7]] ). ESMValTool also includes routines provided by the WMO Expert Team on Climate Change Detection and Indices for the evaluation of extreme events ( [[#Min--2011|Min et al., 2011]] ; [[#Sillmann--2013|Sillmann et al., 2013]] ) and diagnostics for key processes and variability. Another example of an evaluation tool is the CLIVAR 2020 ENSO metrics package ( [[#Planton--2021|Planton et al., 2021]] ).

These tools are used in several chapters of this Report for the creation of the figures that show CMIP results. Together with the Interactive Atlas, they allow for traceability of key results, and an additional level of quality control on whether published figures can be reproduced. They also provide the capability to update published figures with, as much as possible, the same set of models in all figures, and to assess model improvements across different phases of CMIP ( [[IPCC:Wg1:Chapter:Chapter-3#3.8.2|Section 3.8.2]] ).

These new developments are facilitated by the definition of common formats for CMIP model output ( [[#Balaji--2018|Balaji et al., 2018]] ) and the availability of reanalyses and observations in the same format as CMIP output (obs4MIPs; [[#Ferraro--2015|Ferraro et al., 2015]] ). The tools are also used to support routine evaluation at individual model centres and simplify the assessment of improvements in individual models or generations of model ensembles ( [[#Eyring--2019|Eyring et al., 2019]] ). Note, however, that while tools such as ESMValTool can produce an estimate of overall model performance, dedicated model evaluation still needs to be performed when analysing projections for a particular purpose, such as assessing changing hazards in a given region. Such evaluation is discussed in the next section, and in greater detail in later chapters of this Report.

<div id="1.5.4.6" class="h3-container"></div>

<span id="evaluation-of-process-based-models-against-observations"></span>
==== 1.5.4.6 Evaluation of Process-Based Models Against Observations ====

<div id="h3-38-siblings" class="h3-siblings"></div>

Techniques used for evaluating process-based climate models against observations were assessed in AR5 ( [[#Flato--2013|Flato et al., 2013]] ), and have progressed rapidly since ( [[#Eyring--2019|Eyring et al., 2019]] ). The most widely used technique is to compare climatologies (long-term averages of specific climate variables) or time series of simulated (process-based) model output with observations, considering the observational uncertainty. A further approach is to compare the results of process-based models with those from statistical models. In addition to a comparison of climatological means, trends and variability, AR5 already made use of a large set of performance metrics for a quantitative evaluation of the models.
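
As a concrete illustration of the climatology comparison described above, the following sketch computes an area-weighted mean bias and root-mean-square error between a simulated and an observed climatology on a regular latitude-longitude grid. It is a minimal example on synthetic data: the function names, the grid, and the use of cosine-of-latitude weights are assumptions made here, and observational uncertainty, which the text notes must be considered, is not treated.

```python
import numpy as np


def area_weights(lat_deg, n_lon):
    """Normalized cos(latitude) weights for a regular lat-lon grid, shape (n_lat, n_lon)."""
    w = np.cos(np.deg2rad(np.asarray(lat_deg, dtype=float)))
    w2d = np.repeat(w[:, None], n_lon, axis=1)
    return w2d / w2d.sum()


def climatology(field):
    """Long-term average of a field with shape (n_time, n_lat, n_lon)."""
    return np.asarray(field, dtype=float).mean(axis=0)


def bias_and_rmse(model_clim, obs_clim, weights):
    """Area-weighted mean bias and RMSE of a model climatology against observations."""
    diff = model_clim - obs_clim
    bias = float((weights * diff).sum())
    rmse = float(np.sqrt((weights * diff ** 2).sum()))
    return bias, rmse


# Synthetic stand-ins for observed and simulated fields (hypothetical values).
rng = np.random.default_rng(0)
lat = np.linspace(-87.5, 87.5, 36)
n_lon = 72
obs = 280.0 + rng.normal(0.0, 5.0, size=(120, lat.size, n_lon))
model = obs + 0.5  # a model that is uniformly 0.5 units too warm

weights = area_weights(lat, n_lon)
bias, rmse = bias_and_rmse(climatology(model), climatology(obs), weights)
print(f"bias = {bias:.3f}, rmse = {rmse:.3f}")
```

Because the synthetic model differs from the synthetic observations by a uniform offset, the bias and RMSE coincide here; in practice the RMSE also reflects errors in spatial pattern that a mean bias cannot capture.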

Since AR5, a range of studies has investigated model agreement with observations well beyond large-scale mean climate properties (e.g., [[#Bellenger--2014|Bellenger et al., 2014]] ; [[#Covey--2016|Covey et al., 2016]] ; [[#Pendergrass--2017|Pendergrass and Deser, 2017]] ; [[#Goelzer--2018|Goelzer et al., 2018]] ; [[#Beusch--2020a|Beusch et al., 2020a]] ), providing information on the performance of recent model simulations across multiple variables and components of the Earth system (e.g., [[#Anav--2013|Anav et al., 2013]] ; [[#Guan--2017|Guan and Waliser, 2017]] ). Based on such studies, this Report assesses model improvements across different CMIP DECK, CMIP6 historical and CMIP6-Endorsed MIP simulations, and differences in model performance between different classes of models, such as high- versus low-resolution models (see e.g., [[IPCC:Wg1:Chapter:Chapter-3#3.8.2|Section 3.8.2]] ).

In addition, process- or regime-oriented evaluation of models has been expanded since AR5. By focusing on processes, causes of systematic errors in the models can be identified and insights can be gained as to whether a mean state or trend is correctly simulated and for the right reasons. This approach is commonly used for the evaluation of clouds (e.g., [[#Williams--2009|Williams and Webb, 2009]] ; [[#Konsta--2012|Konsta et al., 2012]] ; [[#Bony--2015|Bony et al., 2015]] ; [[#Dal%20Gesso--2015|Dal Gesso et al., 2015]] ; [[#Jin--2017|Jin et al., 2017]] ), dust emissions (e.g., [[#Parajuli--2016|Parajuli et al., 2016]] ; [[#Wu--2016|Wu et al., 2016]] ) as well as aerosol–cloud (e.g., [[#Gryspeerdt--2012|Gryspeerdt and Stier, 2012]] ) and chemistry–climate ( [[#SPARC--2010|SPARC, 2010]] ) interactions. Process-oriented diagnostics have also been used to evaluate specific phenomena such as the El Niño–Southern Oscillation (ENSO; [[#Guilyardi--2016|Guilyardi et al., 2016]] ), the Madden–Julian Oscillation (MJO; [[#Ahn--2017|Ahn et al., 2017]] ; [[#Jiang--2018|Jiang et al., 2018]] ), Southern Ocean clouds ( [[#Hyder--2018|Hyder et al., 2018]] ), monsoons ( [[#Boo--2011|Boo et al., 2011]] ; [[#James--2015|James et al., 2015]] ) and tropical cyclones ( [[#Kim--2018|Kim et al., 2018]] ).

Instrument simulators provide estimates of what a satellite would see if looking down on the model-simulated planet, and improve the direct comparison of modelled variables such as clouds, precipitation and upper tropospheric humidity with observations from satellites (e.g., [[#Kay--2011|Kay et al., 2011]] ; [[#Klein--2013|Klein et al., 2013]] ; [[#Cesana--2016|Cesana and Waliser, 2016]] ; [[#Konsta--2016|Konsta et al., 2016]] ; [[#Jin--2017|Jin et al., 2017]] ; [[#Chepfer--2018|Chepfer et al., 2018]] ; [[#Swales--2018|Swales et al., 2018]] ; [[#Zhang--2018|Zhang et al., 2018]] ). Within the framework of the Cloud Feedback Model Intercomparison Project (CFMIP) contribution to CMIP6 ( [[#Webb--2017|Webb et al., 2017]] ), a new version of the Cloud Feedback Model Intercomparison Project Observational Simulator (COSP; [[#Swales--2018|Swales et al., 2018]] ) has been released, which makes use of a collection of observation proxies or satellite simulators. Related approaches in this rapidly evolving field include simulators for Arctic Ocean observations ( [[#Burgard--2020|Burgard et al., 2020]] ) and for aerosol measurements along aircraft trajectories ( [[#Watson-Parris--2019|Watson-Parris et al., 2019]] ).

In this Report, model evaluation is performed in the individual chapters, rather than in a separate chapter as was the case for AR5. This applies to the model types discussed above, and also to dedicated models of subsystems that are not (or not yet) part of usual climate models, for example, glacier or ice-sheet models (Annex II). Further discussions are found in [[IPCC:Wg1:Chapter:Chapter-3|Chapter 3]] (attribution), [[IPCC:Wg1:Chapter:Chapter-5|Chapter 5]] (carbon cycle), [[IPCC:Wg1:Chapter:Chapter-6|Chapter 6]] (short-lived climate forcers), [[IPCC:Wg1:Chapter:Chapter-8|Chapter 8]] (water cycle), [[IPCC:Wg1:Chapter:Chapter-9|Chapter 9]] (ocean, cryosphere and sea level), [https://www.ipcc.ch/report/ar6/wg1/chapter/chapter-10 Chapter 10] (regional scale information) and the [[IPCC:Wg1:Chapter:Atlas|Atlas]] (regional models).

<div id="1.5.4.7" class="h3-container"></div>

<span id="emergent-constraints-on-climate-feedbacks-sensitivities-and-projections"></span>
==== 1.5.4.7 Emergent Constraints on Climate Feedbacks, Sensitivities and Projections ====

<div id="h3-39-siblings" class="h3-siblings"></div>

An emergent constraint is the relationship between an uncertain aspect of future climate change and an observable feature of the Earth System, evident across an ensemble of models ( [[#Allen--2002|Allen and Ingram, 2002]] ; [[#Mystakidis--2016|Mystakidis et al., 2016]] ; [[#Wenzel--2016|Wenzel et al., 2016]] ; [[#Hall--2019|Hall et al., 2019]] ; [[#Winkler--2019|Winkler et al., 2019]] ). Complex Earth system models (ESMs) simulate variations on time scales from hours to centuries, telling us how aspects of the current climate relate to its sensitivity to anthropogenic forcing. Where an ensemble of different ESMs displays a relationship between a short-term observable variation and a longer-term sensitivity, an observation of the short-term variation in the real world can be converted, via the model-based relationship, into an ‘emergent constraint’ on the sensitivity. This is shown schematically in Figure 1.23 (see Glossary; [[#Eyring--2019|Eyring et al., 2019]] ).

<div id="_idContainer065" class="_idGenObjectStyleOverride-1"></div>

<!-- START IMG -->
<!-- IMG FILE -->
[[File:19c9c2bf553e21560024961e7c247bd8 IPCC_AR6_WGI_Figure_1_23.png]]
<!-- IMG TITLE + CAPTION -->

'''Figure 1.23 |''' '''The principle of emergent constraints''' . An ensemble of models (blue dots) defines a relationship between an observable mean, trend or variation in the climate (x-axis) and an uncertain projection, climate sensitivity or feedback (y-axis). An observation of the x-axis variable can then be combined with the model-derived relationship to provide a tighter estimate of the climate projection, sensitivity or feedback on the y-axis. Figure adapted from [[#Eyring--2019|Eyring et al. (2019)]] .
<!-- END IMG -->

Emergent constraints use the spread in model projections to estimate the sensitivities of the climate system to anthropogenic forcing, providing another type of ensemble-wide information that is not readily available from simulations with one ESM alone. As emergent constraints depend on identifying those observable aspects of the climate system that are most related to climate projections, they also help to focus model evaluation on the most relevant observations ( [[#Hall--2019|Hall et al., 2019]] ). However, there is a chance that indiscriminate data-mining of the multi-dimensional outputs from ESMs could lead to spurious correlations ( [[#Caldwell--2014|Caldwell et al., 2014]] ; [[#Wagman--2018|Wagman and Jackson, 2018]] ) and less-than-robust emergent constraints on future changes ( [[#Bracegirdle--2013|Bracegirdle and Stephenson, 2013]] ). To avoid this, emergent constraints need to be tested ‘out of sample’ on parts of the dataset that were not included in its construction ( [[#Caldwell--2018|Caldwell et al., 2018]] ) and should also always be based on sound physical understanding and mathematical theory ( [[#Hall--2019|Hall et al., 2019]] ). Their conclusions should also be reassessed when a new generation of MMEs becomes available, such as CMIP6. As an example, [[IPCC:Wg1:Chapter:Chapter-7|Chapter 7]] (Section 7.5.4) discusses and assesses recent studies where equilibrium climate sensitivities (ECS) diagnosed in a multi-model ensemble are compared with the same models’ estimates of an observable quantity, such as post-1970s global warming or tropical sea surface temperatures of past climates like the Last Glacial Maximum or the Pliocene. Assessments of other emergent constraints appear throughout later chapters, such as [[IPCC:Wg1:Chapter:Chapter-4|Chapter 4]] ( [[IPCC:Wg1:Chapter:Chapter-4#4.2.5|Section 4.2.5]] ), [[IPCC:Wg1:Chapter:Chapter-5|Chapter 5]] (Section 5.4.6) and [[IPCC:Wg1:Chapter:Chapter-7|Chapter 7]] (Section 7.5.4).
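
The principle can be sketched numerically as follows. This is a deliberately simplified illustration on synthetic data, not the method of any study cited above: a straight line is fitted across the model ensemble, and an observation of the x-axis quantity, with its uncertainty, is propagated through that line to give a constrained distribution for the y-axis quantity. The function name, the Gaussian error assumptions and all numerical values are hypothetical, and the uncertainty in the fitted slope and intercept is ignored for brevity.

```python
import numpy as np


def emergent_constraint(x_models, y_models, x_obs, x_obs_sd, n_draws=20000, seed=0):
    """Propagate an observation of x through the across-model regression of y on x."""
    x = np.asarray(x_models, dtype=float)
    y = np.asarray(y_models, dtype=float)

    # Across-ensemble linear relationship (np.polyfit returns highest power first).
    slope, intercept = np.polyfit(x, y, 1)
    residual_sd = (y - (slope * x + intercept)).std(ddof=2)
    correlation = np.corrcoef(x, y)[0, 1]

    # Monte Carlo draws: observational uncertainty plus scatter around the fitted line.
    rng = np.random.default_rng(seed)
    x_draws = rng.normal(x_obs, x_obs_sd, size=n_draws)
    y_draws = slope * x_draws + intercept + rng.normal(0.0, residual_sd, size=n_draws)

    return {
        "slope": slope,
        "intercept": intercept,
        "correlation": correlation,
        "constrained_mean": y_draws.mean(),
        "constrained_sd": y_draws.std(ddof=1),
        "unconstrained_mean": y.mean(),
        "unconstrained_sd": y.std(ddof=1),
    }


# Synthetic 30-member ensemble with a built-in linear relationship (hypothetical values).
rng = np.random.default_rng(1)
x_models = rng.uniform(0.0, 2.0, size=30)
y_models = 1.5 + 1.2 * x_models + rng.normal(0.0, 0.3, size=30)

result = emergent_constraint(x_models, y_models, x_obs=1.0, x_obs_sd=0.1)
print(f"unconstrained sd: {result['unconstrained_sd']:.2f}")
print(f"constrained sd:   {result['constrained_sd']:.2f}")
```

A high correlation in such a fit is not sufficient on its own. As the paragraph above stresses, the relationship should rest on physical understanding and be tested out of sample, for example by refitting on one model generation and checking it against another.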

<div id="1.5.4.8" class="h3-container"></div>

<span id="weighting-techniques-for-model-comparisons"></span>
==== 1.5.4.8 Weighting Techniques for Model Comparisons ====

<div id="h3-40-siblings" class="h3-siblings"></div>

Assessments of climate model ensembles have commonly assumed that each individual model is of equal value (‘model democracy’), and simulations are typically left unweighted when they are combined to estimate the mean and variance of quantities of interest ( [[#Haughton--2015|Haughton et al., 2015]] ). This practice has been noted to diminish the influence of models exhibiting a good match with observations ( [[#Tapiador--2020|Tapiador et al., 2020]] ). However, exceptions to this approach exist, notably AR5 projections of sea ice, which selected only a few models that passed a model performance assessment ( [[#Collins--2013|Collins et al., 2013]] ), and more studies on this topic have appeared since AR5 (e.g., [[#Eyring--2019|Eyring et al., 2019]] ). Ensembles are typically sub-selected by removing either poorly performing model simulations ( [[#McSweeney--2015|McSweeney et al., 2015]] ) or model simulations that are perceived to add little additional information, typically where multiple simulations have come from the same model. They may also be weighted based on model performance.

Several recent studies have attempted to quantify the effect of various strategies for selection or weighting of ensemble members based on some set of criteria ( [[#Haughton--2015|Haughton et al., 2015]] ; [[#Olonscheck--2017|Olonscheck and Notz, 2017]] ; [[#Sanderson--2017|Sanderson et al., 2017]] ). Model weighting strategies have been further employed since AR5 to reduce the spread in climate projections for a given scenario by using weights based on one or more model performance metrics ( [[#Wenzel--2016|Wenzel et al., 2016]] ; [[#Knutti--2017|Knutti et al., 2017]] ; [[#Sanderson--2017|Sanderson et al., 2017]] ; [[#Lorenz--2018|Lorenz et al., 2018]] ; [[#Liang--2020|Liang et al., 2020]] ). However, models may share representations of processes, parameterization schemes, or even parts of code, leading to common biases. The models may therefore not be fully independent, calling into question inferences derived from multi-model ensembles ( [[#Abramowitz--2019|Abramowitz et al., 2019]] ). Emergent constraints ( [[#1.5.4.7|Section 1.5.4.7]] ) also represent an implicit weighting technique that explicitly links present performance to future projections ( [[#Bracegirdle--2013|Bracegirdle and Stephenson, 2013]] ).

Concern has been raised about the large extent to which code is shared within the CMIP5 multi-model ensemble ( [[#Sanderson--2015a|Sanderson et al., 2015a]] ). [[#Boé--2018|Boé (2018)]] showed that a clear relationship exists between the number of components shared by climate models and how similar the simulations are. The resulting similarities in behaviour need to be accounted for in the generation of best-estimate multi-model climate projections. This has led to calls to move beyond equally-weighted multi-model means towards weighted means that take into account both model performance and model independence ( [[#Sanderson--2015b|Sanderson et al., 2015b]] , 2017; [[#Knutti--2017|Knutti et al., 2017]] ). Model independence has been defined in terms of performance differences within an ensemble ( [[#Masson--2011|Masson and Knutti, 2011]] ; [[#Knutti--2013|Knutti et al., 2013]] , 2017; [[#Sanderson--2015a|Sanderson et al., 2015a]] , b, 2017; [[#Lorenz--2018|Lorenz et al., 2018]] ). However, this definition is sensitive to the choice of variable, observational dataset, metric, time period, and region, and a performance-ranked ensemble has been shown to sometimes perform worse than a random selection ( [[#Herger--2018a|Herger et al., 2018a]] ). The adequacy of the constraint provided by the data and experimental methods can be tested using a ‘calibration-validation’ style partitioning of observations into two sets ( [[#Bishop--2013|Bishop and Abramowitz, 2013]] ), or a ‘perfect model approach’ where one of the ensemble members is treated as the reference dataset and all model weights are calibrated against it ( [[#Bishop--2013|Bishop and Abramowitz, 2013]] ; [[#Wenzel--2016|Wenzel et al., 2016]] ; [[#Knutti--2017|Knutti et al., 2017]] ; [[#Sanderson--2017|Sanderson et al., 2017]] ; [[#Herger--2018a|Herger et al., 2018a]] , b). [[#Sunyer--2014|Sunyer et al. (2014)]] use a Bayesian framework to account for model dependencies and changes in model biases. [[#Annan--2017|Annan and Hargreaves (2017)]] provide a statistical, quantifiable definition of independence that is independent of performance-based measures.
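
How performance and independence might be combined into a single set of weights can be sketched as follows. The functional form used here (a Gaussian performance term divided by one plus the summed Gaussian similarity to all other models) is an assumption made for illustration, in the spirit of the performance-and-independence schemes cited above; the function name, the distance measures, the two shape parameters `sigma_d` and `sigma_s`, and all numbers are hypothetical.

```python
import numpy as np

def performance_independence_weights(d_obs, s_pair, sigma_d, sigma_s):
    """Return normalized weights for an ensemble of n models.

    d_obs   : length-n distances of each model from observations
    s_pair  : n-by-n symmetric matrix of model-to-model distances
    sigma_d : scale controlling how strongly performance is rewarded
    sigma_s : scale below which two models count as near-duplicates
    """
    d_obs = np.asarray(d_obs, dtype=float)
    s_pair = np.asarray(s_pair, dtype=float)

    # Performance term: close to 1 for models near observations.
    performance = np.exp(-(d_obs / sigma_d) ** 2)

    # Similarity of each model to every other model (self excluded).
    similarity = np.exp(-(s_pair / sigma_s) ** 2)
    np.fill_diagonal(similarity, 0.0)

    # Independence term: a model with k exact duplicates is scaled by 1/(1+k).
    independence = 1.0 / (1.0 + similarity.sum(axis=1))

    raw = performance * independence
    return raw / raw.sum()

# Synthetic example: models A and B are duplicates, C is distinct,
# and all three match observations equally well.
d_obs = [1.0, 1.0, 1.0]
s_pair = [[0.0, 0.0, 10.0],
          [0.0, 0.0, 10.0],
          [10.0, 10.0, 0.0]]
weights = performance_independence_weights(d_obs, s_pair, sigma_d=1.0, sigma_s=1.0)
print(weights)
```

In this synthetic case the two duplicate models share the weight that a single independent model would receive. As noted above, results of any such scheme are sensitive to the choice of variable, observational dataset, metric, time period and region that define the distances.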

The AR5 quantified uncertainty in CMIP5 climate projections by selecting one realization per model per scenario and calculating the 5–95% range of the resulting ensemble (Box 4.1), and the same strategy is generally still used in AR6. Broadly, the following chapters take the CMIP6 5–95% ensemble range as the ''likely'' uncertainty range for projections, <sup>[[#footnote-000|8]]</sup> with no further weighting or consideration of model ancestry, as long as no universal, robust method for weighting a multi-model projection ensemble is available (Box 4.1). A notable exception to this approach is the assessment of future changes in global surface air temperature (GSAT), which also draws on the updated best estimate and range of equilibrium climate sensitivity assessed in Chapter 7. For a thorough description of the model-weighting choices made in this Report, and the assessment of GSAT, see [[IPCC:Wg1:Chapter:Chapter-4|Chapter 4]] (Box 4.1). Model selection and weighting in downscaling approaches for regional assessment are discussed in [https://www.ipcc.ch/report/ar6/wg1/chapter/chapter-10 Chapter 10] (Section 10.3.4).
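
The ‘one realization per model’ strategy can be illustrated with a short sketch. It assumes that the 5–95% range is taken from empirical percentiles of the unweighted ensemble; whether empirical percentiles or a fitted distribution are used in a given assessment is not specified here, and the function name, model labels and values are hypothetical.

```python
import numpy as np

def ensemble_5_95_range(realizations):
    """Compute an unweighted 5-95% range using one realization per model.

    realizations : dict mapping a model name to a list of projected values,
                   one value per realization of that model.
    Only the first realization of each model is used, so models with many
    realizations do not dominate the ensemble.
    """
    first = np.array([values[0] for values in realizations.values()], dtype=float)
    low, high = np.percentile(first, [5, 95])
    return low, high

# Synthetic ensemble of 21 hypothetical models, each with two realizations.
# The second realization is deliberately extreme to show that it is ignored.
realizations = {f"model_{i}": [5.0 * i, 5.0 * i + 999.0] for i in range(21)}

low, high = ensemble_5_95_range(realizations)
print(low, high)
```

Because each model contributes exactly one value, no account is taken of model performance or ancestry, consistent with the unweighted approach described above.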

<div id="1.6" class="h1-container"></div>

<span id="dimensions-of-integration-scenarios-global-warming-levels-and-cumulative-carbon-emissions"></span>