In my post
The CMIP5 – Coupled Model Intercomparison Project Phase 5 is an integral part of the upcoming IPCC assessment. Two of its goals are to
- evaluate how realistic the models are in simulating the recent past,
- provide projections of future climate change on two time scales, near term (out to about 2035) and long term (out to 2100 and beyond)
In my post, CMIP5 Climate Model Runs – A Scientifically Flawed Approach, I presented a number of peer-reviewed model comparisons with real world observations that documents the failure of the multi-decadal global climate models to provide skillful regional climate predictions to the impacts communities.
I concluded my post with the text
These studies, and I am certain more will follow, show that the multi-decadal climate models are not even skillfully simulating current climate statistics, as are needed by the impacts communities, much less CHANGES in climate statistics. At some point, this waste of money to make regional climate predictions decades from now is going to be widely recognized.
Jos de Laat of KNMI has provided us with further examples that document the serious limitation of the CMIP5 model results. I have presented this list below [with highlighting]. I am pleased that the model hindcast predictions are being reported, as this is clearly information that the impact and policy communities need.
L. Goddard, A. Kumar, A. Solomon, D. Smith, G. Boer, P. Gonzalez, V. Kharin, W. Merryfield, C. Deser, S. J. Mason, B. P. Kirtman, R. Msadek, R. Sutton, E. Hawkins, T. Fricker, G. Hegerl, C. A. T. Ferro, D. B. Stephenson, G. A. Meehl, T. Stockdale, R. Burgman, A. M. Greene, Y. Kushnir, M. Newman, J. Carton, I. Fukumori, T. Delworth. (2012) A verification framework for interannual-to-decadal predictions experiments. Climate Dynamics Online publication date: 24-Aug-2012.
Decadal predictions have a high profile in the climate science community and beyond, yet very little is known about their skill. Nor is there any agreed protocol for estimating their skill. This paper proposes a sound and coordinated framework for verification of decadal hindcast experiments. The framework is illustrated for decadal hindcasts tailored to meet the requirements and specifications of CMIP5 (Coupled Model Intercomparison Project phase 5). The chosen metrics address key questions about the information content in initialized decadal hindcasts. These questions are: (1) Do the initial conditions in the hindcasts lead to more accurate predictions of the climate, compared to un-initialized climate change projections? and (2) Is the prediction model’s ensemble spread an appropriate representation of forecast uncertainty on average? The first question is addressed through deterministic metrics that compare the initialized and uninitialized hindcasts. The second question is addressed through a probabilistic metric applied to the initialized hindcasts and comparing different ways to ascribe forecast uncertainty. Verification is advocated at smoothed regional scales that can illuminate broad areas of predictability, as well as at the grid scale, since many users of the decadal prediction experiments who feed the climate data into applications or decision models will use the data at grid scale, or downscale it to even higher resolution. An overall statement on skill of CMIP5 decadal hindcasts is not the aim of this paper. The results presented are only illustrative of the framework, which would enable such studies. However, broad conclusions that are beginning to emerge from the CMIP5 results include (1) Most predictability at the interannual-to-decadal scale, relative to climatological averages, comes from external forcing, particularly for temperature; (2) though moderate, additional skill is added by the initial conditions over what is imparted by external forcing alone; however, the impact of initialization may result in overall worse predictions in some regions than provided by uninitialized climate change projections; (3) limited hindcast records and the dearth of climate-quality observational data impede our ability to quantify expected skill as well as model biases; and (4) as is common to seasonal-to-interannual model predictions, the spread of the ensemble members is not necessarily a good representation of forecast uncertainty. The authors recommend that this framework be adopted to serve as a starting point to compare prediction quality across prediction systems. The framework can provide a baseline against which future improvements can be quantified. The framework also provides guidance on the use of these model predictions, which differ in fundamental ways from the climate change projections that much of the community has become familiar with, including adjustment of mean and conditional biases, and consideration of how to best approach forecast uncertainty.
Driscoll, S., A. Bozzo, L. J. Gray, A. Robock, and G. Stenchikov (2012), Coupled Model Intercomparison Project 5 (CMIP5) simulations of climate following volcanic eruptions, J. Geophys. Res., 117, D17105, doi:10.1029/2012JD017607. published 6 September 2012.
The ability of the climate models submitted to the Coupled Model Intercomparison Project 5 (CMIP5) database to simulate the Northern Hemisphere winter climate following a large tropical volcanic eruption is assessed. When sulfate aerosols are produced by volcanic injections into the tropical stratosphere and spread by the stratospheric circulation, it not only causes globally averaged tropospheric cooling but also a localized heating in the lower stratosphere, which can cause major dynamical feedbacks. Observations show a lower stratospheric and surface response during the following one or two Northern Hemisphere (NH) winters, that resembles the positive phase of the North Atlantic Oscillation (NAO). Simulations from 13 CMIP5 models that represent tropical eruptions in the 19th and 20th century are examined, focusing on the large-scale regional impacts associated with the large-scale circulation during the NH winter season. The models generally fail to capture the NH dynamical response following eruptions. They do not sufficiently simulate the observed post-volcanic strengthened NH polar vortex, positive NAO, or NH Eurasian warming pattern, and they tend to overestimate the cooling in the tropical troposphere. The findings are confirmed by a superposed epoch analysis of the NAO index for each model. The study confirms previous similar evaluations and raises concern for the ability of current climate models to simulate the response of a major mode of global circulation variability to external forcings. This is also of concern for the accuracy of geoengineering modeling studies that assess the atmospheric response to stratosphere-injected particles.
Mauritsen, T., et al. (2012), Tuning the climate of a global model, J. Adv. Model. Earth Syst., 4, M00A01, doi:10.1029/2012MS000154. published 7 August 2012.
During a development stage global climate models have their properties adjusted or tuned in various ways to best match the known state of the Earth’s climate system. These desired properties are observables, such as the radiation balance at the top of the atmosphere, the global mean temperature, sea ice, clouds and wind fields. The tuning is typically performed by adjusting uncertain, or even non-observable, parameters related to processes not explicitly represented at the model grid resolution. The practice of climate model tuning has seen an increasing level of attention because key model properties, such as climate sensitivity, have been shown to depend on frequently used tuning parameters. Here we provide insights into how climate model tuning is practically done in the case of closing the radiation balance and adjusting the global mean temperature for the Max Planck Institute Earth System Model (MPI-ESM). We demonstrate that considerable ambiguity exists in the choice of parameters, and present and compare three alternatively tuned, yet plausible configurations of the climate model. The impacts of parameter tuning on climate sensitivity was less than anticipated.
Jiang, J. H., et al. (2012), Evaluation of cloud and water vapor simulations in CMIP5 climate models using NASA “A-Train” satellite observations, J. Geophys. Res., 117, D14105, doi:10.1029/2011JD017237. published 18 July 2012.
Using NASA’s A-Train satellite measurements, we evaluate the accuracy of cloud water content (CWC) and water vapor mixing ratio (H2O) outputs from 19 climate models submitted to the Phase 5 of Coupled Model Intercomparison Project (CMIP5), and assess improvements relative to their counterparts for the earlier CMIP3. We find more than half of the models show improvements from CMIP3 to CMIP5 in simulating column-integrated cloud amount, while changes in water vapor simulation are insignificant. For the 19 CMIP5 models, the model spreads and their differences from the observations are larger in the upper troposphere (UT) than in the lower or middle troposphere (L/MT). The modeled mean CWCs over tropical oceans range from ∼3% to ∼15× of the observations in the UT and 40% to 2× of the observations in the L/MT. For modeled H2Os, the mean values over tropical oceans range from ∼1% to 2× of the observations in the UT and within 10% of the observations in the L/MT. The spatial distributions of clouds at 215 hPa are relatively well-correlated with observations, noticeably better than those for the L/MT clouds. Although both water vapor and clouds are better simulated in the L/MT than in the UT, there is no apparent correlation between the model biases in clouds and water vapor. Numerical scores are used to compare different model performances in regards to spatial mean, variance and distribution of CWC and H2O over tropical oceans. Model performances at each pressure level are ranked according to the average of all the relevant scores for that level.
From the conclusions: “Tropopause layer water vapor is poorly simulated with respect to observations. This likely results from temperature biases.”
Sakaguchi, K., X. Zeng, and M. A. Brunke (2012), The hindcast skill of the CMIP ensembles for the surface air temperature trend, J. Geophys. Res., 117, D16113, doi:10.1029/2012JD017765. published 28 August 2012
Linear trends of the surface air temperature (SAT) simulated by selected models from the Coupled Model Intercomparison Project (CMIP3 and CMIP5) historical experiments are evaluated using observations to document (1) the expected range and characteristics of the errors in hindcasting the ‘change’ in SAT at different spatiotemporal scales, (2) if there are ‘threshold’ spatiotemporal scales across which the models show substantially improved performance, and (3) how they differ between CMIP3 and CMIP5. Root Mean Square Error, linear correlation, and Brier score show better agreement with the observations as spatiotemporal scale increases but the skill for the regional (5° × 5° – 20° × 20° grid) and decadal (10 – ∼30-year trends) scales is rather limited. Rapid improvements are seen across 30° × 30° grid to zonal average and around 30 years, although they depend on the performance statistics. Rather abrupt change in the performance from 30° × 30° grid to zonal average implies that averaging out longitudinal features, such as land-ocean contrast, might significantly improve the reliability of the simulated SAT trend. The mean bias and ensemble spread relative to the observed variability, which are crucial to the reliability of the ensemble distribution, are not necessarily improved with increasing scales and may impact probabilistic predictions more at longer temporal scales. No significant differences are found in the performance of CMIP3 and CMIP5 at the large spatiotemporal scales, but at smaller scales the CMIP5 ensemble often shows better correlation and Brier score, indicating improvements in the CMIP5 on the temporal dynamics of SAT at regional and decadal scales.