The CMIP3 archive, coordinated by the World Climate Research Programme's (WCRP's) Working Group on Coupled Modelling, is used extensively in climate model studies. Very recently Marcia Wyatt, as part of her continuing studies building on her paper
Wyatt, Marcia Glaze, Sergey Kravtsov, and Anastasios A. Tsonis, 2011: Atlantic Multidecadal Oscillation and Northern Hemisphere's climate variability. Climate Dynamics, DOI: 10.1007/s00382-011-1071-8. (see also her guest weblog on the paper here)
found problems with this data. I asked her to summarize her experience for our weblog. Her summary is given below.
CMIP3 experience by Marcia Wyatt
CMIP3 (phase 3 of the Coupled Model Intercomparison Project) provides a free and open archive of recent model output (netcdf files) for use by researchers. While convenient, it is not immune from data-crunching-induced complacency. What follows is a cautionary tale.
My current research project involves processing CMIP model datasets, converting “raw” variables into climate indices, and then applying statistical analysis to these reconstructed indices. The process has not been straightforward. Each new set of model data has brought new problems. For my particular project, slight differences in the format of each CMIP model dataset have required modifications to my computer code.
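For readers unfamiliar with this kind of processing: converting a raw gridded variable into a scalar climate index typically means area-weighting the grid cells and averaging over a region. The sketch below is purely illustrative (it is not Wyatt's actual code, and the grid and region bounds are made up); it uses the standard cosine-of-latitude weighting for a regular latitude-longitude grid.

```python
import numpy as np

def region_mean(field, lats, lons, lat_bounds, lon_bounds):
    """Area-weighted mean of a 2-D field (lat x lon) over a lat/lon box.

    Cells are weighted by cos(latitude), the usual approximation
    for a regular latitude-longitude grid.
    """
    lat_mask = (lats >= lat_bounds[0]) & (lats <= lat_bounds[1])
    lon_mask = (lons >= lon_bounds[0]) & (lons <= lon_bounds[1])
    sub = field[np.ix_(lat_mask, lon_mask)]
    # weights vary with latitude only; broadcast across longitudes
    w = np.cos(np.deg2rad(lats[lat_mask]))[:, None] * np.ones(lon_mask.sum())
    return float((sub * w).sum() / w.sum())

# Hypothetical 2-degree grid; a uniform field's weighted mean equals its value.
lats = np.arange(-89, 90, 2.0)
lons = np.arange(0, 360, 2.0)
field = np.full((lats.size, lons.size), 15.0)
print(region_mean(field, lats, lons, (0, 60), (280, 360)))  # ~15.0
```

The point of the example is only that each model's grid, variable names, and file layout feed into code like this, which is why small format differences between datasets force code changes.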
Initial phases of a venture present the steepest learning curves; working with the CMIP data has been no exception. But with each successful processing and analysis of a dataset, my confidence grew in the various codes scripted for each dataset's specific characteristics. This trend was not to last.
Early last week (June 21), as the last known glitch was being addressed, allowing progress on several not-yet-completed datasets, an oddity came to light. As I was preparing the processed data – the reconstructed indices – for statistical analysis, I realized the values were odd. There were lots of repeated numbers, but with no discernible pattern. At first I suspected the codes; after all, I had encountered problems with each new dataset before. But the codes' inconsistent performance on similarly formatted data implied the problem lay elsewhere.
Before concluding the problem lay elsewhere, I looked further upstream – at the “raw” data “read” from the netcdf files. Clusters of zeroes filled certain regions of the huge matrices, but not all. Still I was not convinced beyond a doubt that this reflected a problem. I adopted a different strategy – to redo the four model datasets already successfully completed. This took me back to square one. I selected data from the CMIP database, downloaded the needed data files, requested their transfer to my email address, and awaited their arrival. If I could repeat the analysis on these data successfully, I would deduce the problem was with me, not with the data.
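The symptoms described here – heavily repeated values and blocks of zeros in the read-in matrices – are the kind of thing a crude automated red-flag check can catch early. The following is a hypothetical numpy sketch (not part of the original workflow) that flags a suspicious fraction of zeros or an over-represented single value, as might result from fill-value or file-corruption problems.

```python
import numpy as np

def suspicious_fractions(arr):
    """Return (fraction of exact zeros, fraction taken by the single
    most common value) -- a crude red flag for fill-value or
    corruption problems in data read from a file."""
    flat = np.asarray(arr).ravel()
    zero_frac = float(np.mean(flat == 0.0))
    _, counts = np.unique(flat, return_counts=True)
    mode_frac = float(counts.max() / flat.size)
    return zero_frac, mode_frac

rng = np.random.default_rng(0)
good = rng.normal(size=(100, 100))   # healthy-looking continuous field
bad = good.copy()
bad[:40, :50] = 0.0                  # a zero-filled block, like a corrupted region
print(suspicious_fractions(good)[0])  # 0.0
print(suspicious_fractions(bad))      # (0.2, 0.2)
```

A check like this run immediately after reading each file would have pointed at the data rather than the analysis code.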
The emailed response from CMIP arrived. Instead of data files, I received messages that the files were unavailable. This was nothing new: I had seen the same message for some requested data files before mid-June. At that time, I simply redirected my request to a different model run, not suspecting an evolving problem. But this time was different. I had downloaded and processed these data successfully in the past. Now I was told they were unavailable. This made no sense.
I contacted the CMIP organization. They must have just discovered the problem themselves. Within a day, the site was frozen. Users of the site were notified that all data downloaded since June 15th were unreliable. (I had downloaded the problem-ridden data on the 16th.) The message on the CMIP site has since been updated to include a projected resolution date of mid-July. The lesson here: confidence should be embraced with caution.