Linux Powers Giant Database: 400TB of Climate Data and Counting
Linux and Climate Change
Climate change is perhaps the biggest issue facing humanity today, and one vital part of the puzzle is calculating exactly what the future possibilities are. The Millennium Simulations project, now almost a year old, aims to tackle that issue, generating data which will in due course be used by the Intergovernmental Panel on Climate Change, the international scientific body generally regarded as the authoritative voice on the effects and risks of climate change. The Millennium Simulations project creates models of the earth's climate, looking at the influence of human activity (including carbon generation and changes in land use), together with natural activity such as volcanoes, to project future climatic changes. It's the first model to use an interactive carbon cycle, and it also has various sub-models available, looking at land, ocean, and atmosphere.
As you might guess, this project generates a lot of data (over 50 TB to date) with repeated runs under various conditions. There are numerous other climate modelling projects out there as well, all generating large quantities of data which is useful to climate scientists internationally. Storage of and access to this data is handled by the World Data Center for Climate (WDCC), part of the World Data System (WDS). It's an international archive and data distribution center for climate research data (based at Model and Data, in Hamburg, with the co-operation of the German Climate Computing Centre). Its aim, as with all the WDS datacenters, is to give the international scientific community the best possible access to as much data as possible.
The WDCC's main focus is on climate modelling and data products, rather than on raw data. Several research projects, including the IPCC's Data Distribution Centre, use the WDCC to help distribute data between project partners, and to users globally, both during and after the projects. This makes it an incredibly important resource for scientists working on climate modelling and trying to predict potential future climate changes and their effects. And it's all based on UNIX/Linux.
Underlying the WDCC's data archive is the CERA-2 (Climate and Environmental Retrieving and Archiving) database system, one of the world's largest databases. As of late 2008, it holds around 400 TB of data and has nearly a thousand named users. Running on NEC and Solaris machines, it's a federated Oracle database connected to a set of StorageTek tape libraries hosted at DKRZ (the tape libraries themselves have a 60 PB capacity, but they also store data other than that belonging to the WDCC). The database itself is a distributed system across multiple NEC machines running Linux.