Processing 90 TB Historical Weather Data
Integrating ECMWF IFS historical weather model data from 2017 onwards
tl;dr: The Open-Meteo Historical Weather API now integrates the ECMWF IFS model at full 9-km resolution with data from 2017 onwards!
A little over a year ago, in August 2022, Open-Meteo introduced the historical weather API with a 25-km resolution, based on the ERA5 dataset. Accomplishing this presented challenges, including designing an effective compression and storage system to condense 47 terabytes of data into just 1.2 terabytes.
Open-Meteo has continued to improve its historical weather data. Six months later, the integration of the 11-km ERA5-Land and the 5-km CERRA datasets nearly tripled the data size. This extension improved the resolution for land surface weather variables, although certain weather variables, like precipitation, wind speed, and solar irradiance, remained limited to 25-km resolution.
To further enhance the accuracy and resolution of past weather data, Open-Meteo now integrates the ECMWF IFS weather model at a full 9-km resolution. This may not seem like a substantial leap from the previous 11/25-km resolution, but it required retrieving nearly 20 terabytes of data from tape, compressing it, and integrating it seamlessly into an accessible API.
ERA5 and ECMWF IFS
Open-Meteo has been using reanalysis weather data from ERA5 and ERA5-Land. Both reanalyses employ weather station, aircraft, buoy, radar, and satellite observations, combined with numerical weather models, to create global datasets of past weather conditions. The ERA5 reanalysis has been closely integrated with the ECMWF IFS weather model for consistency over the years.
Creating such records is a big challenge because it involves gathering detailed past weather information from 1940 onwards and using numerical weather models. ERA5, in particular, has done a great job at this.
ERA5 and the ECMWF IFS weather model work closely together. In fact, ERA5 has been produced with the IFS model version from 2016, and that same version will continue to be used for the years ahead. Using the same "old" weather model is important because it keeps ERA5's results consistent over weeks, years, and decades. A newer model would introduce subtle changes to the data that could affect studies of climate change.
However, the IFS weather model has improved a lot over the past seven years and can now make more accurate weather predictions. You might have noticed this if you've compared ERA5 data to recent observations and seen differences in rainfall or wind speed.
More Accurate Past Weather with ECMWF IFS
To obtain the most precise historical weather information, it's essential to use the latest and most advanced weather models. This is precisely what Open-Meteo is now accomplishing through the historical weather API.
To create a comprehensive dataset, Open-Meteo has gathered and combined all the IFS weather forecasts from 2017 onwards into a continuous record that covers locations all around the world.
Weather models rely on data from diverse sources, such as weather stations, aircraft, buoys, radar, and satellites, to establish the initial state of the Earth's atmosphere. This process, known as data assimilation, is performed at regular intervals, typically at 0:00, 6:00, 12:00, and 18:00 UTC. It's important to note that while weather models cannot predict the future with absolute precision, they often provide quite accurate short-term forecasts for the current day or the next few hours, depending on the prevailing weather conditions.
By combining simulations from these regularly updated initial conditions, a long time series is generated, closely aligned with real observations. This approach ensures access to weather data for every location on Earth, even in the absence of a nearby weather station.
Importantly, this method is not limited to converting the new 9-km ECMWF IFS data into a weather time-series; it has also been applied to datasets like ERA5, ERA5-Land, and other weather reanalyses.
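As a rough illustration of the stitching idea described above, the sketch below joins successive forecast runs into one hourly series. The post does not say which lead times Open-Meteo keeps from each run, so the six-hour cutoff is an assumption, and `stitch_runs` and the synthetic values are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def stitch_runs(runs, hours_per_run=6):
    """Build one continuous hourly series from successive forecast runs.

    `runs` maps each run's initialisation time to a list of hourly values,
    where index 0 is the value valid at the initialisation time itself.
    Only the first `hours_per_run` hours of each run are kept (an assumed
    cutoff). Runs are processed in chronological order, so if two runs
    cover the same hour, the later initialisation wins.
    """
    series = {}
    for init_time in sorted(runs):
        for lead_hour, value in enumerate(runs[init_time][:hours_per_run]):
            series[init_time + timedelta(hours=lead_hour)] = value
    return sorted(series.items())


# Synthetic example: four runs on one day, each forecasting 10 hours ahead.
# The value encodes (init hour * 100 + lead hour) so its origin is visible.
day = datetime(2017, 1, 1, tzinfo=timezone.utc)
runs = {
    day + timedelta(hours=h): [float(h * 100 + lead) for lead in range(10)]
    for h in (0, 6, 12, 18)
}
stitched = stitch_runs(runs)
```

Each hour of the day ends up with exactly one value, taken from the run that started at or just before it.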
Available with 2 Days Delay
The historical weather API now includes the 9-km IFS model, in addition to the ERA5 reanalysis. This update provides hourly data for 30 weather variables starting from 2017, all maintained at the original 9-km resolution.
In terms of data availability, ERA5 has a 5-day delay, while the IFS model has reduced this to only a 2-day delay. This is an area of ongoing improvement, and future updates may further reduce these delays. For accessing data from the past 1 or 2 days, the Forecast API with the past_days parameter can be used.
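The two access paths above can be sketched as request URLs. The hostnames, the parameter names (`start_date`, `end_date`, `hourly`, `models`, `past_days`) and the model identifier `ecmwf_ifs` are assumptions drawn from the public API documentation rather than from this post, so check them against the current docs before relying on them. The helper names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoints; verify against the Open-Meteo documentation.
ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"
FORECAST_URL = "https://api.open-meteo.com/v1/forecast"


def historical_url(latitude, longitude, start_date, end_date, model="ecmwf_ifs"):
    """URL for hourly 2 m temperature from the historical weather API."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "start_date": start_date,  # ISO dates, e.g. "2017-01-01"
        "end_date": end_date,
        "hourly": "temperature_2m",
        "models": model,  # assumed identifier for the 9-km IFS data
    }
    return f"{ARCHIVE_URL}?{urlencode(params)}"


def recent_url(latitude, longitude, past_days=2):
    """URL for the forecast API, including the last `past_days` days."""
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "hourly": "temperature_2m",
        "past_days": past_days,
    }
    return f"{FORECAST_URL}?{urlencode(params)}"


berlin_history = historical_url(52.52, 13.41, "2017-01-01", "2017-01-07")
berlin_recent = recent_url(52.52, 13.41)
```

The functions only build the URLs; fetching them with any HTTP client should return JSON by default.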
When comparing data from ERA5, ERA5-Land, and IFS, only minor differences can be observed. Notably, the resolution difference between ERA5 (25-km) and the roughly 10-km ERA5-Land and IFS datasets becomes apparent. As an example, a temperature comparison for Berlin, Germany is presented, showing some disparities but, overall, a similarity between ERA5 and IFS data.
These similarities are unsurprising, as both ERA5 and IFS employ very similar data assimilation processes. The IFS data from 2017 onwards serves as an enhancement of existing historical weather data, aligning with Open-Meteo's continuous mission to provide the best available data to the public.
All data is shared openly under the CC-BY-4.0 license, and Open-Meteo is committed to supporting researchers in their use of historical weather data. The growing number of publications utilizing Open-Meteo's data is a testament to the evolving ecosystem that facilitates the open and easy sharing of meteorological data.
Stats for Nerds
The historical weather API continues to expand its database. Processing this amount of data and efficiently delivering it through an API is a great engineering challenge with countless riddles to solve. For some inexplicable reason, an increasing amount of data keeps finding its way onto my hard drives ;-)
At 18.9 terabytes, the IFS data from 2017 onwards is not the largest dataset in the archive, but it is still substantial: it averages 2.7 terabytes per year over just seven years. It brings the total to 90 terabytes, close to the 100-terabyte milestone, and there's no doubt that another dataset is on the horizon.
| Dataset | Per Year | Years | Total | OM Compressed |
|-----------|----------|-------|---------|---------------|
| ERA5 | 360 GB | 83 | 30 TB | 5.1 TB |
| ERA5-Land | 420 GB | 73 | 30.6 TB | 3.0 TB |
| CERRA | 286 GB | 37 | 10.5 TB | 2.2 TB |
| IFS | 2,700 GB | 7 | 18.9 TB | 3.2 TB |
Note: Data sizes refer to compressed GRIB files, which typically achieve a compression ratio of approximately 2.7x. The term "OM Compressed" refers to the final database, after conversion of the data into a time-series format and subsequent compression.
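The totals in the table can be checked with a few lines of arithmetic. The sketch below recomputes each dataset's size from the per-year figure and derives the ratio between GRIB size and "OM Compressed" size. It assumes decimal units (1 TB = 1000 GB); the `summarize` helper is hypothetical.

```python
# name: (GB per year, years, OM compressed size in TB), copied from the table
datasets = {
    "ERA5": (360, 83, 5.1),
    "ERA5-Land": (420, 73, 3.0),
    "CERRA": (286, 37, 2.2),
    "IFS": (2700, 7, 3.2),
}


def summarize(datasets):
    """Return {name: (total size in TB, GRIB-to-OM size ratio)}."""
    rows = {}
    for name, (gb_per_year, years, om_tb) in datasets.items():
        total_tb = gb_per_year * years / 1000  # assumes 1 TB = 1000 GB
        rows[name] = (total_tb, total_tb / om_tb)
    return rows


rows = summarize(datasets)
grand_total_tb = sum(total for total, _ in rows.values())
om_total_tb = sum(om_tb for _, _, om_tb in datasets.values())

for name, (total_tb, ratio) in rows.items():
    print(f"{name:10s} {total_tb:6.1f} TB GRIB, {ratio:4.1f}x smaller as OM")
print(f"Total      {grand_total_tb:6.1f} TB GRIB, {om_total_tb:.1f} TB as OM")
```

The per-dataset totals add up to roughly 90 TB of GRIB data against 13.5 TB in the final database.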
Downloading and handling this volume of data is no small feat and takes several months to complete. Although each dataset is readily accessible as open data for research purposes, the data is physically stored on tapes at national weather service facilities, so retrieving and downloading it is a slow process. The original GRIB files contain data for the entire world at a single time step. Even if a research area covers only a single location, users have to download the entire dataset and process it afterwards.
Open-Meteo, on the other hand, transforms meteorological GRIB data into time-series data tailored to individual locations, enabling access to specific, smaller portions of the data. This approach not only saves time but also conserves computational resources since only the necessary data is processed.
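The difference between the two layouts can be shown with a toy transposition. This is only a conceptual sketch using plain Python lists, not Open-Meteo's actual file format; `to_time_series` and the four-cell grid are hypothetical.

```python
def to_time_series(grids):
    """Transpose per-time-step grids into per-location time series.

    `grids` holds one entry per time step; each entry is a flat list with
    one value per grid cell (the GRIB-like layout). The result holds one
    list per grid cell with that cell's values in time order.
    """
    return [list(cell_values) for cell_values in zip(*grids)]


# Three time steps of a tiny four-cell "globe".
grids = [
    [1.0, 2.0, 3.0, 4.0],  # t = 0
    [1.5, 2.5, 3.5, 4.5],  # t = 1
    [2.0, 3.0, 4.0, 5.0],  # t = 2
]
series = to_time_series(grids)

# All time steps for one location now sit together in a single list,
# instead of being spread across every per-time-step grid.
cell_2 = series[2]
```

In the GRIB-like layout, reading one location means touching every time step's grid; after the transposition, a single lookup returns the whole record for that location.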
To accomplish this, Open-Meteo employs its own file format and specialized compression techniques. All the source code is available on GitHub and is developed using Swift and C. This allows other users to download and store weather reanalysis data like ERA5, as well as real-time forecasts from various national weather services. This system is complemented by an HTTP REST server that offers easy access to the data in JSON, CSV, or XLSX formats. Running your own Open-Meteo API instance and maintaining the multitude of weather model data can be a challenging endeavour. However, if you are interested in archiving your weather data, it serves as an excellent tool for the task.
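The post does not describe the compression scheme itself, so the following is a generic illustration of why time-ordered data tends to compress well, not Open-Meteo's format. It assumes a common approach: scale values to integers, store differences between neighbouring hours, then apply a general-purpose compressor (zlib here). The scale factor of 20 and the function names are hypothetical.

```python
import math
import struct
import zlib


def encode(values, scale=20):
    """Quantize to integers, delta-encode, and compress with zlib."""
    if not values:
        return zlib.compress(b"")
    ints = [round(v * scale) for v in values]
    # Keep the first value as-is, then store hour-to-hour differences.
    deltas = [ints[0]] + [b - a for a, b in zip(ints, ints[1:])]
    return zlib.compress(struct.pack(f"<{len(deltas)}i", *deltas))


def decode(blob, scale=20):
    """Invert encode(): decompress, undo the deltas, and rescale."""
    raw = zlib.decompress(blob)
    deltas = struct.unpack(f"<{len(raw) // 4}i", raw)
    values, running = [], 0
    for delta in deltas:
        running += delta
        values.append(running / scale)
    return values


# Ten days of a synthetic hourly temperature curve with a daily cycle.
temps = [10 + 8 * math.sin(2 * math.pi * h / 24) for h in range(240)]
blob = encode(temps)
restored = decode(blob)
raw_size = 8 * len(temps)  # size as uncompressed 64-bit floats
```

Quantizing loses precision (at most half a step, 0.025 with this scale), which is the trade-off for small, repetitive deltas that compress well.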
What’s Next?
Looking further into the future, there's an exciting development on the horizon – the creation of ERA6, the successor to ERA5. Although there's limited information available at the moment, the Copernicus project is actively engaged in developing this next-generation reanalysis. It's worth noting that it may take several years before ERA6 becomes available ;-)
The upcoming blog post will delve into the new FlatBuffers format and APIs, allowing for the convenient transformation of data into a Python pandas DataFrame or the efficient integration of weather data into various devices, such as smartphones and IoT devices, with minimal power consumption. Be sure to subscribe so you don't miss it!
Keep up the good work
Did you try storing the data in specialized time series databases such as VictoriaMetrics or ClickHouse? These databases may provide a higher compression ratio for the weather data stored on disk compared to custom schemes. They also provide query languages optimized for typical queries over time series data, such as MetricsQL - https://docs.victoriametrics.com/metricsql/
See, for example, a benchmark for ingesting 500 billion samples into VictoriaMetrics - https://valyala.medium.com/billy-how-victoriametrics-deals-with-more-than-500-billion-rows-e82ff8f725da