We are excited to announce the launch of the new Open-Meteo Historical Forecast API. Building on our existing weather forecast offerings, this API provides access to archived high-resolution weather forecast data, making it an invaluable resource for data scientists, researchers, and developers.
What is the Historical Forecast API?
The Historical Forecast API provides seamless access to a vast repository of high-resolution weather model archives. It includes the same weather models used for forecasts but continuously archives all available data, including essential variables like solar radiation, soil moisture, and atmospheric wind data, which are crucial for agricultural and energy applications.
While it works like the regular Forecast API, the Historical Forecast API runs on servers with large storage capacity to retain the data. Data availability varies by weather model. For example, American models such as GFS and HRRR have public archives, which allows us to provide data from 2021 onwards. Data collection is ongoing, but because of the immense volume, it will take several more months.
As a result, the Historical Forecast API provides very fast access to long weather time series for the most important forecast models. The graph below, which displays the last six months of data from all major weather forecast models, can be generated in an instant.
This simplicity makes it easy to train machine learning models on historical data and combine them with real-time weather forecasts to produce optimized forecasts for any use case that demands the highest accuracy. Previously, you would have needed to download and process hundreds of terabytes of GRIB files, which are cumbersome to handle.
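As a rough sketch of that workflow, the Python snippet below builds two request URLs with the same location, variables, and model: one for the Historical Forecast API (training data) and one for the regular Forecast API (live inference). The endpoint URLs and the `hourly`, `models`, `start_date`, and `end_date` parameters follow Open-Meteo's public documentation as we understand it, and the model identifier `gfs_seamless` is an assumption; check the API docs before relying on any of them. The helper `build_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoints; verify against the Open-Meteo documentation.
HISTORICAL_FORECAST_URL = "https://historical-forecast-api.open-meteo.com/v1/forecast"
FORECAST_URL = "https://api.open-meteo.com/v1/forecast"


def build_url(base, latitude, longitude, hourly, **extra):
    """Return a GET URL with a shared location and hourly variable list.

    Hypothetical helper: extra keyword arguments become query parameters.
    """
    params = {
        "latitude": latitude,
        "longitude": longitude,
        "hourly": ",".join(hourly),
        **extra,
    }
    return f"{base}?{urlencode(params)}"


variables = ["temperature_2m", "shortwave_radiation"]
model = "gfs_seamless"  # assumed model identifier

# Training data: archived forecasts for a fixed period.
train_url = build_url(
    HISTORICAL_FORECAST_URL, 52.52, 13.41, variables,
    models=model, start_date="2023-01-01", end_date="2023-06-30",
)

# Inference data: the live forecast with the same variables and model.
live_url = build_url(FORECAST_URL, 52.52, 13.41, variables, models=model)

print(train_url)
print(live_url)
```

Keeping the variable list and model identical in both requests is the point of the design: the features a model sees in production come from the same weather model it was trained on.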
How is This Data Generated?
Depending on the model, weather models update every 1, 3, or 6 hours. Each "weather model run" is initialized with observations from weather stations, satellites, radar, airplanes, soundings, and buoys.
Because only a small fraction of the Earth is covered by high-quality weather stations, weather models simulate physical processes to fill in the gaps. Once the global atmospheric state has been assimilated, the models predict how the weather will develop.
Open-Meteo combines all available weather model updates into a continuous time series by taking the initial hours of each update. For instance, if a weather model updates every 3 hours, the first 3 hours of each update are concatenated into a seamless time series.
The initial time step, "Hour 0," is directly initialized with measurements, while hours 1 and 2 are generated using weather model simulations. Since weather models are highly accurate for the first few hours, the data remains reliable.
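A minimal sketch of this stitching step, assuming each run is available as a list of hourly values indexed by lead time (index 0 is "Hour 0"). The function name `stitch_runs` and the input layout are hypothetical; this is not Open-Meteo's code, and it simply leaves a gap if a run is missing.

```python
from datetime import datetime, timedelta


def stitch_runs(runs, update_interval_hours):
    """Concatenate the first `update_interval_hours` steps of each run.

    `runs` maps a run's initialization time to its hourly values,
    where index 0 is the "Hour 0" analysis step.
    """
    series = {}
    for init_time in sorted(runs):
        values = runs[init_time]
        # Keep only the leads before the next run is expected.
        for lead in range(min(update_interval_hours, len(values))):
            series[init_time + timedelta(hours=lead)] = values[lead]
    return dict(sorted(series.items()))


# Two hypothetical runs of a model that updates every 3 hours.
t0 = datetime(2023, 7, 1, 0)
runs = {
    t0: [10.0, 11.0, 12.0, 13.0, 14.0, 15.0],
    t0 + timedelta(hours=3): [13.5, 14.2, 15.1, 16.0, 17.0, 18.0],
}
series = stitch_runs(runs, 3)
for time, value in series.items():
    print(time.isoformat(), value)
```

Hours 3 to 5 come from the newer run even though the older run also covers them: each time step uses the most recently initialized run.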
In North America and Europe, weather models update every hour. With hourly updates, every time step in the series is an "Hour 0" based on measurements, so the data is nearly identical to actual measurements.
This process produces high-quality weather data with resolutions of up to 1 kilometer and global coverage. The main drawback is that the data covers only recent years.
Difference from the Historical Weather API Based on Reanalysis Data
Experienced users know that Open-Meteo already provides historical weather data through its "Historical Weather API." For new users expecting a single, definitive source of weather data, having two historical APIs can be confusing.
The "Historical Weather API" is based on reanalysis weather models, particularly ERA5. It offers data from 1940 onwards with consistent quality throughout the time series, making it ideal for analyzing long-term weather trends and climate change. This API focuses on consistency over pinpoint accuracy, with a spatial resolution ranging from 9 to 25 kilometers.
In contrast, the new "Historical Forecast API" is built by continuously assembling high-resolution weather forecast models. With higher update frequencies and increased resolution, this data can be more accurate than reanalysis models. However, it only covers the past 2-5 years and lacks long-term consistency due to the evolving nature of weather models.
Choosing the Right Dataset: Examples
For analyzing weather trends or climate change over decades: Use the Historical Weather API, which provides reanalysis data from 1940 onwards.
For higher accuracy over the past few years: Opt for the Historical Forecast API, which offers archived high-resolution forecasts.
For optimizing weather forecasts using machine learning: Use data from the same high-resolution weather models through the new Historical Forecast API.
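The rules of thumb above can be sketched as a small decision helper. The endpoint URLs are assumptions based on Open-Meteo's public documentation, and `choose_endpoint` and its parameters are hypothetical. The default archive start of 2021 reflects the GFS and HRRR availability mentioned earlier; the exact start date, and the later start of other models, are assumptions you would need to check per model.

```python
from datetime import date

# Assumed endpoints; check the Open-Meteo documentation before use.
ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"
HISTORICAL_FORECAST_URL = "https://historical-forecast-api.open-meteo.com/v1/forecast"


def choose_endpoint(start, archive_start=date(2021, 1, 1),
                    need_long_term_consistency=False):
    """Pick an API following the rules of thumb above (hypothetical helper).

    `archive_start` is the first date the chosen forecast model is archived.
    """
    if need_long_term_consistency or start < archive_start:
        # Decades of data or climate trends: reanalysis (ERA5-based) API.
        return ARCHIVE_URL
    # Recent years, higher resolution, or ML training against live forecasts.
    return HISTORICAL_FORECAST_URL


print(choose_endpoint(date(1990, 1, 1)))
print(choose_endpoint(date(2023, 1, 1)))
```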
As Open-Meteo continues to expand its range of weather datasets, maintaining a clear overview can be challenging. If you need assistance navigating this extensive data, don't hesitate to reach out for support.
How is This Data Stored?
Archiving all weather models is a significant challenge. Ensuring that data downloads correctly and managing the immense storage requirements are both crucial. The raw data required for this dataset easily exceeds 100 terabytes. This was only achievable because Open-Meteo has been diligently storing as much data as possible over the past few years.
To make the data efficiently accessible to the public, Open-Meteo uses a specialized file format and a custom compression algorithm tailored to weather data. For this workload, the approach outperforms typical databases, offering very fast access times while reducing the data to one-fifth of its original raw size.
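The post does not describe the compression algorithm itself. As a toy illustration of why weather time series compress well, a common approach is to round values to a fixed precision, store the integer differences between consecutive steps, and hand the result to a general-purpose compressor. The sketch below uses Python's `zlib` for the last step, uses a synthetic temperature curve rather than real data, and is not Open-Meteo's actual file format; `encode`, `decode`, and `SCALE` are hypothetical names.

```python
import math
import struct
import zlib

SCALE = 20  # quantize to steps of 0.05 degrees (assumption for this toy)


def encode(values, scale=SCALE):
    """Quantize floats to integers, delta-encode, then zlib-compress."""
    ints = [round(v * scale) for v in values]
    deltas = [ints[0]] + [b - a for a, b in zip(ints, ints[1:])]
    packed = struct.pack(f"<{len(deltas)}i", *deltas)
    return zlib.compress(packed, 9)


def decode(blob, scale=SCALE):
    """Invert encode(): decompress, undo the deltas, and rescale."""
    raw = zlib.decompress(blob)
    deltas = struct.unpack(f"<{len(raw) // 4}i", raw)
    out, acc = [], 0
    for d in deltas:
        acc += d
        out.append(acc / scale)
    return out


# Synthetic hourly temperatures for one year (not real data).
temps = [15 + 10 * math.sin(2 * math.pi * h / 24) + 0.01 * (h % 7)
         for h in range(24 * 365)]
blob = encode(temps)
raw_size = len(struct.pack(f"<{len(temps)}f", *temps))
print("float32 bytes:", raw_size, "compressed bytes:", len(blob))
restored = decode(blob)
```

Only the rounding step is lossy (here at most 0.025 degrees per value); the delta and `zlib` steps are lossless. Smooth series yield small, repetitive deltas, which is what the general-purpose compressor exploits.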
The development of the new Historical Forecast API began in November 2022. Initially, data was stored on a dedicated backup server with large but slow hard drives. As more data was added over time, the Open-Meteo file format proved essential for managing the volume. Currently, 33 terabytes of highly compressed weather data are available. While this amount of data might not break records, serving it through a fast weather API that is free for non-commercial use is a different story.
To make the archives accessible, all data has been moved to large storage servers provided by the German hosting provider Hetzner. By combining large, slow HDDs with an SSD-based cache, popular data can be served quickly. Under ideal conditions, accessing two years of weather data takes less than 10 milliseconds. However, if data needs to be read from slow spinning disks, there might be a wait of one or two seconds.
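The post does not say how the SSD cache is implemented. As a generic illustration of the idea, the sketch below puts a least-recently-used (LRU) read-through cache in front of a slow reader: repeated reads of popular chunks are served from the fast tier, and only misses go to the slow one. The class name, the LRU policy, and the `slow_read` callback are assumptions for illustration, not Open-Meteo's setup.

```python
from collections import OrderedDict


class ReadThroughCache:
    """Toy LRU cache in front of a slow reader (stand-in for SSD over HDD)."""

    def __init__(self, slow_read, capacity):
        self.slow_read = slow_read  # called only on a cache miss
        self.capacity = capacity
        self.entries = OrderedDict()  # oldest entry first
        self.hits = 0
        self.misses = 0

    def read(self, key):
        if key in self.entries:
            # Hit: mark as most recently used and serve from the fast tier.
            self.entries.move_to_end(key)
            self.hits += 1
            return self.entries[key]
        # Miss: read from slow storage and keep a copy.
        self.misses += 1
        value = self.slow_read(key)
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            # Evict the least recently used entry.
            self.entries.popitem(last=False)
        return value


cache = ReadThroughCache(lambda key: f"chunk-{key}", capacity=2)
for key in ["a", "b", "a", "c", "b"]:
    cache.read(key)
print(cache.hits, cache.misses, list(cache.entries))
```

With a capacity of two, the second read of `a` is a hit, while `b` has been evicted by the time it is read again, so it goes back to slow storage.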
In any case, you can now access years of high-resolution weather data faster than ever.
What’s Still to Come?
The Historical Forecast API is an ongoing project. Every day, new data is added, and we are actively working to integrate more data from existing models. In the future, additional weather models will be included in the Open-Meteo APIs, ideally providing archived data from past years.
We have many exciting ideas for further development:
Combining Reanalysis and High-Resolution Models: Seamlessly integrating reanalysis data with historical high-resolution models, although this requires extensive error correction.
Integration of Satellite and Radar Datasets: Using these datasets as better ground truth data for more accurate results.
Area Analysis: Adding features to analyze larger areas in addition to single points.
Bulk Data Export: Making it easier to correlate lists of coordinates and timestamps with weather data.
If you have any specific features or datasets you need, feel free to reach out!
Stay updated by subscribing to our latest news.
Hi, great work! Keep it up!
I am working on a machine learning forecast model. Do we have the "forecast length" feature for the historical weather API? I need the previous 2-3 years of historical forecasts with a forecast length of at least 7 days for the Australian region.
Hi!
Thanks for creating and improving this great API!
I'm currently training a model that uses forecast weather variables as input. So my question is, are the historical forecast data downloaded from this API raw or have they been modified/improved through some process?
For example, is the forecast data provided for July 2 the output produced on July 1? I'm using the NOAA GFS weather model.
Basically what I need to know is if the data that I'm going to use for the model in production (let's say the next 24 hour forecast data from now) will have the same accuracy as the training data that this historical forecast API provides.