Hi,
I noticed that the forecast API has a "past days" filter which would also allow us to retrieve historical data from past days, weeks, etc. Is the historical data from the forecast API the same as the data from the historical weather API?
Or is the historical data from the forecast API the historical forecast data, not the actual weather?
THANKSS
Hi Bella,
Great question. No, it is not the same data. The forecast data uses high-resolution local weather models with 1-5 km resolution (depending on the area). The historical weather API uses 10 km resolution for surface variables like temperature and humidity, but only 25 km for atmospheric variables like clouds or precipitation. Due to the coarser resolution, you can notice differences in temperature in urban, mountainous or coastal regions.
Those forecast models receive regular updates and improvements that "change" the behaviour of the model, which might lead to statistical errors if you analyse a longer time-series of assembled forecast model data. The ERA5 reanalysis guarantees that no such changes occur.
Depending on your use case, one approach will work better than the other. If you want to study change over a long period of time, use the historical weather API. If you want to train a machine-learning model that adapts the forecast to recent measurements, use the Forecast API with "past days".
In either case, the data is not the same as actual measurements, but is assimilated from measurements. In locations with plenty of measurements, it is comparable to a spatial average of all measurements in a larger area. For locations without any measurements, weather models are used to "fill in the gaps".
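To illustrate the difference, here is a minimal sketch calling both endpoints (the coordinates and variables are just examples; the endpoint URLs and parameters follow the Open-Meteo documentation):

```python
import requests

params = {
    "latitude": 52.52, "longitude": 13.41,  # example coordinates (Berlin)
    "daily": "temperature_2m_max",
    "timezone": "auto",
}

# Forecast API: high-resolution local models, recent past via "past_days"
forecast = requests.get(
    "https://api.open-meteo.com/v1/forecast",
    params={**params, "past_days": 7},
).json()

# Historical weather API: ERA5 reanalysis, consistent over long time ranges
archive = requests.get(
    "https://archive-api.open-meteo.com/v1/archive",
    params={**params, "start_date": "2022-01-01", "end_date": "2022-01-07"},
).json()
```

Comparing the two responses for the same days makes the resolution difference visible, especially in urban, mountainous or coastal regions.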
Hi,
Thanks for the detailed explanation, it's really helpful! I eyeballed a few data points on the preview graph using Forecast API with filter past_days = a week, and noticed that some cases would have precipitation_sum (inch) smaller than rain_sum (inch). Based on the documentation, isn't the precipitation being the sum of rain, showers and snow? Am I missing something here? Need your help to understand this.
THANKSSSS
Bella
There seems to be an inconsistency for locations in North America. I created a GitHub ticket to keep track of it: https://github.com/open-meteo/open-meteo/issues/347
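If you want to spot-check this yourself, here is a minimal sketch (daily variable names as in the API documentation; the coordinates are just an example) that flags days where the sums are inconsistent:

```python
import requests

resp = requests.get(
    "https://api.open-meteo.com/v1/forecast",
    params={
        "latitude": 40.71, "longitude": -74.01,  # example: New York
        "daily": "precipitation_sum,rain_sum",
        "precipitation_unit": "inch",
        "past_days": 7,
        "timezone": "auto",
    },
).json()

daily = resp["daily"]
for day, total, rain in zip(
    daily["time"], daily["precipitation_sum"], daily["rain_sum"]
):
    # precipitation_sum should cover rain, showers and snowfall,
    # so it should never be smaller than rain_sum alone
    if total is not None and rain is not None and total < rain:
        print(f"{day}: precipitation_sum={total} < rain_sum={rain}")
```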
Hi
Is the historical data weather forecasts or actual recorded weather?
Here you can find a video explaining how weather reanalysis works: https://www.youtube.com/watch?v=FAGobvUGl24
The historical weather data available on the API is a combination of both recorded and forecasted data. The data is obtained from various sources such as weather stations, aircraft, buoys, satellite, and radar observations. This information is then combined with numerical weather models using a technique called weather reanalysis, which helps to provide a more complete picture of past weather conditions. However, it is essential to note that there may be slight discrepancies compared to measurements from a single weather station due to the 10-kilometer resolution, local effects of the weather station, and potential inaccuracies in the numerical modeling of data. Despite these limitations, weather reanalysis offers more consistent data over multiple years and global coverage for any location on earth, regardless of whether a measurement station is available.
Does the API include what the historical record temp was on any given day?
Not yet. I only started archiving data a few months back.
In the coming weeks, there will be an additional historical API available with 60 years of past weather data.
Do you store the historical data also in ring data-structures like you describe in https://openmeteo.substack.com/p/how-to-store-weather-forecast-data ?
Hi Sam, thanks for asking!
No, I am not using this older approach. For archives, compression is important; otherwise, storage requirements are way too high.
In the past couple of weeks, I started to look for fast and efficient compression algorithms like zstd, brotli or lz4. All of them performed rather poorly with time-series weather data.
After a lot of trial and error, I found a couple of pre-processing steps that improve the compression ratio a lot:
1) Scaling data to reasonable values. Temperature has an accuracy of 0.1° at best. I simply round everything to 0.05° instead of keeping the highest possible floating-point precision.
2) A temperature time-series increases and decreases by small values: 0.4° warmer, then 0.2° colder. Storing only the deltas improves compression performance.
3) The data is highly spatially correlated. If the temperature is rising in one grid cell, it is rising in the neighbouring grid cells as well. Simply subtracting the time-series of one grid cell from the next yielded an especially large boost.
4) Although zstd performs quite well on this encoded data, dedicated integer compression algorithms offer far better compression and decompression speeds. I am using FastPFor. (A short sketch of steps 1-3 follows after this list.)
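To make the pre-processing concrete, here is a minimal numpy sketch of steps 1-3 (the grid shape and values are made up for illustration; the real archive operates on full model grids, and the signed deltas would still need e.g. zigzag encoding before an unsigned integer codec like FastPFor):

```python
import numpy as np

# Hypothetical data: temperatures for 3 neighbouring grid cells over 8 hours
temps = np.array([
    [12.31, 12.74, 12.52, 12.91, 13.33, 13.12, 12.88, 12.66],
    [12.10, 12.55, 12.34, 12.70, 13.15, 12.95, 12.71, 12.50],
    [11.95, 12.41, 12.20, 12.58, 13.01, 12.80, 12.57, 12.35],
])

# Step 1: scale to integers, rounding to 0.05 degC (20 steps per degree)
scaled = np.round(temps / 0.05).astype(np.int32)

# Step 2: temporal delta encoding; prepend=0 keeps the first absolute value
# as the first "delta", so a cumulative sum restores the series
temporal = np.diff(scaled, axis=1, prepend=0)

# Step 3: spatial delta encoding between neighbouring grid cells
spatial = np.diff(temporal, axis=0, prepend=0)

# The result is a stream of small integers clustered around zero, which an
# integer codec such as FastPFor packs far more tightly than raw floats
print(spatial)

# Decoding reverses the two cumulative sums and the scaling
restored = np.cumsum(np.cumsum(spatial, axis=0), axis=1) * 0.05
assert np.allclose(restored, np.round(temps / 0.05) * 0.05)
```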
With this compression approach, an archive became possible. One week of weather forecast data should be around 10 GB compressed, so I can easily maintain a very long archive.