First of all, impressive work! Also enjoying the blog posts, I'm learning a lot about weather and forecasting 👌 Thinking about using this in my next project
I'm still curious about how you serve the data. Specifically:
- About caching: I've been looking at the code base and haven't found anything specific about this. When you talk about caching, you mean the API relies on filesystem caching under the hood, right? Did you have to make any tweaks to improve performance?
- About the database itself: I've read that it is a custom filesystem-based database. If I understood correctly, data is stored as compressed plain binary and, when a request comes in, it gets decompressed and formatted. Let's say I want to fetch data in a specific time range or apply some filters: how are those kinds of operations performed?
Thanks for your time in advance!
I've been reading this post and all my questions were answered there: https://openmeteo.substack.com/p/how-to-store-weather-forecast-data
Thanks again!
Glad you found this article already. Yes, caching is only done using the Linux page cache. No high-level caching like Redis, Memcached, etc. is used. Because the API is already fast and the data is constantly updated, that kind of caching would not improve performance and would require a lot of resources. The entire Open-Meteo stack is kept very simple and does not rely on any external databases or synchronisation systems.
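To make the page-cache point concrete, here is a minimal sketch of reading a time range straight from a file through a memory map. It assumes a hypothetical flat file of uncompressed little-endian float32 values laid out as [location][hour]; the real Open-Meteo files are compressed and chunked (see the linked article), so the layout, the file name and both function names are illustrative assumptions only.

```python
import mmap
import struct
import tempfile
from pathlib import Path

FLOAT_SIZE = 4  # bytes per little-endian float32 value


def write_demo_file(path, n_locations, n_hours):
    """Write a hypothetical flat file where value = location * 1000 + hour."""
    with open(path, "wb") as f:
        for loc in range(n_locations):
            row = [float(loc * 1000 + hour) for hour in range(n_hours)]
            f.write(struct.pack(f"<{n_hours}f", *row))


def read_time_range(path, n_hours, location, start_hour, end_hour):
    """Return the values for hours [start_hour, end_hour) of one location.

    The file is memory-mapped, so only the touched pages are read from disk.
    Repeated requests for the same region are served from the operating
    system's page cache without any application-level cache.
    """
    count = end_hour - start_hour
    offset = (location * n_hours + start_hour) * FLOAT_SIZE
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return list(struct.unpack_from(f"<{count}f", mm, offset))


demo_path = Path(tempfile.mkdtemp()) / "demo_weather.bin"
write_demo_file(demo_path, n_locations=3, n_hours=48)

# Fetch hours 24..26 for location 2 by computing a byte offset.
values = read_time_range(demo_path, n_hours=48, location=2,
                         start_hour=24, end_hour=27)
print(values)
```

A time-range request then becomes an offset calculation rather than a query, which is one reason a separate cache layer adds little.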
This is amazing work, thanks for sharing 👏🏼
Recently I needed historical weather data for US cities. I started checking a few cities and I see a huge difference between the data I get from this API and other websites such as AccuWeather, World-Weather, etc.
For example, these are the results for Houston (29.7604, 95.3698) on 01.03.2023.
Meteo result: Max Apparent Temp.: -5.3°C Mean Apparent Temp: -9.5°C
Accuweather: Max Temp: 28°C (during the day)
World-weather: Max Temp: 22°C (during the day)
I checked a few different cities and got similarly large temperature differences.
Am I doing something wrong? Is such a gap between results normal?
Could it be as simple as a missing minus sign for the longitude coordinate? (29.7604, -95.3698)
That was it! My silly mistake.
Thanks for the quick response, Patrick.
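A quick sanity check along these lines can catch a flipped hemisphere before a request is sent. The sketch below only builds a request URL and does not call the API. The endpoint and parameter names are written from the public documentation as I remember it and should be verified there; the helper names, the example date and the rough bounding box for the contiguous US are assumptions made for illustration.

```python
from urllib.parse import urlencode

# Assumed endpoint of the historical weather API; verify against the docs.
ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"

# Rough bounding box of the contiguous United States (illustrative values).
US_LAT_RANGE = (24.0, 50.0)
US_LON_RANGE = (-125.0, -66.0)


def looks_like_us(lat, lon):
    """Return True if the point falls inside the rough US bounding box."""
    return (US_LAT_RANGE[0] <= lat <= US_LAT_RANGE[1]
            and US_LON_RANGE[0] <= lon <= US_LON_RANGE[1])


def build_archive_url(lat, lon, start_date, end_date):
    """Build a request URL for daily apparent temperature values."""
    params = {
        "latitude": lat,
        "longitude": lon,
        "start_date": start_date,  # ISO 8601, YYYY-MM-DD
        "end_date": end_date,
        "daily": "apparent_temperature_max,apparent_temperature_mean",
    }
    return f"{ARCHIVE_URL}?{urlencode(params)}"


wrong = (29.7604, 95.3698)   # positive longitude: a point in Asia
right = (29.7604, -95.3698)  # negative longitude: Houston, Texas

print(looks_like_us(*wrong))
print(looks_like_us(*right))

# Example date chosen for illustration only.
url = build_archive_url(*right, "2023-03-01", "2023-03-01")
print(url)
```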
Thanks very much for this valuable tool! Unfortunately, I recently had difficulties downloading historical data. It was not possible to set the date. Do you know of this problem and/or do you have a solution?
Hi, I am unaware of any issues. Can you describe the problem further? Please note: the free API will impose rate limiting if you are trying to download data in bulk.
Thanks for your quick reply! I'm trying to obtain the data from 1 June 2023 until today (or 31 December 2023). Setting the date in the start date field to 2023-01-06 automatically sets the same date in the end date field, and vice versa.
Oops, I mean 2023-06-01 ;-)
Hi there. I found the solution to my problem. If you delete the end date manually and fill in the desired end date, it works after all.
Hello, could you please explain where the data for "weathercode" in the Historical API originates? I searched and I couldn't find it in ERA5 or CERRA databases...
Hi, weather codes are calculated from clouds, precipitation and snowfall. Thunderstorms cannot be derived, as only limited information is available. You can see the implementation here: https://github.com/open-meteo/open-meteo/blob/main/Sources/App/Helper/WeatherCode.swift
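As a rough illustration of how a code can be derived from those three inputs, here is a minimal sketch. The returned numbers follow the WMO-style codes used in the API documentation (0 to 3 for cloudiness, 61/63/65 for rain, 71/73/75 for snow), but every threshold below is an assumption picked for the example and is not taken from the linked WeatherCode.swift; the function name is hypothetical as well.

```python
def weather_code(cloud_cover_pct, precipitation_mm, snowfall_cm):
    """Map hourly values to a WMO-style weather code.

    All thresholds are illustrative assumptions, not the real ones.
    Thunderstorm codes are never returned, because these three inputs
    carry no information about lightning.
    """
    # Snowfall takes priority over rain.
    if snowfall_cm > 0:
        if snowfall_cm <= 0.5:
            return 71  # slight snow fall
        if snowfall_cm <= 2.0:
            return 73  # moderate snow fall
        return 75      # heavy snow fall

    # Liquid precipitation.
    if precipitation_mm > 0:
        if precipitation_mm <= 2.5:
            return 61  # slight rain
        if precipitation_mm <= 7.6:
            return 63  # moderate rain
        return 65      # heavy rain

    # Dry hour: classify by cloud cover only.
    if cloud_cover_pct < 20:
        return 0  # clear sky
    if cloud_cover_pct < 50:
        return 1  # mainly clear
    if cloud_cover_pct < 80:
        return 2  # partly cloudy
    return 3      # overcast


print(weather_code(cloud_cover_pct=10, precipitation_mm=0.0, snowfall_cm=0.0))
print(weather_code(cloud_cover_pct=100, precipitation_mm=3.0, snowfall_cm=0.0))
print(weather_code(cloud_cover_pct=100, precipitation_mm=0.2, snowfall_cm=1.0))
```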
Hello, I have a question. Could you add a separate field to the weekly hourly forecast that specifies whether a certain hour is daytime or nighttime? Right now only the daily forecast has sunrise and sunset data to determine whether a certain hour is day or night. This forces me to use two queries even if only hourly data is needed. There should be a 'day' or 'night' label for each hour.
As a quick solution you can use "shortwave_radiation > 0" to differentiate between day and night time. Ideally, I could add "sunshine_time" in minutes which also helps to calculate the daily sunshine hours.
Hello Patrick!
If I understand correctly, it is possible to define daytime as the time when the shortwave radiation is greater than zero. This would probably include daytime and the beginning of dusk (if sunset has recently occurred or dawn is coming soon). This is a good solution, but I don't see how this parameter accurately determines daytime in overcast conditions, in strong moonlight or during "white nights" (when the Sun does not go far below the horizon) observed in northern latitudes. Does this parameter only give direct radiation from the Sun? In my opinion, determination by sunset and sunrise times would be more correct. I hope you can add this data to the JSON hourly query result. Anyway, thank you very much for your feedback and hint.
The weather variable "shortwave_radiation" will show values larger than 0 as soon as the sun is above the horizon for a given hour. Even with dense clouds, more than 100 W/m² is expected. Scattered radiation from the moon is not considered. At night, shortwave radiation is always 0. It works reliably as an indicator for daylight. Please also note that hourly shortwave radiation is defined as a backwards average. E.g. the value at 19:00 shows the average radiation from 18:00 to 19:00. For a sunset at 18:15, the value at 19:00 would still be greater than 0.
Yes, using daily sunrise/sunset information is more precise.
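The workaround can be sketched in a few lines. The snippet assumes an hourly response shaped like the API's JSON, with parallel "time" and "shortwave_radiation" arrays; the sample values are made up for illustration and the function name is hypothetical.

```python
def label_day_night(hourly):
    """Label each hour as 'day' or 'night' from shortwave radiation.

    Because the value is a backwards average over the preceding hour,
    the hour that contains the sunset is still labelled 'day'.
    """
    return [
        (time, "day" if radiation > 0 else "night")
        for time, radiation in zip(hourly["time"],
                                   hourly["shortwave_radiation"])
    ]


# Made-up sample around a sunset at 18:15.
hourly = {
    "time": ["2023-03-01T17:00", "2023-03-01T18:00", "2023-03-01T19:00",
             "2023-03-01T20:00", "2023-03-01T21:00"],
    "shortwave_radiation": [120.0, 45.0, 3.5, 0.0, 0.0],
}

labels = label_day_night(hourly)
for time, label in labels:
    print(time, label)
```

In this sample, 19:00 is still labelled as day because its value averages 18:00 to 19:00, which includes the last 15 minutes before sunset.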
That's very good. So in my app I will define daytime and nighttime based on the shortwave radiation of the Sun, and the one-hour offset that comes from the backwards average can be treated as twilight. In fact, night and day have no sharp boundary, so this solution is quite satisfactory. Plus, it will save me from having to do another query on the daily forecast just to get sunrise and sunset. Hopefully, in the future, there will be a 'day_or_night_time' field in the hourly JSON.
PS. I've tried various other weather API sites before, and I can say that the accuracy and completeness of the information in your API is the best, at least for my observation point (Russia, Krasnodar region). In the future, in addition to specifying day and night time, it would be desirable to also introduce descriptions of weather codes in different languages. This is just a wish.
I would like to thank you once again and wish your project successful development.
Thanks for the impressive work.
I was wondering: when I use the historical weather API, generate a URL for my location and check the output, I notice that the lat and lon are rounded off to the nearest 0.5 degrees, while I would expect the numbers to be rounded off to the nearest 0.25 degrees (43.28, -3.31 becomes 43.0, -3.5 instead of 43.25, -3.25). Do you know the reason, or am I making a mistake in my thinking?
Cheers
Your coordinate points to a mountain (800 m elevation) next to the coast of Spain. The closest coordinate, 43.25, -3.25, is in a valley at 200 m elevation. Although slightly further away, 43.0, -3.5 is at 700 m elevation and matches better. Afterwards, downscaling to 90 meters is used to correct for typical temperature biases. Selecting a coordinate that matches the terrain improves data accuracy in many cases.
However, at 25 km resolution terrain matching can only slightly improve accuracy. I am just about to release an 11 km version of the historical weather API. The update is currently rolling out to all servers and will be announced in the next few hours on this blog!
Your location will significantly benefit from the 11 km update. Additionally, there will be 5 km resolution available for the period from 1985 until June 2021. Real-time updates for the 5 km resolution will follow sometime in 2023.
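The reply does not say how the elevation-based temperature correction is computed. One common approach is a constant lapse rate, and the sketch below assumes the standard environmental value of 0.65 °C per 100 m. Both the method and the function name are assumptions for illustration, not a description of what the API does.

```python
# Assumed standard environmental lapse rate: 0.65 °C per 100 m.
LAPSE_RATE_C_PER_M = 0.0065


def adjust_temperature(temp_c, cell_elevation_m, target_elevation_m):
    """Shift a grid-cell temperature to the elevation of the target point.

    Higher target elevation gives a colder result, lower gives a warmer one.
    """
    elevation_diff_m = target_elevation_m - cell_elevation_m
    return temp_c - LAPSE_RATE_C_PER_M * elevation_diff_m


# Elevations from the example above; the 10 °C input is made up.
from_matching_cell = adjust_temperature(10.0, cell_elevation_m=700,
                                        target_elevation_m=800)
from_valley_cell = adjust_temperature(10.0, cell_elevation_m=200,
                                      target_elevation_m=800)

print(round(from_matching_cell, 2))
print(round(from_valley_cell, 2))
```

The cell at 700 m needs only a small correction, while the valley cell at 200 m needs a shift of several degrees, so any error in the assumed lapse rate matters far more there.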
Hi Patrick,
Thank you so much for your reply. It makes sense to select a point that resembles the user's lat/lon better in terms of elevation. Does this mean that you first check the elevation based on the 90 m DTM, compare it to the average elevation of the meteo grid cell and choose a more suitable meteo grid cell if needed (within a reasonable distance)?
And awesome that you're using the ERA5-Land dataset now with the 11 km resolution. Are you also expecting to increase the resolution for your bias correction in mountainous regions? To 30 m, for example? Or do you think that this is not much of an improvement?
Cheers
Yes, I first check the digital elevation model (DEM) for the given coordinate. With the resolved elevation, I then proceed to resolve a suitable grid cell.
A 30 meter DEM is available from Copernicus, but it has some gaps. I also do not think that it would bring any improvement. Any potential gain would be smaller than typical errors in gridded weather datasets.
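The two steps can be sketched as follows: take the elevation of the requested point, then pick the nearby grid cell whose elevation is closest to it. The exact selection rule is not described in the thread, so the scoring (smallest elevation difference, ties broken by distance), the 50 km search radius and all names are assumptions; the candidate cells are the two mentioned in the example above.

```python
import math

EARTH_RADIUS_KM = 6371.0


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def pick_grid_cell(lat, lon, point_elevation_m, cells, max_distance_km=50.0):
    """Pick the nearby cell whose elevation best matches the point.

    `cells` is a list of (lat, lon, elevation_m) tuples. Cells beyond
    max_distance_km are ignored. Returns None if no cell is in range.
    """
    in_range = [
        cell for cell in cells
        if distance_km(lat, lon, cell[0], cell[1]) <= max_distance_km
    ]
    if not in_range:
        return None
    return min(
        in_range,
        key=lambda cell: (abs(cell[2] - point_elevation_m),
                          distance_km(lat, lon, cell[0], cell[1])),
    )


# Requested point at 800 m; candidate cells from the example in this thread.
cells = [(43.25, -3.25, 200.0), (43.0, -3.5, 700.0)]
best = pick_grid_cell(43.28, -3.31, 800.0, cells)
print(best)
```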
Thank you so much for this project! It's a gift to the world
Should I be able to get weather data for any lat lng points, or must they be within 5km of a weather station? I'm having trouble getting data for several Greek islands.
Hi, Alkis! Yes, data is available in the historical weather API for any location on land. Data is based on a weather reanalysis at 25 kilometre grid resolution. In creating a reanalysis, weather station data is used to validate and correct the historical data, but for places without a weather station, a physical weather model fills in the gaps.
Although data should be available everywhere, I removed data over the sea and in the far north to save a lot of disk storage. I assume that due to the 25 kilometre resolution, data for the Greek islands has been removed as well.
I am planning to purchase additional servers for Open-Meteo with enough disk storage to make all data available. This will take a couple of months, but should then solve your issue.
How could I stay updated for when and if this changes?
Simply subscribe to the newsletter of this blog and you will get the latest updates via email!
Many thanks!
Data for Greek islands is now available! There will be an additional blog post next week regarding upgrades for the historical weather API.
Thank you, I appreciate the insight!
Hi, thanks for the article. I have some questions about getting the data. First, I got a lot of NaN values when I chose ERA5-Land. Is that normal? Second, how can I calculate the temperature at the exact point between two stations?
For which coordinates do you get NaN values? Currently, data for any location over the sea or in the far north is not included in the database. The amount of data would be too large otherwise. I might upgrade to a larger server for the historical archive API at some point.
ERA5 is a gridded dataset and you do not get station data directly. The indicated temperature is the value for the entire grid cell. Although you could interpolate between multiple grid cells, this does not improve accuracy.
For Europe, I will include the 5 km CERRA reanalysis as well, which should improve accuracy, as this type of downscaling is physically bound.
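To illustrate what "the value for the entire grid cell" means in practice, here is a minimal sketch that snaps a coordinate to the nearest point of a regular grid. It assumes a 0.25° grid aligned to whole degrees; the function name is hypothetical, and the sketch deliberately ignores the elevation-based cell selection discussed elsewhere in this thread.

```python
def nearest_grid_point(lat, lon, spacing_deg=0.25):
    """Snap a coordinate to the nearest point of a regular lat/lon grid.

    Assumes the grid is aligned to whole degrees with the given spacing.
    """
    def snap(value):
        return round(value / spacing_deg) * spacing_deg

    return snap(lat), snap(lon)


# Every location inside a cell maps to the same grid point and therefore
# receives the same gridded value.
print(nearest_grid_point(50.08, 14.44))   # a point in the Czech Republic
print(nearest_grid_point(50.05, 14.48))   # a nearby point, same cell
print(nearest_grid_point(43.28, -3.31))   # the Spanish example from above
```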
I got data from Copernicus in GRIB format and tried to view it using xarray. I am only interested in data covering the Czech Republic, as I wrote to you a month ago. I am looking for a way to find temperature, rain in mm and snow cover exactly at the point for my research.
As you are downloading data directly from Copernicus, I cannot tell where the error is, but there should be plenty of Python/R resources available in the Copernicus user support forums.
Thanks, I wrote an email to you
https://postimg.cc/HrjMXpYT
This is a picture of what I have :(
Hello, what is the workstation, and where is the server with the data located?
Hello, the workstation is a self-built PC. Nothing in particular. I just needed a lot of storage :D
The servers for the historical weather API are hosted in Germany. They fall under GDPR. As I am not using cookies, analytics, trackers, beacons or ads, I am on the privacy-centric side anyway.
If it gets more popular, I can easily plug in more servers.