OpenStreetMap Data Quality

OpenStreetMap (OSM), often called the Wikipedia of maps, is a collaboratively collected, open geographic database of the world. It can be used to produce maps, to generate routing information, and for a variety of other purposes. As with any data set, the question is whether the quality of the data is sufficient for a given use case.

Data quality is a term with many facets. One can distinguish different types of data quality, from completeness and consistency to positional, temporal and thematic accuracy. These aspects can be looked at independently. In addition, data quality is generally not homogeneous (i.e. it can differ from one location to another), so it needs to be evaluated individually for each place.

The main goal of the ohsome data analysis platform is to enable evaluation of the fitness for purpose of OSM data on a global scale for a variety of applications by leveraging intrinsic data quality measures.

OSM History Data

Many of the proposed intrinsic data quality measures require the evaluation of OSM history data, that is, the evolution of the constantly changing OpenStreetMap database. For this purpose, the OSM ecosystem provides so-called full history data sets, which contain an ever-growing record of all changes (be it additions, deletions or modifications of objects) made by thousands of contributors every day.
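To illustrate what a full history makes possible, the following minimal sketch reconstructs the state of a single object at an arbitrary point in time from its list of versions. The Version class, the state_at function and the sample history are hypothetical names and data introduced for this illustration only; they are not part of the ohsome platform or of any OSM library. The sketch assumes that each version carries a version number, a timestamp, a visibility flag (false for a deletion) and its tags.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class Version:
    """One version of an OSM object (hypothetical, simplified structure)."""
    version: int
    timestamp: datetime
    visible: bool  # False means this version records a deletion
    tags: dict = field(default_factory=dict)


def state_at(history: list[Version], when: datetime) -> Optional[Version]:
    """Return the version valid at `when`, or None if the object did not exist then."""
    valid = [v for v in history if v.timestamp <= when]
    if not valid:
        return None  # the object had not been created yet
    latest = max(valid, key=lambda v: v.version)
    return latest if latest.visible else None  # None if it had been deleted


# Made-up history: an addition, a modification and a deletion.
history = [
    Version(1, datetime(2010, 5, 1), True, {"highway": "track"}),
    Version(2, datetime(2014, 3, 9), True, {"highway": "residential", "name": "Main Street"}),
    Version(3, datetime(2019, 8, 20), False),
]

snapshot = state_at(history, datetime(2015, 1, 1))
print(snapshot.tags if snapshot else "object did not exist")
```

Calling state_at for a series of timestamps yields the evolution of the object over time, which is the kind of input that intrinsic quality measures operate on.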

Because of the large volume of OSM history data, previous work has only looked at limited subsets of it, be it small regions in space or a small number of fixed timestamps and feature types. The ohsome platform improves on these data analysis methods by applying big data technology to OSM history data.

The major challenges when analyzing OSM's history data lie not only in the relatively large amount of raw data which has to be handled, but also in the large span of scales at which different objects are mapped in OSM (for example ranging from single trees up to mountain ranges), as well as OSM's evolving object taxonomy.

Data Analysis Platform

In order to allow as many different analysis methods as possible to be applied to OSM's history data, we chose to design the data analysis platform in a way that imposes as few limits as possible. We achieve this by using a database format that is very close to OSM's own data format(s), but optimized for fast parallel processing and pre-processed so that some very frequently used operations become much more efficient. This approach also does not discard any of the information present in OSM's original history data set; for example, analyses can access all of the available metadata fields.
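The general idea behind parallel processing of partitioned data can be sketched as follows. This is not the platform's actual implementation: the partition layout, the cell names and the functions count_keys and count_keys_parallel are assumptions made for this illustration. The sketch splits the data into independent partitions, computes a partial result for each one, and then merges the partial results. A thread pool keeps the example short; for CPU-bound work in Python, a process pool or a distributed framework would be the more realistic choice.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

# Hypothetical partitions: each holds (object_id, tags) pairs for one spatial cell.
partitions = {
    "cell_a": [(1, {"building": "yes"}), (2, {"highway": "residential"})],
    "cell_b": [(3, {"building": "house"}), (4, {"building": "yes"})],
    "cell_c": [(5, {"natural": "tree"})],
}


def count_keys(objects):
    """Map step: count how often each tag key occurs within one partition."""
    counts = Counter()
    for _object_id, tags in objects:
        counts.update(tags.keys())
    return counts


def count_keys_parallel(parts):
    """Process all partitions independently, then merge the partial counts."""
    total = Counter()
    with ThreadPoolExecutor() as pool:
        for partial in pool.map(count_keys, parts.values()):
            total.update(partial)  # reduce step: add up the partial counts
    return total


totals = count_keys_parallel(partitions)
print(totals.most_common())
```

Because each partition is processed without reference to the others, the same pattern scales from a few cells on one machine to many cells spread across a cluster.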

Besides the above-mentioned focus on intrinsic data quality metrics, further ohsome applications include exploratory data analysis (such as visualizing OSM contributor activity and examining individual OSM objects), the analysis of OSM contribution patterns, and general geo-statistics.

Ohsome Examples

This website showcases a few selected examples created from OpenStreetMap history data, highlighting the insights that can be gained with our data analysis platform.