The ohsome-planet tool can be used to:
- Transform OSM (history) PBF files into Parquet format with native GEO support.
- Turn an OSM changeset file (osm.bz2) into a PostgreSQL database table.
- Keep both datasets up-to-date by ingesting OSM planet replication files.
ohsome-planet creates the actual OSM element geometries for nodes, ways and relations. It enriches each element with changeset data such as hashtags, the OSM editor or the username. Additionally, it is possible to add country ISO codes to each element by providing a boundary dataset.
The output of ohsome-planet can be used to perform a wide range of geospatial analyses with tools such as DuckDB, Python GeoPandas or QGIS. It's also possible to display the data directly on a map and explore it.
Installation requires Java 21.
First, clone the repository and its submodules. Then, build it with Maven.
git clone --recurse-submodules https://github.com/GIScience/ohsome-planet.git
cd ohsome-planet
./mvnw clean package -DskipTests

To see the help page of the ohsome-planet CLI run:
java -jar ohsome-planet-cli/target/ohsome-planet.jar --help

In this tutorial we will run the three main modes of ohsome-planet:
- Contributions: OSM extract (.pbf) --> Parquet
- Changesets: OSM changesets (.bz2) --> PostgreSQL
- Replication: OSM replication files (.osc) --> Parquet / PostgreSQL
Transform an OSM (history/latest) .pbf file into Parquet format.
You can download the latest or history OSM extract (osm.pbf) for the whole planet from the OSM Planet server or for small regions from Geofabrik.
Throughout this tutorial we are going to use a small extract of OSM for Karlsruhe from Geofabrik (karlsruhe-regbez-latest.osm.pbf). Karlsruhe is a city in Germany.
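You can fetch the extract directly from Geofabrik. The URL below is an assumption based on Geofabrik's usual directory layout; check the region's download page if the path has changed:

```shell
# Download the Karlsruhe administrative region extract from Geofabrik.
# The URL follows Geofabrik's usual layout (assumed here); verify it on
# https://download.geofabrik.de if the download fails.
wget https://download.geofabrik.de/europe/germany/baden-wuerttemberg/karlsruhe-regbez-latest.osm.pbf
```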
To process any given .pbf file, you need to run ohsome-planet with the contributions command and at least the --pbf and --data (data output directory) arguments:
java -jar ohsome-planet-cli/target/ohsome-planet.jar \
contributions \
--data data/ \
--pbf karlsruhe-regbez-latest.osm.pbf

Additional arguments like --parallel, --country-file, --changeset-db and --overwrite are optional. Find out more about these in the CLI documentation or in the CLI help text:
java -jar ohsome-planet-cli/target/ohsome-planet.jar \
contributions \
--help

When using a history PBF file, the output files are split into history and latest contributions.
All contributions which are a) not deleted and b) visible in OSM at the timestamp of the extract are considered latest.
The remaining contributions (deleted or old versions) are considered history.
The number of threads (--parallel parameter) defines the number of files which will be created.
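For example, a run with --parallel set splits each element type's output into that many files (a sketch, reusing the jar path and extract from the commands above):

```shell
# Sketch: the same contributions run as above, with four worker threads.
# Each element type (node/way/relation) is then split into four Parquet files.
java -jar ohsome-planet-cli/target/ohsome-planet.jar \
    contributions \
    --data data/ \
    --pbf karlsruhe-regbez-latest.osm.pbf \
    --parallel 4
```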
To see the files created and the directory structure run:
tree data/contributions
data/contributions/
├── history
│ ├── relation-0-history-contribs.parquet
│ └── ...
└── latest
├── node-0-latest-contribs.parquet
├── relation-0-latest-contribs.parquet
├── way-0-latest-contribs.parquet
    └── ...

To explore the data with DuckDB run:
duckdb -s "DESCRIBE FROM read_parquet('data/contributions/*/*.parquet');"
┌───────────────────┬─────────────────────────────────────────────────────────┐
│ column_name │ column_type │
│ varchar │ varchar │
├───────────────────┼─────────────────────────────────────────────────────────┤
│ status │ VARCHAR │
│ valid_from │ TIMESTAMP WITH TIME ZONE │
│ valid_to │ TIMESTAMP WITH TIME ZONE │
│ osm_type │ VARCHAR │
│ osm_id │ BIGINT │
│ osm_version │ INTEGER │
│ osm_minor_version │ INTEGER │
│ osm_edits │ INTEGER │
│ osm_last_edit │ TIMESTAMP WITH TIME ZONE │
│ user │ STRUCT(id INTEGER, "name" VARCHAR) │
│ tags │ MAP(VARCHAR, VARCHAR) │
│ tags_before │ MAP(VARCHAR, VARCHAR) │
│ changeset │ STRUCT(id BIGINT, created_at TIMESTAMP WITH TIME ZONE… │
│ bbox │ STRUCT(xmin DOUBLE, ymin DOUBLE, xmax DOUBLE, ymax DO… │
│ centroid │ STRUCT(x DOUBLE, y DOUBLE) │
│ xzcode │ STRUCT("level" INTEGER, code BIGINT) │
│ geometry_type │ VARCHAR │
│ geometry │ BLOB │
│ area │ DOUBLE │
│ area_delta │ DOUBLE │
│ length │ DOUBLE │
│ length_delta │ DOUBLE │
│ contrib_type │ VARCHAR │
│ refs_count │ INTEGER │
│ refs │ BIGINT[] │
│ members_count │ INTEGER │
│ members │ STRUCT("type" VARCHAR, id BIGINT, "timestamp" TIMESTA… │
│ countries │ VARCHAR[] │
│ build_time │ BIGINT │
├───────────────────┴─────────────────────────────────────────────────────────┤
│ 29 rows                                                                     │
└─────────────────────────────────────────────────────────────────────────────┘

To explore the data with QGIS run:
qgis data/contributions/latest/

Import an OSM changesets .bz2 file into PostgreSQL.
First, create an empty PostgreSQL database with the PostGIS extension or provide a connection to an existing database. For instance, you can set it up like this:
export OHSOME_PLANET_DB_USER=ohsomedb
export OHSOME_PLANET_DB_PASSWORD=mysecretpassword
docker run -d \
--name ohsome_planet_changeset_db \
-e POSTGRES_PASSWORD=$OHSOME_PLANET_DB_PASSWORD \
-e POSTGRES_USER=$OHSOME_PLANET_DB_USER \
-p 5432:5432 \
postgis/postgis

Second, download the full changeset file from the OSM Planet server. If you want to clip the extent to a smaller region, you can use the changeset-filter command of the osmium tool. This might take a few minutes. Currently, there is no provider of pre-processed or regional changeset file extracts.
osmium changeset-filter \
--bbox=8.319,48.962,8.475,49.037 \
--output=changesets-latest-karlsruhe-regbez.osm.bz2 \
changesets-latest.osm.bz2

Then, process the OSM changesets .bz2 file as in the following example.
java -jar ohsome-planet-cli/target/ohsome-planet.jar \
changesets \
--bz2 changesets-latest-karlsruhe-regbez.osm.bz2 \
--changeset-db "jdbc:postgresql://localhost:5432/postgres?user=$OHSOME_PLANET_DB_USER&password=$OHSOME_PLANET_DB_PASSWORD" \
--create-tables \
--overwrite

The parameters --create-tables and --overwrite are optional. Find more detailed information on usage in docs/CLI.md. To see all available parameters, call the tool with the --help parameter.
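Afterwards you can verify the import by querying the database. The following is only a sketch: the table name osm_changeset is an assumption for illustration, so check docs/CLI.md for the actual schema ohsome-planet creates:

```shell
# Sketch: count imported changesets. The table name 'osm_changeset' is an
# assumption; consult docs/CLI.md for the real table names.
psql "postgresql://$OHSOME_PLANET_DB_USER:$OHSOME_PLANET_DB_PASSWORD@localhost:5432/postgres" \
    -c "SELECT count(*) FROM osm_changeset;"
```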
Transform OSM replication .osc files into Parquet format.
Keep the changeset PostgreSQL database up-to-date.
The ohsome-planet tool can also be used to generate updates from the replication files provided by the OSM Planet server. Geofabrik also provides updates for regional extracts.
If you want to update both datasets your command should look like this:
java -jar ohsome-planet-cli/target/ohsome-planet.jar replications \
--data path/to/data \
--changeset-db "jdbc:postgresql://localhost:5432/postgres?user=your_user&password=your_password" \
--parallel 8 \
--country-file data/world.csv \
--parquet-data path/to/parquet/output/ \
--continue

Just like for the contributions command, you can use the optional --parallel, --country-file and --parquet-data arguments here as well.
The optional --continue flag can be used to make the update process run as a continuous service, which will wait and fetch new changes from the OSM planet server.
If you want to only update changesets you can use the --just-changesets flag. You can do the same for contributions with --just-contributions.
Find more detailed information on usage in docs/CLI.md. To see all available parameters, call the tool with the --help parameter.
Contributions will be written as Parquet files matching those found in the replication source.
This mimics the structure of the OSM Planet Server.
You can use the top level state files (state.txt or state.csv) to find the most recent sequence number.
/data/ohsome-planet/berlin
└── updates
├── 006
│ ├── 942
│ │ ├── 650.opc.parquet
│ │ ├── 650.state.txt
│ │ ├── ...
│ │ ├── 001.opc.parquet
│ │ └── 001.state.txt
│ ├── 941
│ ├── ...
│ └── 001
├── state.csv
└── state.txt
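The state files are plain key=value text, so the most recent sequence number can be read with standard shell tools. A minimal sketch (the file contents below are made up for illustration; real state.txt files from OSM replication use the same layout):

```shell
# Create a mock state file. Real state.txt files from OSM replication use
# the same key=value layout; the values here are made up.
cat > state.txt <<'EOF'
sequenceNumber=6942650
timestamp=2024-01-01T00:00:00Z
EOF

# Extract the sequence number.
grep '^sequenceNumber=' state.txt | cut -d'=' -f2
# prints 6942650
```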
You can inspect your results easily using DuckDB. Take a look at our collection of useful queries to find many analysis examples.
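For instance, a sketch counting contributions per OSM element type (column names are taken from the output schema; assumes DuckDB is installed and the Parquet files exist under contributions/):

```shell
# Sketch: count contributions per OSM element type across all output files.
# Assumes a 'contributions/' directory produced by ohsome-planet.
duckdb -s "
SELECT osm_type, count(*) AS contribs
FROM read_parquet('contributions/*/*.parquet')
GROUP BY osm_type
ORDER BY contribs DESC;
"
```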
-- list all columns
DESCRIBE FROM read_parquet('contributions/*/*.parquet');
-- result
┌───────────────────┬────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┬─────────┬─────────┬─────────┬─────────┐
│ column_name │ column_type │ null │ key │ default │ extra │
│ varchar │ varchar │ varchar │ varchar │ varchar │ varchar │
├───────────────────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┼─────────┼─────────┼─────────┼─────────┤
│ status │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ valid_from │ TIMESTAMP WITH TIME ZONE │ YES │ NULL │ NULL │ NULL │
│ valid_to │ TIMESTAMP WITH TIME ZONE │ YES │ NULL │ NULL │ NULL │
│ osm_type │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ osm_id │ BIGINT │ YES │ NULL │ NULL │ NULL │
│ osm_version │ INTEGER │ YES │ NULL │ NULL │ NULL │
│ osm_minor_version │ INTEGER │ YES │ NULL │ NULL │ NULL │
│ osm_edits │ INTEGER │ YES │ NULL │ NULL │ NULL │
│ osm_last_edit │ TIMESTAMP WITH TIME ZONE │ YES │ NULL │ NULL │ NULL │
│ user │ STRUCT(id INTEGER, "name" VARCHAR) │ YES │ NULL │ NULL │ NULL │
│ tags │ MAP(VARCHAR, VARCHAR) │ YES │ NULL │ NULL │ NULL │
│ tags_before │ MAP(VARCHAR, VARCHAR) │ YES │ NULL │ NULL │ NULL │
│ changeset │ STRUCT(id BIGINT, created_at TIMESTAMP WITH TIME ZONE, closed_at TIMESTAMP WITH TIME ZONE, tags MAP(VARCHAR, VARCHAR), hashtags VARCHAR[], editor VARCHAR, numChanges INTEGER) │ YES │ NULL │ NULL │ NULL │
│ bbox │ STRUCT(xmin DOUBLE, ymin DOUBLE, xmax DOUBLE, ymax DOUBLE) │ YES │ NULL │ NULL │ NULL │
│ centroid │ STRUCT(x DOUBLE, y DOUBLE) │ YES │ NULL │ NULL │ NULL │
│ xzcode │ STRUCT("level" INTEGER, code BIGINT) │ YES │ NULL │ NULL │ NULL │
│ geometry_type │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ geometry │ GEOMETRY │ YES │ NULL │ NULL │ NULL │
│ area │ DOUBLE │ YES │ NULL │ NULL │ NULL │
│ area_delta │ DOUBLE │ YES │ NULL │ NULL │ NULL │
│ length │ DOUBLE │ YES │ NULL │ NULL │ NULL │
│ length_delta │ DOUBLE │ YES │ NULL │ NULL │ NULL │
│ contrib_type │ VARCHAR │ YES │ NULL │ NULL │ NULL │
│ refs_count │ INTEGER │ YES │ NULL │ NULL │ NULL │
│ refs │ BIGINT[] │ YES │ NULL │ NULL │ NULL │
│ members_count │ INTEGER │ YES │ NULL │ NULL │ NULL │
│ members │ STRUCT("type" VARCHAR, id BIGINT, "timestamp" TIMESTAMP WITH TIME ZONE, "role" VARCHAR, geometry_type VARCHAR, geometry BLOB)[] │ YES │ NULL │ NULL │ NULL │
│ countries │ VARCHAR[] │ YES │ NULL │ NULL │ NULL │
│ build_time │ BIGINT │ YES │ NULL │ NULL │ NULL │
├───────────────────┴────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┴─────────┴─────────┴─────────┴─────────┤
│ 29 rows 6 columns │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

This is a list of resources that you might want to take a look at to get a better understanding of the core concepts used in this project. In general, you should gain some understanding of the raw OSM (history) data format and know how to build geometries from nodes, ways and relations. Furthermore, knowledge about (Geo)Parquet files is useful as well.
What is the OSM PBF File Format?
- https://wiki.openstreetmap.org/wiki/PBF_Format
- History PBF files for smaller regions: Geofabrik
- History or latest PBF files for the whole planet: OSM Planet
What is Parquet?
- https://parquet.apache.org/docs/file-format/
- https://github.com/apache/parquet-java
- https://github.com/apache/parquet-format
What is RocksDB?
- RocksDB is a storage engine with key/value interface, where keys and values are arbitrary byte streams. It is a C++ library. It was developed at Facebook based on LevelDB and provides backwards-compatible support for LevelDB APIs.
- https://github.com/facebook/rocksdb/wiki
How to build OSM geometries (for multipolygons)?
- https://wiki.openstreetmap.org/wiki/Relation:multipolygon#Examples_in_XML
- https://osmcode.org/osm-testdata/
- https://github.com/GIScience/oshdb/blob/a196cc990a75fa35841ca0908f323c3c9fc06b9a/oshdb-util/src/main/java/org/heigit/ohsome/oshdb/util/geometry/OSHDBGeometryBuilderInternal.java#L469
- For relations that consist of more than 500 members we skip MultiPolygon geometry building and fall back to GeometryCollection. Check MEMBERS_THRESHOLD in ohsome-contributions/src/main/java/org/heigit/ohsome/contributions/contrib/ContributionGeometry.java.
- For contributions with status deleted we use the geometry of the previous version. This allows you to spatially filter deleted elements as well, e.g. by bounding box. In OSM itself, deleted elements do not have any geometry.