🦆 DUCKDB × GEOPARQUET · GUIDE PRATIQUE 2026

🦆 DUCKDB × GEOPARQUET · GUIDE PRATIQUE 2026

Tous les workflows SIG courants réécrits pour DuckDB Spatial + Overture Maps. Zéro serveur, zéro configuration — copie, colle, analyse.

20×
plus rapide que GeoPandas sur gros fichiers
3 lignes
pour remplacer un setup PostGIS complet
0€
aucun serveur, aucune licence
1
Installation & extensions géospatiales
Deux extensions suffisent pour 95% des workflows SIG
setup_duckdb.sql
-- Installation des extensions (une seule fois) INSTALL spatial; -- fonctions géospatiales + GDAL INSTALL httpfs; -- lecture S3/HTTPS distante -- Chargement à chaque session (ou dans .duckdbrc) LOAD spatial; LOAD httpfs; -- Vérification rapide SELECT ST_AsText(ST_Point(2.3522, 48.8566)) AS paris; -- → POINT (2.3522 48.8566)
💡
Astuce .duckdbrc : place LOAD spatial; LOAD httpfs; dans ton fichier ~/.duckdbrc pour que les extensions se chargent automatiquement à chaque session — tu n'auras plus jamais à y penser.
2
Ouvrir un fichier GeoParquet local ou distant
DuckDB reconnaît automatiquement le format GeoParquet
open_geoparquet.sql
-- Fichier local : drag & drop dans DuckDB SELECT * FROM read_parquet('mon_fichier.parquet') LIMIT 5; -- Fichier distant (HTTPS) SELECT * FROM read_parquet('https://example.com/data.parquet') LIMIT 5; -- Dossier entier de parquet (partitionné) SELECT * FROM read_parquet('data/*.parquet', hive_partitioning=true); -- Inspecter le schéma sans lire les données DESCRIBE SELECT * FROM read_parquet('mon_fichier.parquet');
3
Requêtes Overture Maps par bounding box
Extraction directe depuis S3 — zéro téléchargement complet
âš¡
Overture release actuelle : 2026-03-18.0. Toujours vérifier sur docs.overturemaps.org pour la dernière version avant d'exécuter.
overture_places_bbox.sql
-- Extraction POI sur le plateau de Saclay -- Remplace xmin/xmax/ymin/ymax par ta bounding box SELECT id, names.primary AS nom, categories.primary AS categorie, confidence, ST_AsText(geometry) AS wkt FROM read_parquet( 's3://overturemaps-us-west-2/release/2026-03-18.0/theme=places/type=place/*', filename=true, hive_partitioning=true ) WHERE bbox.xmin BETWEEN 2.151298 AND 2.235049 AND bbox.ymin BETWEEN 48.706094 AND 48.722411 AND confidence > 0.8 ORDER BY confidence DESC LIMIT 500;
overture_buildings_bbox.sql
-- Empreintes de bâtiments avec hauteur et nom SELECT id, names.primary AS nom, height, subtype, class, ST_Area(ST_GeomFromWKB(geometry)) AS surface_m2, geometry FROM read_parquet( 's3://overturemaps-us-west-2/release/2026-03-18.0/theme=buildings/type=building/*.parquet' ) WHERE bbox.xmin > 2.2946 AND bbox.xmax < 2.3920 AND bbox.ymin > 48.8396 AND bbox.ymax < 48.8749 AND height IS NOT NULL ORDER BY height DESC;
overture_roads_bbox.sql
-- Réseau routier : segments avec classe et vitesse SELECT id, names.primary AS nom, subtype, class, ST_Length(ST_GeomFromWKB(geometry)) * 111320 AS longueur_m, geometry FROM read_parquet( 's3://overturemaps-us-west-2/release/2026-03-18.0/theme=transportation/type=segment/*', hive_partitioning=true ) WHERE bbox.xmin BETWEEN 2.2946 AND 2.3920 AND bbox.ymin BETWEEN 48.8396 AND 48.8749 AND class IN ('primary', 'secondary', 'motorway');
overture_admin_boundaries.sql
-- Frontières administratives (pays = niveau 2) SELECT id, names.primary AS nom, subtype, admin_level, iso_country_code_alpha2, ST_Area(ST_GeomFromWKB(geometry)) AS surface, geometry FROM read_parquet( 's3://overturemaps-us-west-2/release/2026-03-18.0/theme=divisions/type=division_area/*', hive_partitioning=1 ) WHERE admin_level = 2 -- pays -- admin_level = 4 → régions, 6 → départements AND ST_GeometryType(ST_GeomFromWKB(geometry)) IN ('POLYGON', 'MULTIPOLYGON');
4
Jointures & analyses spatiales
Les opérations PostGIS classiques — réécriture DuckDB
spatial_join_points_polygons.sql
-- Jointure spatiale : points dans des polygones -- (ex : POI dans chaque commune) SELECT z.nom_zone, COUNT(p.id) AS nb_poi, AVG(p.confidence) AS confiance_moy FROM read_parquet('zones.parquet') z JOIN read_parquet('points.parquet') p ON ST_Within( ST_GeomFromWKB(p.geometry), ST_GeomFromWKB(z.geometry) ) GROUP BY z.nom_zone ORDER BY nb_poi DESC;
buffer_analysis.sql
-- Buffer et comptage dans un rayon (ex: concurrents à 500m) -- Note: ST_Buffer travaille en degrés → 0.0045° ≈ 500m en Europe WITH point_ref AS ( SELECT ST_Point(2.3370, 48.8606) AS geom -- Louvre ), zone AS ( SELECT ST_Buffer(geom, 0.0045) AS buffer_500m FROM point_ref ) SELECT categories.primary AS categorie, COUNT(*) AS nb, ROUND(AVG(confidence), 2) AS confiance FROM read_parquet('poi_paris.parquet') p, zone WHERE ST_Within(ST_GeomFromWKB(p.geometry), zone.buffer_500m) GROUP BY categorie ORDER BY nb DESC LIMIT 20;
nearest_neighbor.sql
-- Plus proche voisin : trouver le POI le plus proche d'un point SELECT names.primary AS nom, categories.primary AS type, ST_Distance( ST_GeomFromWKB(geometry), ST_Point(2.3370, 48.8606) ) * 111320 AS distance_m FROM read_parquet('poi_paris.parquet') WHERE bbox.xmin BETWEEN 2.31 AND 2.37 AND bbox.ymin BETWEEN 48.84 AND 48.88 ORDER BY distance_m LIMIT 10;
5
Export vers tous les formats SIG
DuckDB → GeoJSON, FlatGeobuf, GeoPackage, CSV, Shapefile
export_formats.sql
-- Export GeoJSON (universel, drag & drop QGIS/Kepler) COPY ( SELECT id, names.primary AS nom, confidence, geometry FROM read_parquet('poi_paris.parquet') ) TO 'poi_paris.geojson' WITH (FORMAT GDAL, DRIVER 'GeoJSON'); -- Export FlatGeobuf (idéal pour PMTiles + Tippecanoe) COPY ( SELECT * FROM read_parquet('batiments.parquet') ) TO 'batiments.fgb' WITH (FORMAT GDAL, DRIVER 'FlatGeobuf'); -- Export GeoParquet optimisé (format cible recommandé) COPY ( SELECT * FROM read_parquet('source.parquet') WHERE confidence > 0.9 ) TO 'output_filtered.parquet'; -- Export GeoPackage (QGIS natif, PostGIS compatible) COPY ( SELECT * FROM read_parquet('zones.parquet') ) TO 'zones.gpkg' WITH (FORMAT GDAL, DRIVER 'GPKG'); -- Export Shapefile (pour le géomètre du cadastre en 2030 😅) COPY ( SELECT * FROM read_parquet('data.parquet') ) TO 'output.shp' WITH (FORMAT GDAL, DRIVER 'ESRI Shapefile');
6
Conversion locale : GeoParquet → GeoJSON en ligne de commande
convert_cli.sh
# Lancer DuckDB en mode CLI avec un script SQL duckdb -c "LOAD spatial; LOAD httpfs; COPY (SELECT * FROM read_parquet('data.parquet')) TO 'output.geojson' WITH (FORMAT GDAL, DRIVER 'GeoJSON');" # Ou en pipe (utile pour les workflows CI/CD) echo "LOAD spatial; COPY (SELECT id, geometry FROM read_parquet('input.parquet')) TO 'out.fgb' WITH (FORMAT GDAL, DRIVER 'FlatGeobuf');" | duckdb
7
Workflow complet : Overture → analyse → export
Pipeline end-to-end sans toucher à PostGIS
full_pipeline.sql
-- Pipeline complet : extraction → enrichissement → export -- Cas d'usage : densité de restaurants par quartier à Lyon LOAD spatial; LOAD httpfs; -- Étape 1 : Extraire les POI restauration sur Lyon CREATE OR REPLACE TABLE restaurants_lyon AS SELECT id, names.primary AS nom, confidence, ST_GeomFromWKB(geometry) AS geom FROM read_parquet( 's3://overturemaps-us-west-2/release/2026-03-18.0/theme=places/type=place/*', hive_partitioning=true ) WHERE bbox.xmin BETWEEN 4.7900 AND 4.9000 AND bbox.ymin BETWEEN 45.7200 AND 45.8000 AND categories.primary LIKE '%restaurant%' AND confidence > 0.75; -- Étape 2 : Charger les quartiers (GeoJSON local) CREATE OR REPLACE TABLE quartiers AS SELECT *, ST_GeomFromWKB(geometry) AS geom FROM ST_Read('quartiers_lyon.geojson'); -- Étape 3 : Jointure spatiale + densité SELECT q.nom_quartier, COUNT(r.id) AS nb_restaurants, ROUND(COUNT(r.id) / (ST_Area(q.geom) * 12321), 2) AS resto_par_km2, q.geom AS geometry FROM quartiers q LEFT JOIN restaurants_lyon r ON ST_Within(r.geom, q.geom) GROUP BY q.nom_quartier, q.geom ORDER BY resto_par_km2 DESC;
stats_globales.sql
-- Stats rapides sur un GeoParquet (remplace psql \d + SELECT count) SELECT COUNT(*) AS nb_entites, COUNT(DISTINCT categories.primary) AS nb_categories, ROUND(AVG(confidence), 3) AS confiance_moy, MIN(bbox.xmin) AS lon_min, MAX(bbox.xmax) AS lon_max, MIN(bbox.ymin) AS lat_min, MAX(bbox.ymax) AS lat_max FROM read_parquet('poi_paris.parquet');
8
DuckDB vs PostGIS : cheat sheet
Opération PostGIS (SQL) DuckDB Spatial
Point dans polygone ST_Within(p.geom, z.geom) ST_Within(ST_GeomFromWKB(p.geometry), z.geom)
Distance en mètres ST_Distance(a::geography, b::geography) ST_Distance(a, b) * 111320 (approx)
Intersection ST_Intersects(a, b) ST_Intersects(a, b) identique
Enveloppe convexe ST_ConvexHull(ST_Collect(geom)) ST_ConvexHull(ST_Collect(list(geom)))
Lire depuis cloud Impossible natif read_parquet('s3://...') natif
Projection EPSG ST_Transform(geom, 4326) ST_Transform(geom, 'EPSG:4326', 'EPSG:2154')