Using PDAL for Bathymetric Point Cloud Cleaning

Raw multibeam echosounder (MBES) datasets routinely contain water-column backscatter, surface multipath returns, acoustic sidelobe artifacts, and vessel-wake noise. Manual GUI-based cleaning fails at survey scale and breaks automated reproducibility. Using PDAL for Bathymetric Point Cloud Cleaning establishes a deterministic, scriptable workflow that enforces spatial consistency, manages memory overhead, and gates outputs before downstream interpolation. This pipeline targets automated coastal and marine spatial analysis environments where throughput, schema compliance, and CI/CD validation dictate operational viability. It operates under the Pillar: Bathymetric Processing & Terrain Modeling, within the Cluster: Point Cloud Filtering for Multibeam Sonar, and directly informs related workflows including DEM interpolation, surface smoothing, artifact removal, and memory bottleneck resolution.

Pipeline Architecture & Filter Sequencing

PDAL executes filters sequentially through a directed acyclic graph. For bathymetric data, the cleaning sequence must isolate true seafloor returns while preserving acoustic resolution and slope continuity. The production JSON below implements range clipping, statistical outlier rejection, ASPRS-compliant reclassification, and spatial partitioning.

{
  "pipeline": [
    {
      "type": "readers.las",
      "filename": "input_survey.laz"
    },
    {
      "type": "filters.range",
      "limits": "Z[0:120], Classification[1:9]"
    },
    {
      "type": "filters.outlier",
      "method": "statistical",
      "mean_k": 24,
      "multiplier": 2.0,
      "tag": "outlier"
    },
    {
      "type": "filters.assign",
      "assignment": "Classification[Classification!=9] = 18"
    },
    {
      "type": "filters.splitter",
      "length": 400,
      "origin_x": "auto",
      "origin_y": "auto"
    },
    {
      "type": "writers.las",
      "filename": "cleaned_%.laz",
      "compression": "laszip",
      "forward": "all"
    }
  ]
}

Operational Notes:

  • filters.range removes surface returns (Z > tidal datum + max draft) and deep-water noise. Adjust bounds to local bathymetry and survey datum.
  • filters.outlier uses a k-nearest neighbor density estimator. For high-density MBES (>50 pts/m²), increase mean_k to 32–48 to prevent over-filtering of steep slope returns.
  • filters.assign reclassifies non-ground points to ASPRS code 18 (high noise). This preserves original geometry while flagging artifacts for downstream exclusion.
  • filters.splitter tiles the dataset into 400m² chunks, enabling parallel execution and preventing heap exhaustion.

Integrating this sequence into broader Bathymetric Processing & Terrain Modeling workflows requires strict adherence to vertical datum alignment before PDAL execution. Misaligned tidal corrections will cause filters.range to clip valid seafloor returns or retain surface noise.

Memory Optimization & Streaming Execution

MBES surveys routinely exceed 40–80 GB. PDAL defaults to in-memory loading, which triggers OOM failures on standard CI runners or edge processing nodes. Force disk-backed streaming and cap heap allocation:

# Enable streaming mode and limit worker threads to prevent memory thrashing
pdal pipeline pipeline.json --streaming --threads 4

For Python-based orchestration, bypass the default pdal.Pipeline in-memory buffer by leveraging pdal.Pipeline.execute() with explicit chunking. Use filters.splitter upstream to force tile-based I/O. Monitor RSS usage; if heap exceeds 75% of available RAM, reduce tile size to 200m and enable --streaming at the CLI level. PDAL’s internal memory manager respects OS-level cgroups, but explicit thread capping prevents Python GIL contention during parallel tile writes.

CRS Alignment & Vertical Datum Enforcement

Bathymetric pipelines fail silently when horizontal and vertical coordinate reference systems diverge. PDAL does not auto-resolve datum shifts. Enforce explicit transformation chains using filters.reprojection or filters.transformation before filtering:

{
  "type": "filters.reprojection",
  "in_srs": "EPSG:4326+EPSG:5703",
  "out_srs": "EPSG:32618+EPSG:5703",
  "forward": "all"
}

Always validate vertical datums against local tidal benchmarks (e.g., MLLW, NAVD88, or LAT). Use pdal info --metadata to verify VCS and PCS codes. If your survey provider delivers raw ellipsoidal heights, apply a geoid model via filters.transformation with --grid pointing to a .gtx or .tif geoid file. Failure to normalize Z-values before filters.range execution guarantees datum-induced clipping or artifact retention.

Validation & Quality Gates

Automated cleaning requires deterministic post-processing checks. Implement density thresholds, coverage validation, and classification audits before passing data to Point Cloud Filtering for Multibeam Sonar or interpolation modules.

Use PDAL’s filters.stats to generate summary metrics:

pdal pipeline stats_pipeline.json

Parse the output JSON to enforce CI/CD gates:

  • Minimum point density: ≥ 15 pts/m² (shallow), ≥ 5 pts/m² (deep)
  • Classification 9 (ground) ratio: ≥ 85%
  • Z-range compliance: No values outside survey bounds ± 2m tolerance

Reject tiles that fail these thresholds. Log failures with exact bounding coordinates for manual review. This prevents garbage-in-garbage-out propagation into DEM generation.

Python Integration & CI/CD Orchestration

Embed PDAL directly into Python spatial analysis stacks using the pdal Python bindings. Wrap pipeline execution in retry logic and structured logging:

import pdal
import json
import logging

logging.basicConfig(level=logging.INFO)

def run_bathy_cleaner(pipeline_json: dict, input_file: str) -> bool:
    pipeline = pdal.Pipeline(json.dumps(pipeline_json))
    try:
        pipeline.execute()
        stats = pipeline.metadata
        ground_ratio = stats['stats']['count']['Classification_9'] / stats['stats']['count']['total']
        if ground_ratio < 0.85:
            logging.warning(f"Ground ratio {ground_ratio:.2%} below threshold. Flagging tile.")
            return False
        logging.info("Pipeline executed successfully. Output validated.")
        return True
    except pdal.PipelineError as e:
        logging.error(f"PDAL execution failed: {e}")
        return False

Reference the official PDAL Pipeline Documentation for schema validation and filter parameter updates. For vertical datum transformations, consult NOAA VDatum Technical Documentation to ensure geoid offsets match your survey epoch.

Production Deployment Checklist

This architecture eliminates manual intervention, enforces spatial rigor, and scales across distributed marine survey campaigns. Deploy, validate, and iterate.