Kriging vs IDW for Bathymetry Interpolation

Automated bathymetric DEM pipelines fail during grid generation when the interpolator is chosen by static default rather than by the actual spatial structure of the input point cloud. Ingesting heterogeneous multibeam echosounder (MBES) swaths, legacy single-beam tracklines, or backscatter-derived soundings into a fixed gridding routine produces either aliased terrace artifacts from undersampled IDW kernels, or singular-matrix inversions and out-of-memory crashes from unconstrained Kriging on dense data. The correct operational pattern is a deterministic routing stage that evaluates survey density, spatial autocorrelation range, and available compute resources before any interpolation begins. This page defines that routing mechanism and its production implementation, as a focused refinement of the DEM Interpolation Techniques for Seafloor Mapping workflow.

Why the Choice Matters: Root Cause of Interpolation Failures

IDW (Inverse Distance Weighting) and Ordinary Kriging (OK) fail in opposite directions when applied to the wrong input.

IDW on sparse, irregular data weights each sounding by distance alone, ignoring how the soundings are arranged: every point in a dense cluster receives a full weight even though the cluster samples a single patch of seabed. When survey coverage is uneven, for example historical tracklines at 200 m spacing intersecting modern MBES swaths at 5 m spacing, IDW produces bull's-eye artifacts around isolated measurements and fails to honour the spatial continuity expected between tracklines. The failure is silent: the output GeoTIFF is valid, but the DEM contains interpolated terracing that corrupts volume calculations and habitat models.
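
The distance-only weighting can be illustrated with a minimal sketch. The function name idw_estimate, the power of 2, and the toy coordinates are assumptions chosen for illustration, not part of the pipeline described on this page.

```python
import numpy as np

def idw_estimate(points, values, query, power=2.0, eps=1e-12):
    """Inverse-distance-weighted estimate at one query location.

    points: (N, 2) sounding coordinates; values: (N,) depths;
    query: (2,) grid-cell location. power and eps are illustrative defaults.
    """
    points = np.asarray(points, dtype=float)
    values = np.asarray(values, dtype=float)
    d = np.linalg.norm(points - np.asarray(query, dtype=float), axis=1)
    if np.any(d < eps):
        # Query coincides with a sounding: return that sounding's depth.
        return float(values[np.argmin(d)])
    w = 1.0 / d ** power  # weight depends on distance only
    return float(np.sum(w * values) / np.sum(w))

# Four clustered soundings (-10 m) and one isolated sounding (-20 m),
# all 1 m from the query point at the origin.
angles = np.array([0.00, 0.05, 0.10, 0.15, np.pi])
pts = np.column_stack((np.cos(angles), np.sin(angles)))
depths = np.array([-10.0, -10.0, -10.0, -10.0, -20.0])
estimate = idw_estimate(pts, depths, (0.0, 0.0))
```

Because all five soundings are equidistant, each receives the same weight and the estimate is the plain average of the five depths. The cluster therefore counts four times although it samples one patch of seabed, which is the redundancy a covariance-based method would down-weight.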

Kriging on dense MBES data triggers a different failure class. Ordinary Kriging assembles an N×N covariance matrix and solves an N×N linear system. At 50 million soundings, unpartitioned covariance assembly is O(N²) in memory and O(N³) to solve, so a worker with 32 GB of RAM will run out of memory before the first grid cell is written. With pykrige this typically surfaces as a MemoryError, or as numpy.linalg.LinAlgError: Singular matrix when duplicate or near-coincident soundings make the system rank-deficient.
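
The memory claim can be checked with simple arithmetic. The sketch below assumes a dense float64 matrix with one extra row and column for the Ordinary Kriging unbiasedness constraint; the helper name ok_matrix_bytes is hypothetical.

```python
def ok_matrix_bytes(n_soundings: int, bytes_per_value: int = 8) -> int:
    """Bytes needed for one dense (N+1) x (N+1) Ordinary Kriging matrix.

    The extra row and column hold the unbiasedness (Lagrange) constraint.
    """
    size = n_soundings + 1
    return size * size * bytes_per_value

for n in (10_000, 100_000, 50_000_000):
    gib = ok_matrix_bytes(n) / 2**30
    print(f"N={n:>11,d}: {gib:,.1f} GiB")
```

Under these assumptions a single matrix needs roughly 0.75 GiB at 10,000 soundings and about 75 GiB at 100,000, so a 32 GB worker is exhausted several orders of magnitude before a 50-million-sounding survey is reached.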

The diagram below contrasts how each interpolator assigns weight to surrounding soundings when estimating one grid cell — the structural difference that determines which input each one corrupts.

The following snippet reproduces the dense-data Kriging failure so it can be identified in CI logs:

import numpy as np
from pykrige.ok import OrdinaryKriging

# Simulate dense coverage: 10,000 random soundings over 1 km^2
# (typical nearest-neighbour spacing about 5 m; already borderline)
rng = np.random.default_rng(42)
n = 10_000
x = rng.uniform(0, 1000, n)
y = rng.uniform(0, 1000, n)
z = -50 + rng.normal(0, 0.2, n)

# On a memory-constrained worker this may exhaust RAM; duplicate coordinates raise LinAlgError:
ok = OrdinaryKriging(x, y, z, variogram_model="spherical", verbose=False)
# ok.execute("grid", ...) -> MemoryError or LinAlgError

Routing Decision: Spatial Statistics Drive Interpolator Selection

The routing gate must be driven by two metrics computed directly from the input point cloud using scipy.spatial.cKDTree:

Median nearest-neighbour spacing: measures effective survey density relative to the target grid resolution. For each sounding $i$, let $d_i$ be the distance to its nearest neighbour; the routing metric is $\tilde{d} = \mathrm{median}(d_1, \dots, d_N)$.
Coefficient of variation (CV) of nearest-neighbour distances: measures spatial regularity. With $\sigma_d$ the standard deviation and $\bar{d}$ the mean of the $d_i$, the metric is

$$\mathrm{CV} = \frac{\sigma_d}{\bar{d}}.$$

High CV signals mixed-density coverage (trackline gaps, swath overlaps) where IDW will fail.

Routing rules:

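Condition	Interpolator
Median spacing < 25 % of target resolution AND CV < 0.35	IDW (the implementation below substitutes SciPy's linear griddata, a Delaunay-based linear interpolator)
Median spacing > 1.5× target resolution OR CV > 0.35	Ordinary Kriging
Neither rule matches	IDW (the default branch of the routing code)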

These thresholds keep Kriging away from dense, uniform MBES data (where it would trigger the O(N³) solve) and IDW away from sparse historical surveys (where it produces terrace artifacts). The point cloud filtering for multibeam sonar stage must complete before routing metrics are computed: stray soundings far from the main coverage inflate CV and can bias the spacing statistics toward artificially large values, causing incorrect interpolator selection.
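
As a worked example of the routing rules, the sketch below compares a uniform lattice with the same lattice plus a sparse legacy trackline. It uses a brute-force distance matrix in place of cKDTree so it stays self-contained, which is only reasonable for small demonstrations; the names nn_distances and route and the synthetic coordinates are assumptions.

```python
import numpy as np

def nn_distances(coords: np.ndarray) -> np.ndarray:
    """Brute-force nearest-neighbour distances; suitable for small demos only."""
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # ignore each point's distance to itself
    return dist.min(axis=1)

def route(coords: np.ndarray, target_resolution: float) -> str:
    """Apply the routing rules: sparse or irregular input goes to Kriging."""
    d = nn_distances(coords)
    median_spacing = float(np.median(d))
    cv = float(np.std(d) / np.mean(d))
    if median_spacing > 1.5 * target_resolution or cv > 0.35:
        return "kriging"
    return "idw"

# Regular 1 m lattice: dense and uniform.
gx, gy = np.meshgrid(np.arange(20.0), np.arange(20.0))
lattice = np.column_stack((gx.ravel(), gy.ravel()))

# Same lattice plus a legacy trackline at 50 m spacing.
track = np.column_stack((100.0 + 50.0 * np.arange(20), np.zeros(20)))
mixed = np.vstack((lattice, track))

dense_route = route(lattice, target_resolution=5.0)
mixed_route = route(mixed, target_resolution=5.0)
```

The lattice has a CV of zero and is routed to IDW. Adding twenty trackline points leaves the median spacing at 1 m but pushes the CV well above 0.35, so the mixed survey is routed to Kriging.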

Interpolator Routing Decision — Pipeline Diagram

Geostatistical Parameterisation: Variogram Constraints for Bathymetric Data

When the routing gate selects Ordinary Kriging, variogram validation is mandatory before any grid execution begins. Unconstrained spherical or exponential models applied to bathymetric data with abrupt depth gradients — shelf breaks, dredged channels, coral structure — routinely produce degenerate parameter estimates: negative sill values, infinite range estimates, or nugget-to-sill ratios above 0.8.

The pipeline must enforce these constraints after fitting:

Nugget-to-sill ratio < 0.6. With nugget $c_0$ and partial sill $c$, the total sill is $c_0 + c$ and the constraint is $\frac{c_0}{c_0 + c} < 0.6$. Ratios above this threshold mean measurement error dominates the modelled spatial structure; Kriging then has little spatial correlation to exploit and its output is scarcely more informative than a local average. Reject and fall back to IDW.
Sill > 0. A negative sill indicates a non-positive-definite covariance model; the Kriging system will be ill-conditioned.
Range within [2× target resolution, 10× survey extent]. A range smaller than two grid cells means the variogram captures only noise. A range far beyond the survey extent cannot be constrained by the observed lags, so predictions rest on an extrapolated model rather than on measured spatial structure.
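
The three constraints can be collected into one check. This is a minimal sketch: the function name check_variogram, its argument names, and the choice to return a list of violations are assumptions, and survey_extent is taken to be a single length in metres.

```python
def check_variogram(psill, vrange, nugget, target_resolution, survey_extent,
                    nugget_sill_limit=0.6):
    """Return a list of constraint violations (an empty list means the fit passes)."""
    problems = []
    sill = nugget + psill  # total sill = nugget + partial sill
    if sill <= 0:
        problems.append("sill <= 0")
    elif nugget / sill >= nugget_sill_limit:
        problems.append("nugget-to-sill ratio too high")
    # Range must cover at least two cells and at most ten survey extents.
    if not (2 * target_resolution <= vrange <= 10 * survey_extent):
        problems.append("range outside bounds")
    return problems

good = check_variogram(psill=4.0, vrange=300.0, nugget=1.0,
                       target_resolution=5.0, survey_extent=2000.0)
noisy = check_variogram(psill=4.0, vrange=300.0, nugget=6.0,
                        target_resolution=5.0, survey_extent=2000.0)
```

Returning every violation rather than raising on the first makes the rejection reason easier to log before the pipeline falls back to IDW.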

Fit the experimental variogram with directional binning (at least four azimuth bins, 45° tolerance) to detect anisotropy. Coastal bathymetry frequently shows along-slope continuity 3–5× longer than cross-slope, and an isotropic model will over-smooth cross-contour gradients. For surveys exhibiting strong anisotropy (range ratio > 2), switch to variogram_model="linear" with the anisotropy_scaling and anisotropy_angle parameters rather than forcing an isotropic spherical fit.
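
Geometric anisotropy is usually handled by rotating lag vectors onto the axis of greatest continuity and stretching the cross-axis component. The sketch below shows that general transform only; the function name and the angle convention (counter-clockwise from +x) are assumptions, and pykrige's own convention for anisotropy_scaling and anisotropy_angle should be confirmed against its documentation.

```python
import math

def anisotropic_lag(dx, dy, major_axis_deg, ratio):
    """Effective lag after rotating onto the major axis and stretching the minor axis.

    major_axis_deg: direction of greatest continuity, counter-clockwise
        from +x (assumed convention).
    ratio: major range divided by minor range (>= 1).
    """
    t = math.radians(major_axis_deg)
    along = dx * math.cos(t) + dy * math.sin(t)    # component along the major axis
    across = -dx * math.sin(t) + dy * math.cos(t)  # component across it
    return math.hypot(along, ratio * across)

along_slope = anisotropic_lag(10.0, 0.0, major_axis_deg=0.0, ratio=3.0)
cross_slope = anisotropic_lag(0.0, 10.0, major_axis_deg=0.0, ratio=3.0)
```

With a ratio of 3, a 10 m separation across the slope is treated like a 30 m separation along it, so cross-slope soundings lose influence three times faster.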

Step-by-Step Production Implementation

The following functions implement the complete routing-and-interpolation pipeline. They require numpy>=1.24, scipy>=1.11, pykrige>=1.7, pyproj>=3.6, and rasterio>=1.3.

Step 1 — Compute routing metrics and select interpolator:

import numpy as np
from scipy.spatial import cKDTree
import logging

logger = logging.getLogger(__name__)

def compute_routing_metrics(
    coords: np.ndarray,
    target_resolution: float
) -> dict:
    """
    Compute nearest-neighbour spacing statistics and select interpolator.

    Args:
        coords: (N, 2) array of projected XY coordinates in metres.
        target_resolution: Target grid cell size in metres.

    Returns:
        dict with 'median_spacing', 'cv', and 'use_kriging' keys.
    """
    if coords.shape[0] < 4:
        raise ValueError(
            f"Insufficient soundings ({coords.shape[0]}) for routing metric computation."
        )
    tree = cKDTree(coords)
    dists, _ = tree.query(coords, k=2)
    nn_dists = dists[:, 1]  # skip self (distance 0)
    median_spacing = float(np.median(nn_dists))
    if median_spacing <= 0.0:
        raise ValueError("Median nearest-neighbour spacing is zero — duplicate coordinates present.")
    cv = float(np.std(nn_dists) / np.mean(nn_dists))  # sigma_d / mean, matching the CV definition
    use_kriging = (median_spacing > 1.5 * target_resolution) or (cv > 0.35)
    logger.info(
        "Routing: median_spacing=%.2f m, CV=%.3f, target_res=%.2f m -> %s",
        median_spacing, cv, target_resolution,
        "Ordinary Kriging" if use_kriging else "IDW (linear)"
    )
    return {"median_spacing": median_spacing, "cv": cv, "use_kriging": use_kriging}

Step 2 — Validate variogram before Kriging execution:

from pykrige.ok import OrdinaryKriging

def fit_and_validate_variogram(
    x: np.ndarray,
    y: np.ndarray,
    z: np.ndarray,
    nugget_sill_limit: float = 0.6
) -> OrdinaryKriging:
    """
    Fit an Ordinary Kriging model and validate variogram parameters.

    Raises:
        ValueError: If the fitted variogram fails quality constraints.
    """
    ok = OrdinaryKriging(
        x, y, z,
        variogram_model="spherical",
        enable_plotting=False,
        verbose=False,
        weight=True  # weighted least-squares variogram fit
    )
    # pykrige exposes fitted params as [psill, range, nugget]
    psill, vrange, nugget = ok.variogram_model_parameters
    sill = psill + nugget
    if sill <= 0:
        raise ValueError(
            f"Invalid variogram: sill={sill:.4f} <= 0. "
            "Check for duplicate Z values or a degenerate point distribution."
        )
    nugget_ratio = nugget / sill
    if nugget_ratio > nugget_sill_limit:
        raise ValueError(
            f"Nugget-to-sill ratio {nugget_ratio:.3f} exceeds limit {nugget_sill_limit}. "
            "Spatial structure is noise-dominated — falling back to IDW is recommended."
        )
    logger.info(
        "Variogram OK: psill=%.4f, range=%.2f m, nugget=%.4f, nugget/sill=%.3f",
        psill, vrange, nugget, nugget_ratio
    )
    return ok

Step 3 — Chunked interpolation with spatial subsetting:

import rasterio
from rasterio.transform import from_origin
from rasterio.windows import Window
from scipy.interpolate import griddata
from pyproj import CRS

def execute_interpolation(
    src_x: np.ndarray,
    src_y: np.ndarray,
    src_z: np.ndarray,
    bounds: tuple[float, float, float, float],
    target_resolution: float,
    target_crs_wkt: str,
    dst_path: str,
    chunk_size: int = 512,
    search_radius: float = 500.0
) -> None:
    """
    Chunked interpolation with deterministic interpolator routing.

    Args:
        src_x, src_y: Projected coordinates (same CRS as target).
        src_z: Depth values (positive-down convention recommended).
        bounds: (minx, miny, maxx, maxy) in CRS units.
        target_resolution: Grid cell size in metres.
        target_crs_wkt: WKT string for the output CRS.
        dst_path: Output GeoTIFF path.
        chunk_size: Tile side length in pixels (must match storage blocksize).
        search_radius: Buffer around each tile for source point subsetting.
    """
    coords = np.column_stack((src_x, src_y))
    metrics = compute_routing_metrics(coords, target_resolution)

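    ok_model: OrdinaryKriging | None = None
    if metrics["use_kriging"]:
        try:
            ok_model = fit_and_validate_variogram(src_x, src_y, src_z)
        except ValueError as exc:
            # Rejected variogram: fall back to the IDW route instead of aborting.
            logger.warning("Variogram rejected (%s); falling back to IDW.", exc)
            metrics["use_kriging"] = False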

    minx, miny, maxx, maxy = bounds
    width  = int(np.ceil((maxx - minx) / target_resolution))
    height = int(np.ceil((maxy - miny) / target_resolution))
    transform = from_origin(minx, maxy, target_resolution, target_resolution)

    profile = {
        "driver":    "GTiff",
        "dtype":     "float32",
        "count":     1,
        "width":     width,
        "height":    height,
        "crs":       CRS.from_wkt(target_crs_wkt).to_epsg() or target_crs_wkt,
        "transform": transform,
        "compress":  "deflate",
        "predictor": 3,          # floating-point predictor improves deflate ratio
        "tiled":     True,
        "blockxsize": chunk_size,
        "blockysize": chunk_size,
        "nodata":    -9999.0,
    }

    with rasterio.open(dst_path, "w", **profile) as dst:
        for row_off in range(0, height, chunk_size):
            for col_off in range(0, width, chunk_size):
                w = min(chunk_size, width  - col_off)
                h = min(chunk_size, height - row_off)

                x0 = minx + col_off * target_resolution
                x1 = x0   + w      * target_resolution
                y1 = maxy - row_off * target_resolution
                y0 = y1   - h      * target_resolution

                mask = (
                    (src_x >= x0 - search_radius) & (src_x <= x1 + search_radius) &
                    (src_y >= y0 - search_radius) & (src_y <= y1 + search_radius)
                )

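                # Cell-centre coordinates; rows run north to south to match the raster.
                xc = x0 + (np.arange(w) + 0.5) * target_resolution
                yc = y1 - (np.arange(h) + 0.5) * target_resolution

                if not mask.any():
                    tile = np.full((h, w), -9999.0, dtype=np.float32)
                elif metrics["use_kriging"]:
                    assert ok_model is not None
                    # NOTE: the model was fitted on all soundings, so each solve is global.
                    # Pass y ascending, then flip rows into north-up raster order.
                    z_grid, _ = ok_model.execute("grid", xc, yc[::-1])
                    z_grid = np.ma.filled(z_grid, np.nan)[::-1]
                    tile = np.where(np.isnan(z_grid), -9999.0, z_grid).astype(np.float32)
                else:
                    pts   = np.column_stack((src_x[mask], src_y[mask]))
                    vals  = src_z[mask]
                    gx, gy = np.meshgrid(xc, yc)
                    # griddata(method="linear") is Delaunay-based linear interpolation,
                    # used here as the dense-data interpolator on the IDW route.
                    z_grid = griddata(pts, vals, (gx, gy), method="linear", fill_value=np.nan)
                    tile = np.where(np.isnan(z_grid), -9999.0, z_grid).astype(np.float32)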

                dst.write(tile, 1, window=Window(col_off, row_off, w, h))
                logger.info("Tile (%d, %d) written — method=%s",
                            row_off, col_off,
                            "OK" if metrics["use_kriging"] else "IDW")

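Tile coordinates are easy to get wrong by half a cell or by a vertical flip. The sketch below shows one way to derive cell-centre coordinates for a tile from the raster origin, assuming a north-up grid with square cells; the helper name tile_cell_centres is hypothetical.

```python
import numpy as np

def tile_cell_centres(minx, maxy, resolution, col_off, row_off, w, h):
    """Cell-centre coordinates for a tile of a north-up raster.

    Returns x ascending (west to east) and y descending (north to south),
    so that row 0 of the tile is its northernmost row.
    """
    xc = minx + (col_off + np.arange(w) + 0.5) * resolution
    yc = maxy - (row_off + np.arange(h) + 0.5) * resolution
    return xc, yc

xc, yc = tile_cell_centres(minx=0.0, maxy=100.0, resolution=10.0,
                           col_off=0, row_off=0, w=2, h=2)
```

For a 10 m grid whose top-left corner is (0, 100), the first two columns are centred at x = 5 and 15 and the first two rows at y = 95 and 85.
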
Verification and Acceptance Test

After the output GeoTIFF is written, four assertions must pass before the file is promoted to the staging bucket:

import numpy as np
import rasterio

def verify_dem_output(dst_path: str, nodata: float = -9999.0) -> None:
    """
    Assert minimum DEM quality thresholds.
    Raises AssertionError if any check fails.
    """
    with rasterio.open(dst_path) as src:
        data = src.read(1)
        valid = data[data != nodata]

        coverage = valid.size / data.size
        assert coverage >= 0.80, (
            f"DEM coverage {coverage:.1%} below 80 % threshold — "
            "check search_radius or sounding density."
        )

        assert valid.min() < valid.max(), (
            "DEM has zero dynamic range — all valid cells hold the same depth value."
        )

        assert src.crs is not None, "Output DEM has no CRS — datum alignment failed."

        assert src.profile.get("tiled"), (
            "Output GeoTIFF is not tiled — COG block-alignment requirement not met."
        )

    print(
        f"DEM verification passed: coverage={coverage:.1%}, "
        f"depth range=[{valid.min():.2f}, {valid.max():.2f}] m"
    )

Run from the command line against the output path to confirm before handoff:

python -c "from pipeline import verify_dem_output; verify_dem_output('output/seafloor_dem.tif')"

Edge Cases and Gotchas

Mixed vertical datums silently poison the variogram fit

If MSL and MLLW soundings are co-ingested without a prior tidal datum transformation, the Z-value variance is artificially inflated. The routing metrics are computed from XY spacing and are unaffected, but the inflated variance typically appears as a larger fitted nugget and can trip the nugget-to-sill check on data that is otherwise well suited to Kriging. Apply pyproj.Transformer with always_xy=True and an explicit vertical CRS before routing and variogram fitting.

pykrige variogram_model_parameters ordering changed between v1.6 and v1.7

In v1.6 the variogram_model_parameters list is [nugget, psill, range]; from v1.7 onwards it is [psill, range, nugget]. Pin pykrige>=1.7 and test parameter unpacking with a synthetic dataset before deploying — silently unpacking the wrong order yields a valid-looking but inverted nugget-to-sill check.

Surveys with strong anisotropy fail isotropic spherical fits

For tracklines run perpendicular to a shelf break, when the directional range ratio exceeds 2:1, force variogram_model="linear" with the anisotropy_scaling and anisotropy_angle parameters. An isotropic model will over-smooth cross-contour gradients and produce a ramp artifact parallel to the shelf edge.

Quadtree-partitioned Kriging tiles produce seam artifacts without feathering

When partitioning a survey domain into independent Kriging tiles for large surveys (> 50 million soundings), apply a half-chunk overlap and weight the overlapping region with a cosine taper during tile merging. Without feathering, the kriging variance discontinuity at tile edges is visible as a 1–3 cell seam in hillshade renders.
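
A cosine taper across the overlap can be sketched as follows. This is an illustration under simple assumptions (two tiles that overlap along one axis and cover the same cells in the overlap strip); the function names are hypothetical.

```python
import numpy as np

def cosine_taper(n_overlap: int) -> np.ndarray:
    """Weights for the left tile across the overlap, falling smoothly from 1 to 0."""
    t = (np.arange(n_overlap) + 0.5) / n_overlap  # normalised position in the overlap
    return 0.5 * (1.0 + np.cos(np.pi * t))

def blend_overlap(left: np.ndarray, right: np.ndarray) -> np.ndarray:
    """Blend two (rows, n_overlap) strips that cover the same cells."""
    w = cosine_taper(left.shape[1])
    return w * left + (1.0 - w) * right

# Two tiles that disagree by 2 m across an 8-cell overlap.
left_strip = np.full((3, 8), -51.0)
right_strip = np.full((3, 8), -49.0)
merged = blend_overlap(left_strip, right_strip)
```

The left and right weights sum to one in every column, so the blend never changes values where the two tiles agree and moves gradually from one tile to the other where they differ.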

Related

DEM Interpolation Techniques for Seafloor Mapping — parent reference with grid resolution standards and full error tolerance table
Point Cloud Filtering for Multibeam Sonar — prerequisite outlier removal stage that conditions soundings before routing metrics are computed
Applying Gaussian Filters to Marine DEMs — post-interpolation smoothing for IDW terrace artifact suppression
Automated Spike Removal in Sonar Datasets — complementary noise-rejection stage for sonar outliers upstream of interpolation
