CaveFinder Methodology

Validation

Calibrated against over 10,000 documented caves

The current scoring weights were tuned in the April 2026 calibration run. Each candidate the system produces is matched against a held-out set of documented cave entrances to measure how often a known cave lands in the top N candidates for the area being scanned.

Top-10 40.4% of true entrances land within the 10 highest-ranked candidates

Top-25 59.5% of true entrances land within the 25 highest-ranked candidates

Top-50 73.7% of true entrances land within the 50 highest-ranked candidates

Calibrated April 2026 · matching radius 50 m · validation set held out from training

What "match" means here. A candidate is counted as a match for a documented cave if it appears in the top N of the candidates returned for a scan area and its centroid is within 50 metres of the documented entrance. The 50 m matching radius is a calibration parameter chosen to be tighter than ridgewalk-scale uncertainty but loose enough to absorb GPS error in the validation records.

What these numbers do not promise. Hit rate measures how often a known cave is in the top of the list, not the false-positive rate among other candidates. CaveFinder is a ranking instrument, not a classifier — it sorts candidates so the strongest leads cluster at the top. Final confirmation still happens with boots on the ground.

Scoring components

Over 20 scoring components, combined deliberately

The pipeline runs 12 detection methods against the elevation surface, applies 14 scoring formulas as multiplicative modifiers, and combines the per-method scores through a calibrated fusion model. Two auxiliary context detectors contribute bonuses without producing candidates of their own.

The plain-English inventory below describes what each component looks for. Specific weights, thresholds, and code-level identifiers are not published.

12 · Detection methods

What we look for

Each detection method scans the elevation surface for a different morphological signature. The methods are grouped into four families — fill-and-diff, surround-context, pattern-recognition, and frequency-domain — plus a handful of newer detectors that don't fit those families cleanly. Multiple methods agreeing on the same location is a stronger signal than any single method on its own.

01

Closed-depression detection Fill family

Compares the raw elevation surface to a hydrologically filled version and flags every basin that holds water — the most fundamental signal for a karst entrance.

02

Nested-depression detection Fill family

Finds depressions inside larger depressions — the small inner basin sitting at the bottom of a broader sink that often marks the actual entrance.

03

Wiener-watershed segmentation Fill family

Initialised by depression-filling, then applies adaptive smoothing and watershed segmentation to extract depression boundaries that survive filtering — robust against LiDAR noise.

04

Contour-tree analysis Fill family

Builds the topological tree of nested closed contours and reads off the depressions as leaves — a topology-aware view of the same morphology.

05

Topographic position index Surround

Compares each cell's elevation to the average of its surroundings at multiple radii — strongly negative TPI flags hollows that depression filling alone may miss.

06

Negative openness Surround

Measures how much sky is visible from below the surface — a proxy for how enclosed a point is by nearby terrain. Picks up entrances under ledges and overhangs.

07

Geomorphon classification Pattern

Classifies every cell into one of ten standard landform types (peak, ridge, slope, hollow, pit, valley, etc.) by looking at the pattern of higher and lower neighbours. Pit and hollow classifications are the cave-relevant signal.

08

Wavelet decomposition Frequency

Decomposes the elevation surface into frequency bands and isolates the spatial scales where cave entrances tend to live (typically 5–25 m). Filters out tree-canopy and equipment artefacts.

09

Local-relief modelling Frequency

Subtracts a smoothed regional surface from the raw DEM to expose small-scale relief — a technique borrowed from archaeological LiDAR analysis that highlights subtle entrances on otherwise gentle terrain.

10

SinkSAM segmentation Other

A learned segmentation model trained to identify sinkhole-shaped boundaries directly from elevation imagery. Complements the geometric methods by recognising shapes that are obvious to a human reader but hard to express as thresholds.

11

Local-minimum extraction Other

Identifies elevation minima that survive a multi-pass smoothing filter. A simple but effective baseline detector that catches obvious sinks the more sophisticated methods can sometimes overweight.

12

Change-detection pass Other

Compares the actual elevation surface against a smoothed reference surface to flag the locations of largest deviation — captures both depressions and unusual local features the standard detectors might miss.

14 · Scoring formulas

How we weight what we find

A raw detection is just a yes/no signal. The scoring formulas turn each detection into a number between 0 and 100 by combining geometric properties (depth, area, shape), local context (slope, surrounding terrain, hydrology), regional context (karst region, nearby caves), and agreement between methods. Each formula is calibrated independently and then composed into the final ranking score.

01

Depth-area composite

Combines a depression's depth and surface area into a single base score, with a saturation curve so deep features don't drown out wider but shallower ones.

02

Multi-method consensus bonus

Boosts candidates that more than one detection method picks up at the same location. A site flagged by three independent methods scores higher than a site flagged by only one.

03

Method-family consensus bonus

Beyond raw method count, rewards agreement across different method families (fill, surround, pattern, frequency). A candidate confirmed by methods that look at the terrain in fundamentally different ways is stronger than one confirmed by four variations of the same approach.

04

Shape penalty

Penalises candidates whose footprint is too elongated to plausibly be a cave entrance — long ditches, drainage scars, and equipment tracks. A small bonus for circular shapes goes the other way.

05

Depth-to-width ratio

Penalises features that are unrealistically shallow for their width (typically false positives from canopy noise) and rewards features whose depth-to-width ratio matches the population of known karst entrances.

06

Sink-density bonus

Rewards candidates that sit inside a region of higher-than-baseline sink density. Karst doesn't usually produce one isolated entrance — it produces fields of them.

07

Build-up penalty

Strongly penalises candidates whose immediate surroundings show signs of human modification — quarry walls, road cuts, building pads, pond berms. These are the largest source of false positives in suburban or industrial terrain.

08

Frequency-method bonus

Adds an extra bonus when the frequency-domain detectors (wavelet, local-relief modelling) agree with the geometric methods. Frequency-domain agreement is the strongest cross-family signal.

09

Slope-context modifier

Rewards candidates on the gently-sloped hillside terrain where most karst entrances actually live, and penalises candidates on dead-flat valley floors (more often agricultural ponds) or near-vertical cliff faces.

10

Karst-region gating

Boosts candidates that fall inside known karst regions (mapped at the bedrock-geology level) and dampens scores in non-karst geology where entrances are statistically much rarer. Tunable per region.

11

Known-cave proximity boost

Mild boost for candidates that fall near publicly documented caves (OSM/Wikidata layer). Cave systems usually have multiple entrances; the presence of one nearby raises the prior on others.

12

Flow-accumulation bonus

Hydrologic flow accumulation modifier — checks whether a candidate sits at a point where surface flow concentrates. Currently retained but inactive in the live ranking; kept as an evaluation alternate.

13

Cluster-density bonus

Spatial-clustering modifier — looks for k-nearest-neighbour density of candidates. Currently retained but inactive in the live ranking; kept as an evaluation alternate.

14

Selectivity weighting

Down-weights detection methods that fire too liberally on a given scan area. A method that flags hundreds of candidates in a small area contributes less to the consensus than a method that flags only a few — the latter is more discriminating.

Fusion model

Combining method scores into one ranking

After every detection method has scored every candidate, a fusion model combines the per-method scores into a single 0–100 confidence value used for ranking. The live fusion is a calibrated weighted-sum ensemble that incorporates the multi-method consensus bonus and the method-family consensus bonus. Three alternative fusion schemes were evaluated during calibration and held in reserve as A/B alternates — the live ensemble was selected because it produced the best top-N hit rate against the validation set.

Context detectors

Two detectors that nudge, not produce

Two auxiliary detectors run alongside the main pipeline. They don't produce candidates of their own — they contribute bonuses to candidates produced by the primary detection methods.

Collapse-chain detector — identifies linear sequences of depressions that align along the same axis, suggesting a roof-collapse trace over a cave passage. Candidates that sit on a collapse chain receive a moderate bonus.
Lineament detector — identifies long linear surface features (faults, fractures, joints) that often align with subsurface drainage. Candidates that sit on a lineament receive a small bonus, since karst conduits frequently develop along structural weaknesses.

Data sources

What goes in

Every elevation number CaveFinder uses traces back to a public source. The analysis math is proprietary; the inputs are not.

USGS 3D Elevation Program (3DEP) — primary source. 1-meter LiDAR coverage across most of the continental US, fetched on demand for each scan area.
Copernicus and SRTM global DEMs — fallback for areas outside US LiDAR coverage. 30-meter satellite-derived elevation with reduced resolution.
OpenStreetMap and Wikidata — publicly documented cave records. Used only as an overlay (the "Known Caves" map layer) and for the proximity-boost scoring formula. Never used as training data, never written to user-facing outputs as training references.
Bedrock-geology rasters — public state and federal geological maps used to define the karst-region gating mask.

Limits

What CaveFinder is not

Not a substitute for ground confirmation. The system ranks candidates by the strength of their cave-like signature. A high-ranked candidate is a strong lead, not a confirmed cave. Final verification always requires boots on the ground.
Not a guarantee that every depression is a cave. Ranking is a probability surface. Even a top-10 candidate can be a sediment-filled doline with no opening, or an old quarry pit, or a tree-fall pit. The hit-rate numbers above quantify exactly how often the ranking gets it right.
Not a tool for publishing cave coordinates. CaveFinder displays candidates and known-cave overlays inside the user's own session. It does not export, share, or aggregate cave locations across users. Documented-cave overlays come from public records (OSM/Wikidata) only — never from private cave surveys.
Not a research dataset. The validation set used for calibration is internal and not redistributed. Researchers who want to use CaveFinder for academic work should contact zach@cavefinder.app for an academic licence and methodology supplement.

Contact

Questions about methodology

Karst researchers, NSS members, and land managers with technical questions about the pipeline, validation set construction, or calibration procedure can reach the author at zach@cavefinder.app. Reasonable methodology questions get a real reply, not a form letter.

How CaveFinder ranks the terrain