
How our crawler works

The Consepo crawl engine turns the pages it can reach into CSP evidence. That means coverage is defined by URL discovery, rendering, and the limits of what a public crawl can actually access.

The short version

Our platform discovers URLs in a fixed order: starting URL first, sitemap links second, and page links after that. Consepo layers scan-specific controls on top of that baseline and then processes the returned HTML for CSP-relevant resources.

  • Our platform handles the crawl job

    Consepo starts a crawl job on our platform, then waits for the asynchronous result set to complete before processing the pages we received.

  • Real browser rendering by default

    Our scanner defaults to rendered page fetches, so JavaScript-executed resources show up in the evidence set instead of being missed by static HTML-only analysis.

  • CSP evidence, not just page text

    The point of the crawl is not indexing content. We use the returned HTML and page coverage to infer the scripts, styles, frames, fonts, and network origins your policy has to account for.

URL discovery order

This discovery order is the baseline mental model for why one URL gets scanned and another never appears in the report.

  1. Start with the URL you submit

    Every scan begins from the exact page you enter. That page is the first rendered document and the first place the crawler can discover additional URLs.

  2. Add sitemap URLs

    Our platform next looks at sitemap URLs when using its default discovery mode, which broadens coverage beyond the pages the starting page links to directly.

  3. Follow page links

    After that, the crawler follows links it finds in pages that were already discovered, subject to crawl depth, page-limit, and allow/deny rules.
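The three-step order above can be sketched as a simple breadth-first traversal. This is a hypothetical illustration, not Consepo's actual implementation; `get_links` stands in for rendering a page and extracting its links:

```python
from collections import deque

def discover_urls(start_url, sitemap_urls, get_links, max_depth=3, page_limit=100):
    """Sketch of the discovery order: start URL first, sitemap URLs next,
    then links found on already-discovered pages, bounded by depth and
    page limit. Illustrative only."""
    # Seed the queue in priority order: the start URL, then sitemap entries.
    queue = deque([(start_url, 0)])
    queue.extend((url, 0) for url in sitemap_urls)
    seen = {start_url, *sitemap_urls}
    crawled = []

    while queue and len(crawled) < page_limit:
        url, depth = queue.popleft()
        crawled.append(url)
        if depth >= max_depth:
            continue  # crawled, but its links are not followed past the depth limit
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return crawled
```

Because sitemap URLs are seeded at depth zero, they get crawled even when nothing on the starting page links to them, which is exactly the coverage gain step 2 describes.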

Controls that shape coverage in Consepo

We do not just fire off an unconstrained crawl. The scan form and API parameters decide how much of the discoverable site we ask our platform to traverse.

  • Start URL

    Defines the crawl root and the first document rendered.

  • Page limit

    Caps how many pages the crawl can process in one scan, bounded by your plan and current platform limits.

  • Max depth

    Limits how many link levels away from the starting URL the crawler may follow.

  • Include / exclude patterns

    Narrows or trims the discovered set so scans stay focused on the pages you actually want in scope.

  • Render JavaScript

    Enabled by default in Consepo. Turn it off only when you intentionally want a faster HTML-only crawl of largely static pages.

In the current implementation, Consepo also tells our platform not to follow external links, which keeps scans focused on the target site instead of wandering into third-party destinations.
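Taken together, these controls behave like a scan configuration object. The sketch below is purely illustrative; the field names are assumptions rather than Consepo's real form or API parameters, but it shows how the scope rules compose, including the rule that external hosts are never followed:

```python
import fnmatch
from dataclasses import dataclass, field
from urllib.parse import urlparse

@dataclass
class ScanConfig:
    # Hypothetical parameter names, mirroring the controls described above.
    start_url: str
    page_limit: int = 100
    max_depth: int = 3
    include: list = field(default_factory=list)   # glob patterns a path must match
    exclude: list = field(default_factory=list)   # glob patterns that drop a path
    render_js: bool = True                        # rendered fetches by default

    def in_scope(self, url: str) -> bool:
        # External links are never followed: stay on the start URL's host.
        if urlparse(url).netloc != urlparse(self.start_url).netloc:
            return False
        path = urlparse(url).path or "/"
        # Include patterns narrow the set; exclude patterns trim it further.
        if self.include and not any(fnmatch.fnmatch(path, p) for p in self.include):
            return False
        return not any(fnmatch.fnmatch(path, p) for p in self.exclude)
```

Note how an overly tight `include` list silently drops everything that fails to match, which is the "scan looks broken" failure mode described later in this page.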

What the crawler cannot do

A scan is only as complete as the site paths it can legitimately discover and access. That is why crawler-driven evidence is strong, but never the whole story for CSP rollout.

Our platform respects robots.txt, including crawl-delay, and it does not bypass bot defenses. If your origin serves a challenge page instead of the real application, the crawler sees the challenge page too.

  • Login walls, multi-step checkouts, admin flows, and personalized routes are usually not reachable from a public crawl alone.
  • Our platform does not bypass CAPTCHAs, Turnstile, WAF challenges, or robots.txt restrictions.
  • An empty or weak internal link graph limits coverage. If pages are not in the sitemap and nothing links to them, a crawler has nothing to discover.
  • Overly tight include patterns can make a scan look broken when the real issue is that allowed URLs never matched the pattern set.
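Python's standard-library robots.txt parser illustrates the kind of check a compliant crawler applies before fetching. This is a generic example, not the platform's own implementation:

```python
from urllib.robotparser import RobotFileParser

# A compliant crawler consults robots.txt before every fetch and honors
# both Disallow rules and Crawl-delay.
robots = RobotFileParser()
robots.parse([
    "User-agent: *",
    "Crawl-delay: 5",
    "Disallow: /admin/",
])

# /pricing is fetchable; anything under /admin/ is off limits,
# and fetches are spaced out by the declared crawl delay.
allowed = robots.can_fetch("*", "https://example.com/pricing")
blocked = robots.can_fetch("*", "https://example.com/admin/users")
delay = robots.crawl_delay("*")
```

A disallowed path never enters the crawl, so resources loaded only by those pages cannot show up in the evidence set, which is one reason crawl output alone understates coverage.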

How crawl output becomes CSP guidance

Once the crawl finishes, Consepo processes the completed page set and turns that evidence into policy suggestions, diagnostics, and follow-on monitoring decisions.

  • Loaded resource discovery

    Rendered pages reveal the third-party and first-party origins behind script, style, font, image, frame, and connect activity.

  • Blocked-page diagnostics

    When crawling hits challenge pages or robots restrictions, we can surface those blocked samples instead of pretending the site was scanned cleanly.

  • Actionable CSP drafting

    The crawl evidence becomes the starting point for policy suggestions, export snippets, and the tighter review loop that follows in Report-Only mode.
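As a rough illustration of that last step, observed resource loads can be folded into draft CSP directives along these lines. This is a deliberate simplification with hypothetical names, not the real drafting logic:

```python
from urllib.parse import urlparse

# Each observed resource type implies a CSP fetch directive.
DIRECTIVE_FOR_TYPE = {
    "script": "script-src",
    "style": "style-src",
    "font": "font-src",
    "image": "img-src",
    "frame": "frame-src",
    "connect": "connect-src",
}

def draft_policy(page_origin, loads):
    """loads: (resource_type, url) pairs observed while rendering pages."""
    policy = {}
    for rtype, url in loads:
        parts = urlparse(url)
        origin = f"{parts.scheme}://{parts.netloc}"
        # Same-origin loads collapse to 'self'; everything else is listed by origin.
        source = "'self'" if origin == page_origin else origin
        policy.setdefault(DIRECTIVE_FOR_TYPE[rtype], set()).add(source)
    return "; ".join(f"{d} {' '.join(sorted(s))}" for d, s in sorted(policy.items()))
```

Feeding it a first-party script, a CDN script, and a first-party stylesheet yields a draft like `script-src 'self' https://cdn.example.net; style-src 'self'`, which is the kind of starting point that then gets refined in Report-Only mode.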

Crawl evidence is the first step, not the last one

Public crawling gives you the strongest possible draft from visible pages. CSP monitoring closes the rest of the gap by capturing what real visitors hit behind logins, checkouts, admin shells, and dynamic user journeys.

Get started

Sign up for Consepo for free

Run unlimited browser-rendered CSP scans, generate production-ready policies, and export deployment snippets — no credit card required.