How our crawler works
The Consepo crawl engine turns the pages it can reach into CSP evidence. That means coverage is defined by URL discovery, rendering, and the limits of what a public crawl can actually access.
The short version
Our platform discovers URLs in a fixed order: starting URL first, sitemap links second, and page links after that. Consepo layers scan-specific controls on top of that baseline and then processes the returned HTML for CSP-relevant resources.
Our platform handles the crawl job
Consepo starts a crawl job on our platform, then waits for the asynchronous crawl to complete before processing the pages we received.
Real browser rendering by default
Our scanner defaults to rendered page fetches, so JavaScript-executed resources show up in the evidence set instead of being missed by static HTML-only analysis.
CSP evidence, not just page text
The point of the crawl is not indexing content. We use the returned HTML and page coverage to infer the scripts, styles, frames, fonts, and network origins your policy has to account for.
URL discovery order
This ordering is the baseline mental model for why one URL gets scanned while another never appears in the report.
1. Start with the URL you submit
Every scan begins from the exact page you enter. That page is the first rendered document and the first place the crawler can discover additional URLs.
2. Add sitemap URLs
Our platform next looks at sitemap URLs when using its default discovery mode, which broadens coverage beyond what the first page links to directly.
3. Follow page links
After that, the crawler follows links it finds in pages that were already discovered, subject to crawl depth, page-limit, and allow/deny rules.
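The three-step order above can be sketched as a bounded breadth-first traversal. This is an illustrative model only, not Consepo's implementation: the `fetch_links` callback and the exact queueing rules are assumptions made for the sketch.

```python
from collections import deque

def discover_urls(start_url, sitemap_urls, fetch_links,
                  page_limit=50, max_depth=3):
    """Illustrative discovery order: the start URL first, then
    sitemap entries, then links found on already-crawled pages,
    bounded by a page limit and a maximum link depth."""
    # Seed the queue in priority order; each entry carries its depth.
    queue = deque([(start_url, 0)])
    queue.extend((u, 1) for u in sitemap_urls)
    seen, crawled = {start_url, *sitemap_urls}, []

    while queue and len(crawled) < page_limit:
        url, depth = queue.popleft()
        crawled.append(url)
        if depth >= max_depth:
            continue  # too deep: crawl this page, but follow no links
        for link in fetch_links(url):  # links found in the page's HTML
            if link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return crawled
```

Because the queue is seeded before any links are followed, the start URL and sitemap entries always outrank link-discovered pages when the page limit is tight.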
Controls that shape coverage in Consepo
We do not just fire off an unconstrained crawl. The scan form and API parameters decide how much of the discoverable site we ask our platform to traverse.
Start URL
Defines the crawl root and the first document rendered.
Page limit
Caps how many pages the crawl can process in one scan, bounded by your plan and current platform limits.
Max depth
Limits how many link levels away from the starting URL the crawler may follow.
Include / exclude patterns
Narrows or trims the discovered set so scans stay focused on the pages you actually want in scope.
Render JavaScript
Enabled by default in Consepo. Turn it off only when you intentionally want a faster HTML-only crawl of largely static pages.
In the current implementation, Consepo also tells our platform not to follow external links, which keeps scans focused on the target site instead of wandering into third-party destinations.
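A scope check built from include/exclude patterns might look like the sketch below. The glob-style pattern syntax is an assumption for illustration; the actual pattern format Consepo accepts is product-specific.

```python
from fnmatch import fnmatch

def in_scope(url, include=None, exclude=None):
    """Illustrative scope check: a URL must match at least one
    include pattern (when any are set) and no exclude pattern.
    Glob-style matching via fnmatch is an assumption; the real
    pattern syntax may differ."""
    if include and not any(fnmatch(url, p) for p in include):
        return False
    return not any(fnmatch(url, p) for p in (exclude or []))
```

Note that an empty include list means "everything is allowed", which mirrors the failure mode described above: one overly tight include pattern can silently exclude every discovered URL.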
What the crawler cannot do
A scan is only as complete as the site paths it can legitimately discover and access. That is why crawler-driven evidence is strong, but never the whole story for CSP rollout.
Our platform respects robots.txt, including crawl-delay, and it does not bypass bot defenses. If your origin serves a challenge page instead of the real application, the crawler sees the challenge page too.
- Login walls, multi-step checkouts, admin flows, and personalized routes are usually not reachable from a public crawl alone.
- Our platform does not bypass CAPTCHAs, Turnstile, WAF challenges, or robots.txt restrictions.
- An empty or weak internal link graph limits coverage. If pages are not in the sitemap and nothing links to them, a crawler has nothing to discover.
- Overly tight include patterns can make a scan look broken when the real issue is that allowed URLs never matched the pattern set.
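You can reproduce the robots.txt behavior described above with Python's standard library. This is a way to check what a compliant crawler would be allowed to fetch from your site, not a view into Consepo's own parser.

```python
from urllib.robotparser import RobotFileParser

def crawl_allowed(robots_txt, url, agent="*"):
    """Parse a robots.txt body and report whether a crawler with
    the given user agent may fetch the URL, plus any crawl-delay
    a compliant crawler should honor."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, url), rp.crawl_delay(agent)
```

If this returns `False` for pages you expected in your report, the crawler never saw them, and no CSP evidence could be collected from them.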
How crawl output becomes CSP guidance
Once the crawl finishes, Consepo processes the completed page set and turns that evidence into policy suggestions, diagnostics, and follow-on monitoring decisions.
Loaded resource discovery
Rendered pages reveal the third-party and first-party origins behind script, style, font, image, frame, and connect activity.
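Rolling those loaded resources up into CSP terms can be sketched as a simple grouping step. The resource-kind-to-directive mapping is standard CSP; the `(kind, url)` record shape is an assumption made for the example.

```python
from urllib.parse import urlsplit

# Standard CSP directive governing each resource kind.
DIRECTIVE = {
    "script": "script-src", "style": "style-src",
    "font": "font-src", "image": "img-src",
    "frame": "frame-src", "connect": "connect-src",
}

def origins_by_directive(resources):
    """Group the origins of loaded resources under the CSP
    directive that would govern them. `resources` is an assumed
    iterable of (kind, url) records."""
    out = {}
    for kind, url in resources:
        parts = urlsplit(url)
        origin = f"{parts.scheme}://{parts.netloc}"
        out.setdefault(DIRECTIVE[kind], set()).add(origin)
    return out
```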
Blocked-page diagnostics
When crawling hits challenge pages or robots restrictions, we can surface those blocked samples instead of pretending the site was scanned cleanly.
Actionable CSP drafting
The crawl evidence becomes the starting point for policy suggestions, export snippets, and the tighter review loop that follows in Report-Only mode.
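Serializing that evidence into a first draft might look like the sketch below. The `Content-Security-Policy-Report-Only` header name is standard; the assembly logic and the blanket `'self'` defaults are assumptions for illustration, not Consepo's drafting rules.

```python
def draft_policy(origins_by_directive, self_origin):
    """Serialize collected origins into a draft CSP string,
    seeding each directive with 'self'. A starting point for
    Report-Only review, not a finished policy."""
    parts = ["default-src 'self'"]
    for directive in sorted(origins_by_directive):
        externals = sorted(o for o in origins_by_directive[directive]
                           if o != self_origin)
        parts.append(" ".join([directive, "'self'", *externals]))
    return "; ".join(parts)
```

Served as a `Content-Security-Policy-Report-Only` header, a draft like this reports violations without blocking anything, which is what makes the review loop safe.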
Crawl evidence is the first step, not the last one
Public crawling gives you the strongest possible draft from visible pages. CSP monitoring closes the rest of the gap by capturing what real visitors hit behind logins, checkouts, admin shells, and dynamic user journeys.