Full Site Crawler
The Full Site Crawler walks an entire website and extracts every discoverable page as clean markdown in a single run. Point it at a root URL, set a page cap, and it returns a structured tree of pages — titles, URLs, and full text — ready to download as JSON or CSV.
It's built for teams that need the contents of a whole site without opening each page. Common workflows: building a knowledge base from your own docs, snapshotting a competitor's product pages for analysis, feeding a site into an LLM as grounded context, or auditing SEO content across hundreds of URLs at once.
The crawler respects a page cap and shows live progress as it runs, so you can watch pages come in and cancel if it's going somewhere you didn't intend. Results are stored in your gallery for re-download later.
How it works
- Paste the root URL you want to crawl (e.g. your docs site or a competitor's marketing pages).
- Set the maximum number of pages so the job stays within your budget.
- Start the crawl — live progress shows each page as the crawler discovers and extracts it.
- When the run finishes, review the page tree and download as JSON or CSV.
- Find the run later in your gallery under the Scraping tab.
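Under the hood, the steps above amount to a breadth-first walk that stops at the page cap. The toy sketch below illustrates that logic against an in-memory link map instead of real HTTP fetches; the function name, the link-map shape, and the traversal order are illustrative assumptions, not the tool's actual implementation:

```python
from collections import deque

def crawl(root, links, max_pages):
    """Breadth-first walk from `root`, visiting at most `max_pages` pages.

    `links` maps each URL to the URLs it links to — a stand-in for
    fetching and parsing a real page.
    """
    seen = {root}
    queue = deque([root])
    pages = []
    while queue and len(pages) < max_pages:
        url = queue.popleft()
        pages.append(url)  # in the real tool: fetch the page, extract markdown
        for nxt in links.get(url, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return pages

site = {
    "/": ["/docs", "/pricing"],
    "/docs": ["/docs/api", "/"],
    "/pricing": [],
}
print(crawl("/", site, max_pages=3))  # ['/', '/docs', '/pricing']
```

Note how the cap bounds the number of pages *extracted*, not the number of links discovered — links found on the last page are simply never visited.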
Use cases
- Snapshot an entire documentation site for offline reference or LLM grounding.
- Audit all public pages on your own site for SEO keywords, broken links, or outdated copy.
- Analyse a competitor's product, pricing, and help-center pages in one batch.
- Build a structured corpus from long-form blogs or editorial archives.
- Feed a product's full public site into an AI agent as domain knowledge.
Frequently asked questions
How many pages can I crawl in one run?
You set a page cap before each run. Higher caps cost more credits; the tool shows the price before you start and refunds automatically if the crawl ends with zero usable pages.
What format do I get the results in?
Each page is returned as clean markdown with the title, URL, and body text. You can download the full set as JSON (full fidelity) or CSV (flat, spreadsheet-friendly).
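If you need the flat CSV shape yourself, the JSON export can be flattened with the standard library. The field names (`title`, `url`, `markdown`) below are an assumed export shape for illustration; check an actual download for the real keys:

```python
import csv
import io
import json

# Hypothetical export shape: a list of page records (field names assumed).
pages = json.loads("""[
  {"title": "Home", "url": "https://example.com/", "markdown": "# Home"},
  {"title": "Docs", "url": "https://example.com/docs", "markdown": "# Docs"}
]""")

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["title", "url", "markdown"])
writer.writeheader()
writer.writerows(pages)
print(buf.getvalue())
```

CSV flattens each page to one row, so any nested structure in the JSON (e.g. a page tree) would be lost; use the JSON download when you need full fidelity.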
Does it follow external links?
No — the crawler stays within the same domain as the root URL you provide. Links to other sites are recorded but not followed.
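One common way to implement a same-domain rule is a host comparison on each discovered link. This is a sketch of that check, not the tool's actual rule — details like subdomain or scheme handling aren't specified here:

```python
from urllib.parse import urlparse

def same_domain(root_url, candidate_url):
    """True if the candidate shares the root URL's host (illustrative rule)."""
    return urlparse(root_url).netloc == urlparse(candidate_url).netloc

print(same_domain("https://example.com/docs", "https://example.com/pricing"))  # True
print(same_domain("https://example.com/", "https://other.com/"))               # False
```

Under this rule, `docs.example.com` would count as a different host than `example.com`; a real crawler may or may not treat subdomains as the same site.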
Can I stop a crawl that's running too long?
Yes. A Cancel button appears while the job is active; cancelling stops the crawler and refunds the unused portion of your credits.
Will this work on sites behind a login?
No — the crawler only sees publicly accessible pages. Pages that require authentication, cookies, or JavaScript-only rendering may be skipped or return partial content.
Where do I find previous crawls?
Every run is saved to your gallery under the Scraping tab. Click any entry to re-download the JSON or CSV without re-running the crawl.