live · Let's Encrypt TLS · self-serve keys + dashboard

Crawl the web,
/* in one POST */

5× cheaper than Firecrawl, with 2× the features. Anti-bot bypass on demand. Recurring monitors with diff-only webhooks. AI-bot policy + llms.txt + per-page signals out of the box. JSON in, JSON out.

$ curl -X POST https://api.crawlcrawl.com/v1/scan

Paste a URL. Get clean data.

measured production: 14 pages / 4.2s
Demo uses canned responses with the exact shape the live API returns. Open the full API → endpoint: api.crawlcrawl.com
Twenty-six endpoints

Crawl, scan, monitor, diff. One API.

No SDK to install, no client library to keep up to date. JSON over HTTPS, bearer token, done.

POST /v1/scan

Scrape one URL

Sub-second markdown + signals + AI-bot policy + llms.txt. Optional anti-bot routing. metadata_only for fast indexing.

POST /v1/scan/bulk

Scan up to 100 URLs in parallel

One call, N URLs, configurable concurrency. Per-page cost passthrough.
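
A minimal sketch of a bulk call in the same fetch style, assuming the body takes a urls array plus the usual scan options; the concurrency field name is a guess, so check the API reference for the exact schema.

// sketch: several URLs in one request; "concurrency" is an assumed field name
const bulk = await fetch("https://api.crawlcrawl.com/v1/scan/bulk", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.CRAWLCRAWL_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    urls: ["https://example.com/pricing", "https://example.com/changelog"],
    only_main_content: true,
    concurrency: 10, // assumed name for the configurable concurrency
  }),
});
const pages = await bulk.json();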

POST /v1/crawls

Crawl a whole site

Async queue. Sitemap mode. Path filters via regex. Webhook with HMAC-SHA256 on completion.
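
Completion webhooks are signed with HMAC-SHA256. A verification sketch for Node: take the raw request body and the hex signature from whatever header the webhook docs specify, and compare in constant time.

import { createHmac, timingSafeEqual } from "node:crypto";

// rawBody: the exact bytes of the webhook POST; signatureHex: value of the signature header
function verifyWebhook(rawBody: string, signatureHex: string, secret: string): boolean {
  const expected = createHmac("sha256", secret).update(rawBody).digest("hex");
  const a = Buffer.from(expected);
  const b = Buffer.from(signatureHex);
  // constant-time compare; lengths must match or timingSafeEqual throws
  return a.length === b.length && timingSafeEqual(a, b);
}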

cron field

Recurring monitors

Pass a cron expression and we re-run on schedule forever. return_only_changed: true means webhooks fire only on real changes.
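
A monitor sketch, assuming the cron expression and the return_only_changed flag sit on the crawl body and the callback URL is passed as webhook_url (that field name is an assumption).

// re-crawl every day at 06:00 UTC; webhook fires only when content actually changed
await fetch("https://api.crawlcrawl.com/v1/crawls", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.CRAWLCRAWL_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    url: "https://competitor.example/pricing",
    cron: "0 6 * * *",
    return_only_changed: true,
    webhook_url: "https://hooks.example.com/crawl-done", // assumed field name
  }),
});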

GET /v1/crawls/{old}/diff/{new}

Crawl diff

Compare two runs by content_hash. Returns added / removed / changed page lists. Powers monitoring and content alerts.
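
Pulling a diff is a single GET; a sketch with placeholder crawl IDs.

// placeholder IDs; substitute the IDs returned by POST /v1/crawls
const oldCrawlId = "crawl_old";
const newCrawlId = "crawl_new";

const diff = await fetch(
  `https://api.crawlcrawl.com/v1/crawls/${oldCrawlId}/diff/${newCrawlId}`,
  { headers: { "Authorization": `Bearer ${process.env.CRAWLCRAWL_KEY}` } },
).then((r) => r.json());

// added / removed / changed page lists, computed from content_hash
console.log(diff.added, diff.removed, diff.changed);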

cloud_mode: "auto"

Anti-bot when needed

Direct fetch by default; transparent escalation to managed proxy + headless on 403/429/503/CF challenges. You only pay for the hard ones.
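
cloud_mode travels in the normal scan body; a one-field sketch.

// direct fetch first; managed proxy + headless only if the target pushes back
const hard = await fetch("https://api.crawlcrawl.com/v1/scan", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.CRAWLCRAWL_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({ url: "https://protected.example", cloud_mode: "auto" }),
}).then((r) => r.json());
console.log(hard.markdown);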

POST /v1/llms-txt-build

Generate llms.txt

Crawl a domain, get back a properly-formatted llms.txt for the customer to publish. Nobody else ships this.

GET /v1/robots-policy

AI-bot policy

For any URL, return whether GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Bytespider, CCBot etc. are allowed by robots.txt. Cached per host.
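
A lookup sketch, assuming the target is passed as a url query parameter and the response maps bot names to allowed booleans; both of those shape details are assumptions.

const policy = await fetch(
  "https://api.crawlcrawl.com/v1/robots-policy?url=" +
    encodeURIComponent("https://example.com/blog/post"),
  { headers: { "Authorization": `Bearer ${process.env.CRAWLCRAWL_KEY}` } },
).then((r) => r.json());

// e.g. { GPTBot: true, ClaudeBot: true, Bytespider: false, ... } (assumed shape)
if (!policy.GPTBot) console.log("robots.txt blocks GPTBot on this host");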

GET /v1/usage · /v1/keys · /v1/logs

Self-serve account

Tier caps with Retry-After, key rotation with 24h grace, full audit log. Web dashboard included.
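
When a tier cap trips, the API answers 429 with a Retry-After header. A minimal retry sketch:

// retry once after the server-suggested delay when a tier cap is hit
async function scanWithRetry(body: object): Promise<any> {
  const call = () =>
    fetch("https://api.crawlcrawl.com/v1/scan", {
      method: "POST",
      headers: {
        "Authorization": `Bearer ${process.env.CRAWLCRAWL_KEY}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    });

  let res = await call();
  if (res.status === 429) {
    const waitSeconds = Number(res.headers.get("Retry-After") ?? "1");
    await new Promise((ok) => setTimeout(ok, waitSeconds * 1000));
    res = await call();
  }
  return res.json();
}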

Drop in anywhere

No SDK. Just HTTP.

REST in, JSON out, bearer auth. If you can write a fetch call, you have a client. We won't ship a JS lib that breaks every release.

typescript python rust curl

openapi.json · postman.json

// no install — Node 18+ has fetch
const r = await fetch("https://api.crawlcrawl.com/v1/scan", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.CRAWLCRAWL_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    url: "https://example.com",
    only_main_content: true,
    include_links: true,
  }),
});
const data = await r.json();
console.log(data.markdown, data.signals, data.links);
What people build with it

Cheap pipes for whatever you're shipping.

01 / RAG

Knowledge bases

Crawl docs sites and changelogs into chunked markdown. only_main_content=true strips nav/footer/sidebar so embeddings index meaning, not chrome.

02 / Agents

Web-browsing tools

Give your agent a web_read tool that returns markdown plus extracted metadata, not 4MB of script tags. Token bills stay sane.
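
A sketch of such a tool built on /v1/scan; the function name and return shape here are yours to pick.

// hypothetical agent tool: URL in, clean markdown + extracted metadata out
async function webRead(url: string): Promise<{ markdown: string; signals: unknown }> {
  const res = await fetch("https://api.crawlcrawl.com/v1/scan", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${process.env.CRAWLCRAWL_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ url, only_main_content: true }),
  });
  const data = await res.json();
  return { markdown: data.markdown, signals: data.signals };
}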

03 / Monitoring

Change detection

Recurring monitors with return_only_changed: true. Webhook fires the moment a competitor's pricing page changes — and stays silent otherwise.

04 / Lead gen

Structured scraping

Bulk scan domains, pull contact pages and structured data. scan/bulk handles 100 URLs per call with built-in concurrency.

05 / SEO & AEO

Audit feeds

Per-page signals bundle: title, canonical, OG, JSON-LD types, hreflang, AI directives, link graph, orphans, robots-meta. Run SQL on it directly.

06 / Compliance

robots-aware by default

respect_robots=true is the default. A malicious-domain blocklist refuses bad seeds up front. AI-bot policy is returned per host.

What ships today

Production. Not a demo.

Live now
✓ HTTP fetch + clean markdown
✓ Anti-bot routing on demand
✓ Boilerplate-strip via readability
✓ Sitemap-driven crawl mode
✓ HMAC-SHA256 webhook signatures
✓ Recurring monitors + diff-only webhooks
✓ Idempotency keys + max_age cache
✓ Custom headers + cookies (auth crawls)
✓ Per-project quotas + tier caps + audit log
✓ Self-serve key rotation + dashboard

Roadmap
Schema-driven AI extraction
Customer-supplied LLM keys for extraction
Per-key (not just per-project) quotas
Stripe metered self-signup
Off-LXC backup mirror
Multi-region egress IPs
Edit-monitor endpoint (today: delete + re-create)
SSE streaming for live crawl logs
Status / uptime page
SOC 2 / DPA

Single LXC, public IP, Postgres-backed queue. We charge per byte, not per call, which is why the pricing is what it is.

One POST.
Pages out the other end.

The free tier covers your first 1,000 pages/month. No credit card. Upgrade when you outgrow it.

See pricing → · Read the API · Get a key