5× cheaper than Firecrawl, with 2× the features. Anti-bot bypass on demand. Recurring monitors with diff-only webhooks. AI-bot policy + llms.txt + per-page signals out of the box. JSON in, JSON out.
No SDK to install, no client library to keep up to date. JSON over HTTPS, bearer token, done.
Sub-second markdown + signals + AI-bot policy + llms.txt. Optional anti-bot routing. metadata_only for fast indexing.
One call, N URLs, configurable concurrency. Per-page cost passthrough.
Async queue. Sitemap mode. Path filters via regex. Webhook with HMAC-SHA256 on completion.
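On the receiving side, checking the completion webhook's signature might look like the sketch below. This page says only that the webhook is signed with HMAC-SHA256. The hex encoding, the choice to sign the raw request body, and the names `verifyWebhook`, `signatureHex`, and `secret` are assumptions for illustration, so confirm the actual header name and encoding in the API reference.

```javascript
const { createHmac, timingSafeEqual } = require("node:crypto");

// Hypothetical helper: returns true when signatureHex equals the
// HMAC-SHA256 of rawBody under the shared secret.
// Assumption: the signature is hex-encoded and covers the raw body bytes.
function verifyWebhook(rawBody, signatureHex, secret) {
  const expected = createHmac("sha256", secret).update(rawBody).digest();
  const received = Buffer.from(String(signatureHex), "hex");
  // timingSafeEqual throws on a length mismatch, so check lengths first.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(received, expected);
}
```

Verify against the body exactly as received, before any JSON parsing, since re-serializing can change the bytes and break the comparison.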
Pass a cron expression and we re-run on schedule forever. return_only_changed: true means webhooks fire only on real changes.
Compare two runs by content_hash. Returns added / removed / changed page lists. Powers monitoring and content alerts.
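As a rough illustration of what a content_hash comparison involves, the sketch below diffs two runs client-side. It assumes each run is an array of `{ url, content_hash }` records. Only `content_hash` and the added / removed / changed split come from this page, and `diffRuns` and the `url` field are hypothetical names, not the API's response shape.

```javascript
// Hypothetical sketch: compare two runs keyed by URL, using content_hash
// to decide whether a page present in both runs has changed.
function diffRuns(previous, current) {
  const before = new Map(previous.map((p) => [p.url, p.content_hash]));
  const after = new Map(current.map((p) => [p.url, p.content_hash]));
  const added = [];
  const changed = [];
  for (const [url, hash] of after) {
    if (!before.has(url)) added.push(url);
    else if (before.get(url) !== hash) changed.push(url);
  }
  const removed = [...before.keys()].filter((url) => !after.has(url));
  return { added, removed, changed };
}
```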
Direct fetch by default; transparent escalation to managed proxy + headless on 403/429/503/CF challenges. You only pay for the hard ones.
Crawl a domain, get back a properly-formatted llms.txt for the customer to publish. Nobody else ships this.
For any URL, return whether GPTBot, ClaudeBot, PerplexityBot, Google-Extended, Bytespider, CCBot etc. are allowed by robots.txt. Cached per host.
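To show how a per-bot answer can be derived from robots.txt, here is a deliberately simplified sketch. It is not the service's implementation: it ignores `*` and `$` wildcards in paths, matches user agents by exact case-insensitive name, and the function names `parseRobots` and `isBotAllowed` are made up for this example.

```javascript
// Parse robots.txt into groups of { agents, rules }.
function parseRobots(text) {
  const groups = [];
  let group = null;
  let lastWasAgent = false;
  for (const raw of text.split(/\r?\n/)) {
    const line = raw.split("#")[0].trim(); // drop comments
    const sep = line.indexOf(":");
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();
    if (field === "user-agent") {
      // Consecutive User-agent lines share one group.
      if (!lastWasAgent) {
        group = { agents: [], rules: [] };
        groups.push(group);
      }
      group.agents.push(value.toLowerCase());
      lastWasAgent = true;
    } else if ((field === "allow" || field === "disallow") && group) {
      group.rules.push({ allow: field === "allow", path: value });
      lastWasAgent = false;
    }
  }
  return groups;
}

// Longest matching path wins; Allow wins a tie; no match means allowed.
function isBotAllowed(robotsTxt, bot, path = "/") {
  const groups = parseRobots(robotsTxt);
  const name = bot.toLowerCase();
  // Prefer groups that name the bot; otherwise fall back to "*".
  let matched = groups.filter((g) => g.agents.includes(name));
  if (matched.length === 0) matched = groups.filter((g) => g.agents.includes("*"));
  let best = null;
  for (const rule of matched.flatMap((g) => g.rules)) {
    // An empty rule path matches nothing.
    if (rule.path === "" || !path.startsWith(rule.path)) continue;
    const longer = !best || rule.path.length > best.path.length;
    const tieAllow = best && rule.path.length === best.path.length && rule.allow;
    if (longer || tieAllow) best = rule;
  }
  return best ? best.allow : true;
}
```

For example, `isBotAllowed(robotsTxt, "GPTBot", "/")` answers whether GPTBot may fetch the site root under these simplified rules.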
Tier caps with Retry-After, key rotation with 24h grace, full audit log. Web dashboard included.
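A client that hits a tier cap can turn the Retry-After header into a wait time. The sketch below relies on the standard HTTP definition of the header (either a number of seconds or an HTTP date). Which form this API sends is not stated here, and `retryDelayMs` and its one-second fallback are assumptions for illustration.

```javascript
// Convert a Retry-After header value into a wait time in milliseconds.
// The header may hold delay-seconds ("120") or an HTTP date.
function retryDelayMs(headerValue, now = Date.now(), fallbackMs = 1000) {
  if (headerValue == null) return fallbackMs;
  const value = String(headerValue).trim();
  if (/^\d+$/.test(value)) return Number(value) * 1000;
  const date = Date.parse(value);
  if (Number.isNaN(date)) return fallbackMs; // unparseable: use the fallback
  return Math.max(0, date - now); // a date in the past means retry now
}
```

With a fetch response `r`, a caller could wait `retryDelayMs(r.headers.get("Retry-After"))` milliseconds before sending the request again.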
REST in, JSON out, bearer auth. If you can write a fetch call, you have a client. We won't ship a JS lib that breaks every release.
// no install: Node 18+ has fetch
const r = await fetch("https://api.crawlcrawl.com/v1/scan", {
  method: "POST",
  headers: {
    "Authorization": `Bearer ${process.env.CRAWLCRAWL_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    url: "https://example.com",
    only_main_content: true,
    include_links: true,
  }),
});
const data = await r.json();
console.log(data.markdown, data.signals, data.links);
Crawl docs sites and changelogs into chunked markdown. only_main_content=true strips nav/footer/sidebar so embeddings index meaning, not chrome.
Give your agent a web_read tool that returns markdown plus extracted metadata, not 4MB of script tags. Token bills stay sane.
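One way such a tool could be wired up is sketched below. It reuses the `/v1/scan` endpoint and the `url`, `only_main_content`, `markdown`, and `signals` fields from the example on this page. The names `buildScanRequest` and `webRead`, the injectable `fetchImpl` parameter, and the error handling for non-2xx responses are assumptions for illustration.

```javascript
const SCAN_URL = "https://api.crawlcrawl.com/v1/scan";

// Build the fetch options for scanning one URL.
function buildScanRequest(url, apiKey) {
  return {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ url, only_main_content: true }),
  };
}

// Hypothetical agent tool: returns only the fields the model needs to read.
// fetchImpl is injectable so the tool can be exercised without network access.
async function webRead(url, apiKey, fetchImpl = fetch) {
  const res = await fetchImpl(SCAN_URL, buildScanRequest(url, apiKey));
  if (!res.ok) throw new Error(`scan failed with HTTP ${res.status}`);
  const data = await res.json();
  return { markdown: data.markdown, signals: data.signals };
}
```

Returning only `markdown` and `signals` keeps the tool result small, which is the point of handing the agent extracted content instead of raw HTML.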
Recurring monitors with return_only_changed: true. Webhook fires the moment a competitor's pricing page changes — and stays silent otherwise.
Bulk scan domains, pull contact pages and structured data. scan/bulk handles 100 URLs per call with built-in concurrency.
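Lists longer than the 100-URL limit need to be split before calling scan/bulk. The helper below is a minimal sketch of that batching step. `chunkUrls` is a hypothetical name, and because the request body for scan/bulk is not shown on this page, the example stops short of making the call.

```javascript
// Split a URL list into batches no larger than the per-call limit.
function chunkUrls(urls, size = 100) {
  if (!Number.isInteger(size) || size < 1) {
    throw new RangeError("size must be a positive integer");
  }
  const batches = [];
  for (let i = 0; i < urls.length; i += size) {
    batches.push(urls.slice(i, i + size));
  }
  return batches;
}
```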
Per-page signals bundle: title, canonical, OG, JSON-LD types, hreflang, AI directives, link graph, orphans, robots-meta. SQL on it directly.
respect_robots=true is the default. A malicious-domain blocklist rejects bad seed URLs before any crawl starts. AI-bot policy is returned per host.
Single LXC, public IP, Postgres-backed queue. We charge per byte not per call, which is why the pricing is what it is.
The free tier covers your first 1,000 pages each month. No credit card. Upgrade when you outgrow it.