Point at any URL. We follow internal links, render JavaScript when needed, and hand back clean markdown for every page, with the link graph included.
One POST queues the job; we follow links up to your max_pages limit and return each page's markdown plus metadata.
curl -X POST https://api.crawlcrawl.com/v1/crawls \
  -H "Authorization: Bearer crk_..." \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://docs.example.com",
    "max_pages": 100,
    "respect_robots": true
  }'

# returns
{ "id": 504, "status": "queued", "url": "https://docs.example.com" }
Aggressive mode is for full-site rebuilds where you own the target: it drops the crawl delay, lifts concurrency, and skips robots.txt. Verified 3× faster on real targets.
curl -X POST https://api.crawlcrawl.com/v1/crawls \
  -H "Authorization: Bearer crk_..." \
  -H "Content-Type: application/json" \
  -d '{"url":"https://your-site.com","max_pages":500,"aggressive":true}'
GET /v1/crawls/{id} → status + page count + timings
GET /v1/crawls/{id}/pages → every page's markdown + meta
GET /v1/crawls/{id}/links → per-page outbound link graph
GET /v1/crawls/{id}/orphans → pages with no incoming internal links
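Once the status goes terminal, one pass over these endpoints pulls everything down. A sketch assuming /pages returns a top-level "pages" array whose items carry a "markdown" field; the list above documents what each route returns, but the exact field names are assumptions:

# Dump every page's markdown (response field names are assumed)
curl -s -H "Authorization: Bearer crk_..." \
  "https://api.crawlcrawl.com/v1/crawls/504/pages" | jq -r '.pages[].markdown'

# List pages with no incoming internal links, e.g. to prune or re-link them
curl -s -H "Authorization: Bearer crk_..." \
  "https://api.crawlcrawl.com/v1/crawls/504/orphans"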