🕷️

Crawl AI

Intelligent web crawling for legal resources. Discover, index, and organize legal documents from courts, governments, and international sources.

Features

🌐

Legal Source Discovery

Automatically discover and index legal resources from court websites, government databases, and legislation portals.

🔗

Multi-Source Crawling

Crawl Korean courts, including the Supreme Court and the Constitutional Court, as well as international tribunals and regulatory agencies.

📑

Content Extraction

Extract case citations, statute references, legal opinions, and regulatory guidelines with high accuracy.

Intelligent Scheduling

Schedule automated crawls to keep your legal database up-to-date with the latest rulings and amendments.

🔍

Duplicate Detection

Content-hash-based deduplication keeps the index free of redundant copies of the same legal resource.

📁

Resource Collections

Organize crawled documents into curated collections for specific cases, matters, or research projects.
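
As an illustration of the Duplicate Detection feature above, here is a minimal sketch of content-hash deduplication. It assumes SHA-256 over whitespace-normalized text; the actual hash function and normalization used by Crawl AI are not stated on this page, and the names `content_hash` and `deduplicate` are hypothetical.

```python
import hashlib
import re


def content_hash(text: str) -> str:
    """Return a SHA-256 hex digest of the whitespace-normalized text.

    Assumption: collapsing whitespace is the only normalization applied.
    """
    # Collapse whitespace runs so trivial formatting differences hash the same.
    normalized = re.sub(r"\s+", " ", text).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def deduplicate(documents: list[str]) -> list[str]:
    """Keep the first occurrence of each distinct document, preserving order."""
    seen: set[str] = set()
    unique: list[str] = []
    for doc in documents:
        digest = content_hash(doc)
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique
```

Hashing only catches exact matches after normalization; near-duplicates, such as the same ruling with a different page header, would need a fuzzier technique.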

Crawl Pipeline

1. Discover: Find legal sources
2. Crawl: Extract content
3. Process: Parse & classify
4. Index: Make searchable
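
The four stages above (Discover, Crawl, Process, Index) can be pictured as a chain of functions. The sketch below is a toy model under stated assumptions, not Crawl AI's implementation: every name (`Document`, `discover`, `crawl`, `process`, `index`, `run_pipeline`) is hypothetical, the classifier is a single keyword rule, and fetching is injected as a callable so the example runs without network access.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass
class Document:
    url: str
    text: str
    doc_type: str = "unclassified"


def discover(seed_urls: list[str]) -> list[str]:
    """Stage 1: return candidate URLs (here, just the de-duplicated seeds)."""
    return list(dict.fromkeys(seed_urls))


def crawl(urls: list[str], fetch: Callable[[str], str]) -> list[Document]:
    """Stage 2: fetch each URL and wrap the extracted text."""
    return [Document(url=u, text=fetch(u)) for u in urls]


def process(docs: list[Document]) -> list[Document]:
    """Stage 3: toy classifier based on one keyword (illustrative only)."""
    for doc in docs:
        doc.doc_type = "statute" if "article" in doc.text.lower() else "case"
    return docs


def index(docs: list[Document]) -> dict[str, set[str]]:
    """Stage 4: build an inverted index from lowercase token to URLs."""
    inverted: dict[str, set[str]] = defaultdict(set)
    for doc in docs:
        for token in doc.text.lower().split():
            inverted[token].add(doc.url)
    return dict(inverted)


def run_pipeline(
    seed_urls: list[str], fetch: Callable[[str], str]
) -> tuple[list[Document], dict[str, set[str]]]:
    """Run all four stages and return the classified docs and the index."""
    docs = process(crawl(discover(seed_urls), fetch))
    return docs, index(docs)
```

Passing `fetch` in as a parameter keeps each stage testable in isolation, for example by supplying a dictionary lookup in place of a real HTTP client.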

Document Types

Court Cases, Statutes, Regulations, Ordinances, Treaties, Guidelines, Legal Opinions, News

Korean Legal Sources

Source                                Type         Documents  Status
Supreme Court of Korea                Court        1.2M+      Active
Constitutional Court                  Court        45K+       Active
National Law Information Center       Legislation  800K+      Active
Korea Legislation Research Institute  Research     120K+      Active
Financial Services Commission         Regulatory   25K+       Active

International Sources

United Nations Treaty Collection  (International)
World Trade Organization          (International)
International Court of Justice    (The Hague)
European Court of Human Rights    (Europe)


API Endpoints

GET   /api/crawlai/sources/           List all crawl sources
POST  /api/crawlai/jobs/              Start new crawl job
POST  /api/crawlai/documents/search/  Search crawled documents
GET   /api/crawlai/documents/recent/  Recent crawled docs

Start Crawl Job

curl -X POST "http://localhost:8000/api/crawlai/jobs/" \
  -H "Authorization: Bearer <token>" \
  -H "Content-Type: application/json" \
  -d '{
    "source": "<source_uuid>",
    "query": "contract violation",
    "max_pages": 100,
    "filters": {"document_type": "case"}
  }'
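
For callers who prefer Python over curl, the same request can be assembled with the standard library. This is a sketch that mirrors the curl example above: the URL, headers, and payload fields come from that example, while the function name `build_crawl_job_request` and its defaults are hypothetical. The request is only built, not sent, because the response format is not documented on this page.

```python
import json
import urllib.request


def build_crawl_job_request(
    base_url: str,
    token: str,
    source_uuid: str,
    query: str,
    max_pages: int = 100,
    document_type: str = "case",
) -> urllib.request.Request:
    """Build (but do not send) the POST request that starts a crawl job."""
    # Payload fields mirror the curl example; nothing else is assumed.
    payload = {
        "source": source_uuid,
        "query": query,
        "max_pages": max_pages,
        "filters": {"document_type": document_type},
    }
    return urllib.request.Request(
        url=f"{base_url}/api/crawlai/jobs/",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To submit the job, pass the returned object to `urllib.request.urlopen`, substituting a real token and source UUID for the placeholders.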