Build a scalable document ingestion and extraction API.

Accept file uploads, process them asynchronously via background workers, extract structured data with LLMs, and store results — all in a battle-tested FastAPI pipeline.

FastAPI · Celery · Redis · OpenAI · PostgreSQL · Pydantic v2

The usual pain points

  • Handling large file uploads without blocking API threads
  • Processing documents asynchronously at scale
  • Extracting structured data from unstructured documents
  • Storing and querying processed results efficiently

How the kit solves them

  • Async file upload handling with streaming to object storage
  • Celery worker pool for parallel document processing
  • LLM extraction pipeline with structured output (Pydantic v2 schemas)
  • Postgres with JSONB columns for flexible structured data storage
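As a rough illustration of the first bullet, the sketch below shows one way to stream an upload in fixed-size chunks so that memory use stays near the chunk size regardless of file size. It is not the kit's actual storage code: `iter_chunks`, `stream_to_sink`, and the `sink` object are hypothetical names, and the only assumption about the source is that it exposes an awaitable `read(size)` method, as FastAPI's `UploadFile` does.

```python
import asyncio
import hashlib
from typing import AsyncIterator, Protocol, Tuple


class AsyncReadable(Protocol):
    """Anything with an awaitable read(size), e.g. FastAPI's UploadFile."""

    async def read(self, size: int = -1) -> bytes: ...


async def iter_chunks(
    source: AsyncReadable, chunk_size: int = 1024 * 1024
) -> AsyncIterator[bytes]:
    """Yield the upload piece by piece instead of loading it all at once."""
    while True:
        chunk = await source.read(chunk_size)
        if not chunk:  # an empty bytes object signals end of file
            break
        yield chunk


async def stream_to_sink(
    source: AsyncReadable, sink, chunk_size: int = 1024 * 1024
) -> Tuple[int, str]:
    """Copy source to sink chunk by chunk; return (byte count, SHA-256 hex).

    `sink` is a hypothetical stand-in with a write(bytes) method. A real
    object-storage client would more likely expose a multipart upload call.
    """
    digest = hashlib.sha256()
    total = 0
    async for chunk in iter_chunks(source, chunk_size):
        sink.write(chunk)
        digest.update(chunk)
        total += len(chunk)
    return total, digest.hexdigest()
```

The returned checksum could serve as a content-derived document ID or a deduplication key, though whether the kit does this is not stated here.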

Example implementation

main.py
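import asyncio

from fastapi import Depends, UploadFile

# router, celery, storage, extraction_queue, llm, db, APIKey, get_api_key and
# ExtractedInvoice come from the kit's own modules (imports omitted here).

# Task arguments must be serializable, so the schema is passed by name.
SCHEMAS = {"invoice": ExtractedInvoice}

@router.post("/v1/documents/process")
async def process_document(
    file: UploadFile,
    key: APIKey = Depends(get_api_key),
):
    # Store the upload first, then hand the heavy work to a background worker.
    doc_id = await storage.upload(file)
    job = await extraction_queue.enqueue(
        extract_document,
        doc_id=doc_id,
        schema_name="invoice",
    )
    return {"doc_id": doc_id, "job_id": job.id}

@celery.task
def extract_document(doc_id: str, schema_name: str):
    # Celery does not await coroutines, so drive the async pipeline explicitly.
    asyncio.run(_extract(doc_id, SCHEMAS[schema_name]))

async def _extract(doc_id: str, schema: type):
    text = await storage.read_text(doc_id)
    result = await llm.extract(text, output_schema=schema)
    await db.save(doc_id, result.model_dump())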
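The example refers to an `ExtractedInvoice` schema without showing it. The sketch below is one plausible shape for such a Pydantic v2 model, together with a hypothetical `parse_llm_output` helper that validates raw LLM JSON before it is stored. The field names and the helper are illustrative assumptions, not the kit's actual schema.

```python
from datetime import date
from decimal import Decimal
from typing import List, Optional

from pydantic import BaseModel, Field, ValidationError


class LineItem(BaseModel):
    description: str
    quantity: int = Field(gt=0)
    unit_price: Decimal = Field(ge=0)


class ExtractedInvoice(BaseModel):
    # Illustrative fields only; the kit's real schema may differ.
    invoice_number: str
    issued_on: date
    currency: str = Field(min_length=3, max_length=3)
    total: Decimal = Field(ge=0)
    line_items: List[LineItem] = Field(default_factory=list)


def parse_llm_output(raw_json: str) -> Optional[dict]:
    """Validate raw LLM JSON; return a JSON-safe dict, or None if invalid."""
    try:
        invoice = ExtractedInvoice.model_validate_json(raw_json)
    except ValidationError:
        # Malformed JSON and schema violations both land here.
        return None
    # mode="json" converts date and Decimal values to strings, which makes
    # the result directly storable in a JSONB column.
    return invoice.model_dump(mode="json")
```

Returning `None` on failure keeps the sketch short. A real pipeline would more likely log the validation errors and retry the extraction or flag the document for review.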

Ready to build your document processing pipeline?

FastAPI AI Kit ships with everything shown above, pre-configured and production-ready. Clone the repo and start building in minutes.

Ready to ship your AI backend this weekend?

Join developers who skipped weeks of boilerplate and went straight to building.

No subscriptions · One-time payment · Lifetime updates