Ship a retrieval-augmented search API on your documents.

Ingest PDFs, Markdown, and text files into a vector store, then expose a semantic search endpoint powered by your LLM of choice, with pgvector or Qdrant pre-configured.

FastAPI · pgvector · Qdrant · OpenAI Embeddings · PostgreSQL · Alembic

The usual pain points

  • Parsing and chunking documents for embedding
  • Choosing and integrating a vector store
  • Injecting retrieved context into LLM prompts
  • Managing embedding costs at scale

How the kit solves them

  • Built-in document ingestion pipeline with configurable chunk size
  • Pre-wired pgvector and Qdrant — switch with a single env var
  • Automatic context injection: top-k chunks inserted into LLM prompt
  • Token tracking per query for embedding + completion cost visibility
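
To make chunking and context injection concrete, here is a minimal sketch of fixed-size chunking with overlap and of prompt assembly from retrieved chunks. It is an illustration under stated assumptions, not the kit's implementation: the function names `chunk_text` and `build_prompt` are hypothetical, the prompt wording is invented for the example, and chunk size is counted in whitespace-separated words rather than model tokens.

```python
def chunk_text(text: str, chunk_size: int = 512, overlap: int = 64) -> list[str]:
    """Split text into windows of `chunk_size` words; neighbours share `overlap` words.

    Hypothetical helper: a real pipeline would likely count model tokens, not words.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


def build_prompt(question: str, chunks: list[str]) -> str:
    """Number the retrieved chunks and place them ahead of the question.

    Hypothetical helper: the instruction wording is an assumption.
    """
    context = "\n\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, start=1))
    return (
        "Answer using only the context below. Cite sources by number.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding some text twice.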

Example implementation

main.py
# Ingest a document
await rag.ingest(
    source="contracts/q4-2024.pdf",
    collection="legal-docs",
    chunk_size=512,
    overlap=64,
)

# Query with automatic context retrieval
result = await rag.query(
    question="What are the termination clauses?",
    collection="legal-docs",
    top_k=5,
    llm_model="gpt-4o",
)
# Returns answer + source references

Ready to build your RAG document search API?

FastAPI AI Kit ships with everything shown above, pre-configured and production-ready. Clone the repo and start building in minutes.

Ready to ship your AI backend this weekend?

Join developers who skipped weeks of boilerplate and went straight to building.

Read the docs
No subscriptions · One-time payment · Lifetime updates