
Ship a retrieval-augmented search API over your documents.

Ingest PDFs, Markdown, and text files into a vector store, then expose a semantic search endpoint that answers questions with your LLM of choice. pgvector and Qdrant come pre-configured.

FastAPI · pgvector · Qdrant · OpenAI Embeddings · PostgreSQL · Alembic
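At its core, a semantic search endpoint ranks stored chunk embeddings by how similar they are to the question's embedding. The following brute-force sketch in plain Python assumes cosine similarity as the metric (a common choice; the kit's configured metric isn't stated here), and `cosine_similarity` and `top_k` are illustrative helpers, not the kit's API. In production, pgvector (for example through its `<=>` cosine-distance operator) or Qdrant does this ranking inside the database and can use an approximate nearest-neighbor index instead of a full scan.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k(
    query: Sequence[float], chunks: dict[str, Sequence[float]], k: int
) -> list[tuple[str, float]]:
    """Score every chunk against the query and keep the k best (a brute-force scan)."""
    scored = [(chunk_id, cosine_similarity(query, vec)) for chunk_id, vec in chunks.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy 3-dimensional "embeddings"; real embedding vectors have hundreds or thousands of dimensions.
chunks = {
    "termination": [0.9, 0.1, 0.0],
    "payment": [0.1, 0.9, 0.0],
    "renewal": [0.7, 0.3, 0.1],
}
for chunk_id, score in top_k([1.0, 0.0, 0.0], chunks, k=2):
    print(chunk_id, round(score, 3))
# termination 0.994
# renewal 0.911
```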

The usual pain points

✕ Parsing and chunking documents for embedding
✕ Choosing and integrating a vector store
✕ Injecting retrieved context into LLM prompts
✕ Managing embedding costs at scale

How the kit solves them

✓ Built-in document ingestion pipeline with configurable chunk size and overlap
✓ Pre-wired pgvector and Qdrant: switch between them with a single env var
✓ Automatic context injection: the top-k retrieved chunks are inserted into the LLM prompt
✓ Per-query token tracking, so both embedding and completion costs stay visible
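Chunking with overlap means each window of `chunk_size` tokens starts `chunk_size - overlap` tokens after the previous one, so text near a boundary lands in both neighboring chunks and is less likely to lose its surrounding context. The following minimal sketch assumes whitespace-separated words stand in for tokens (the kit's unit for `chunk_size` isn't stated here, and real pipelines usually count model tokens); `chunk_tokens` is a hypothetical helper whose defaults mirror the `chunk_size=512, overlap=64` values used in the example implementation.

```python
def chunk_tokens(tokens: list[str], chunk_size: int = 512, overlap: int = 64) -> list[list[str]]:
    """Split tokens into windows of `chunk_size`, each sharing `overlap` tokens with the previous one."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")
    step = chunk_size - overlap  # how far each new window starts after the previous one
    chunks: list[list[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reaches the end; another would only repeat the tail
    return chunks


# Assumption: whitespace-split words stand in for model tokens.
words = "the tenant may terminate this lease with sixty days written notice".split()
for chunk in chunk_tokens(words, chunk_size=4, overlap=1):
    print(" ".join(chunk))
# the tenant may terminate
# terminate this lease with
# with sixty days written
# written notice
```

Each printed chunk begins with the last word of the one before it, which is the `overlap=1` at work.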
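Switching vector stores with one environment variable usually comes down to a small factory that reads the variable and returns a backend behind a shared interface. The sketch below assumes a hypothetical `VECTOR_STORE` variable that defaults to pgvector, with placeholder `PgVectorStore` and `QdrantStore` classes; the kit's real variable name, default, and classes may differ.

```python
import os
from collections.abc import Mapping
from typing import Protocol


class VectorStore(Protocol):
    def search(self, collection: str, vector: list[float], top_k: int) -> list[str]: ...


class PgVectorStore:
    """Hypothetical stand-in for a pgvector (PostgreSQL) backend."""

    def search(self, collection: str, vector: list[float], top_k: int) -> list[str]:
        raise NotImplementedError("run a pgvector similarity query here")


class QdrantStore:
    """Hypothetical stand-in for a Qdrant backend."""

    def search(self, collection: str, vector: list[float], top_k: int) -> list[str]:
        raise NotImplementedError("call the Qdrant search API here")


# Maps the (hypothetical) VECTOR_STORE value to a backend class.
_BACKENDS: dict[str, type] = {"pgvector": PgVectorStore, "qdrant": QdrantStore}


def make_vector_store(env: Mapping[str, str] = os.environ) -> VectorStore:
    """Pick the backend named by VECTOR_STORE, defaulting to pgvector (an assumption)."""
    name = env.get("VECTOR_STORE", "pgvector").strip().lower()
    if name not in _BACKENDS:
        raise ValueError(f"unknown VECTOR_STORE {name!r}; expected one of {sorted(_BACKENDS)}")
    return _BACKENDS[name]()


store = make_vector_store({"VECTOR_STORE": "qdrant"})
print(type(store).__name__)  # QdrantStore
```

Because the rest of the application only calls `search` on whatever `make_vector_store` returns, changing the variable swaps the backend without touching query code.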

Example implementation

main.py

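# `rag` is the kit's RAG service; how it is constructed isn't shown here.
# `await` is only valid inside an async function, so the calls are wrapped in one.
async def search_contracts(rag):
    # Ingest a document
    await rag.ingest(
        source="contracts/q4-2024.pdf",
        collection="legal-docs",
        chunk_size=512,
        overlap=64,
    )

    # Query with automatic context retrieval
    result = await rag.query(
        question="What are the termination clauses?",
        collection="legal-docs",
        top_k=5,
        llm_model="gpt-4o",
    )
    # Returns answer + source references
    return result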
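The `rag.query` call hides the context-injection step. The following minimal sketch shows what that step typically looks like, assuming retrieved chunks arrive as source/text pairs. `RetrievedChunk`, `build_prompt`, and the prompt wording are hypothetical and are not the kit's actual template.

```python
from dataclasses import dataclass


@dataclass
class RetrievedChunk:
    source: str  # where the chunk came from, returned to the caller as a reference
    text: str    # the chunk's content, inserted into the prompt


def build_prompt(question: str, chunks: list[RetrievedChunk]) -> str:
    """Insert the retrieved chunks into the prompt, numbered so the model can cite them."""
    context = "\n\n".join(
        f"[{i}] (source: {chunk.source})\n{chunk.text}"
        for i, chunk in enumerate(chunks, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "Cite the sources you use by their [number].\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Made-up chunk text, for illustration only.
retrieved = [
    RetrievedChunk("contracts/q4-2024.pdf", "Either party may terminate on 30 days' written notice."),
]
prompt = build_prompt("What are the termination clauses?", retrieved)
print(prompt)
```

Keeping `source` next to each chunk is what makes it possible to return source references alongside the generated answer.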
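Per-query cost visibility comes down to multiplying each tracked token count by its price. The sketch below assumes per-million-token pricing. `QueryUsage`, `query_cost`, and the prices are hypothetical placeholders, not the kit's API or any provider's current rates. It covers the query path only, because embedding documents during ingestion is a separate cost paid once per chunk.

```python
from dataclasses import dataclass


@dataclass
class QueryUsage:
    embedding_tokens: int   # tokens sent to the embedding model (the question)
    prompt_tokens: int      # completion-model input: instructions, injected chunks, question
    completion_tokens: int  # tokens generated in the answer


# Placeholder prices in USD per 1M tokens. NOT real rates: use your provider's current pricing.
PRICES_PER_MILLION = {"embedding": 0.10, "prompt": 2.00, "completion": 8.00}


def query_cost(usage: QueryUsage, prices: dict[str, float] = PRICES_PER_MILLION) -> float:
    """Cost of one query in USD: each token count times its per-million-token price."""
    return (
        usage.embedding_tokens * prices["embedding"]
        + usage.prompt_tokens * prices["prompt"]
        + usage.completion_tokens * prices["completion"]
    ) / 1_000_000


# Illustrative counts: if chunk_size counts tokens, top_k=5 injects up to 5 * 512 = 2,560
# context tokens; assume ~140 more for instructions and the question.
usage = QueryUsage(embedding_tokens=12, prompt_tokens=2_700, completion_tokens=300)
print(f"${query_cost(usage):.4f}")  # $0.0078
```

With these placeholder prices, the injected context dominates the bill, which is why `top_k` and `chunk_size` are the main cost levers.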

Ready to build your RAG document search API?

FastAPI AI Kit ships with everything shown above, pre-configured and production-ready. Clone the repo and start building in minutes.

Ready to ship your AI backend this weekend?

Join developers who skipped weeks of boilerplate and went straight to building.

Read the docs

No subscriptions · One-time payment · Lifetime updates