Retrieval-Augmented Generation (RAG)
Grounding LLM responses in retrieved documents from a vector store.
Definition
Retrieval-Augmented Generation (RAG) is a technique that improves LLM output accuracy by first retrieving relevant context from a document store, then injecting that context into the prompt before generating a response. Instead of relying solely on the model's training data, RAG systems ground answers in your specific documents.
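The definition has two steps: retrieve, then inject. Both can be sketched without any external services. This is a minimal illustration under stated assumptions, not the FastAPI AI Kit implementation. `embed` is a toy term-count stand-in for a real embedding model, and `retrieve`, `build_prompt`, and the prompt wording are hypothetical names chosen for this example.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model (assumption): lowercase term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Step 1 (hypothetical helper): return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Step 2 (hypothetical helper): inject retrieved passages ahead of the question."""
    joined = "\n".join(f"- {passage}" for passage in context)
    return (
        "Answer using only the context below. "
        "If the answer is not in the context, say you don't know.\n\n"
        f"Context:\n{joined}\n\nQuestion: {query}"
    )


if __name__ == "__main__":
    docs = [
        "Refunds are issued within 14 days of a returned item arriving.",
        "Our office is closed on public holidays.",
        "Warranty claims require the original receipt.",
    ]
    question = "How long do refunds take?"
    prompt = build_prompt(question, retrieve(question, docs, k=1))
    print(prompt)  # in a real system, this string is what gets sent to the LLM
```

A production system would swap `embed` for a learned embedding model and the linear scan in `retrieve` for a vector index, but the shape stays the same: the model only sees the passages that retrieval selected, which is what grounds its answer.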
Why it matters for AI APIs
Without RAG, an LLM can only draw on what it saw during training; asked about information outside that data, it tends to hallucinate a plausible but wrong answer. RAG lets you build AI that answers questions about your internal docs, legal contracts, product manuals, or any proprietary knowledge base far more reliably, without fine-tuning a model.
In FastAPI AI Kit
FastAPI AI Kit ships a complete RAG pipeline: document ingestion with chunking, embedding via OpenAI or compatible APIs, storage in pgvector or Qdrant, and automatic context injection at query time. You provide documents; the kit handles the rest.
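Chunking is the first ingestion step in that pipeline. The kit's own chunking strategy is not described here, so the following is a minimal sketch assuming fixed-size word windows with overlap; `chunk_words`, `chunk_size`, and `overlap` are hypothetical names, not the kit's API. Many real pipelines count tokens rather than words.

```python
def chunk_words(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of chunk_size words (hypothetical helper).

    Consecutive windows share `overlap` words, so a sentence that straddles
    a boundary still appears whole in at least one chunk's neighbourhood.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    words = text.split()
    step = chunk_size - overlap  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this window already reached the end of the text
    return chunks


if __name__ == "__main__":
    print(chunk_words("0 1 2 3 4 5 6 7 8 9", chunk_size=4, overlap=1))
    # → ['0 1 2 3', '3 4 5 6', '6 7 8 9']
```

Each chunk would then be embedded and stored as its own row or point in the vector store, so retrieval can return just the relevant passage instead of a whole document.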
