Retrieval-Augmented Generation (RAG)

Grounding LLM responses in retrieved documents from a vector store.

Definition

Retrieval-Augmented Generation (RAG) is a technique that improves LLM output accuracy by first retrieving relevant context from a document store, then injecting that context into the prompt before generating a response. Instead of relying solely on the model's training data, RAG systems ground answers in your specific documents.
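The retrieve-then-inject flow can be sketched in a few lines of Python. This is a minimal illustration, not FastAPI AI Kit's API: the names embed, cosine, retrieve, build_prompt, and DOCS are hypothetical, and the bag-of-words "embedding" is a stand-in for a real embedding model and vector store.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real system would call an embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]

def build_prompt(query: str, docs: list[str], k: int = 2) -> str:
    """Inject the retrieved context into the prompt ahead of the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, docs, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

# Hypothetical example documents.
DOCS = [
    "Refunds are issued within 14 days of purchase.",
    "Support is available on weekdays from 9 to 5.",
    "The warranty covers hardware defects for two years.",
]

if __name__ == "__main__":
    print(build_prompt("How long does the warranty cover defects?", DOCS, k=1))
```

The string returned by build_prompt is what would be sent to the LLM, so the model answers from the retrieved text instead of from memory alone.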

Why it matters for AI APIs

Without RAG, an LLM can only draw on its training data, so questions about anything outside that data often produce hallucinated answers. RAG lets you build AI that accurately answers questions about your internal docs, legal contracts, product manuals, or any proprietary knowledge base, without fine-tuning a model.

In FastAPI AI Kit

FastAPI AI Kit ships a complete RAG pipeline: document ingestion with chunking, embedding via OpenAI or compatible APIs, storage in pgvector or Qdrant, and automatic context injection at query time. You provide documents; the kit handles the rest.
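To make the chunking step of ingestion concrete, here is a generic sketch of fixed-size chunking with overlap. It is not taken from FastAPI AI Kit: the function name chunk_text and its size and overlap parameters are assumptions for illustration, and production pipelines often split on tokens or sentence boundaries instead of raw characters.

```python
def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character windows of `size`, each sharing `overlap`
    characters with the previous window so context is not cut off abruptly."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    if not text:
        return []
    step = size - overlap
    # Stop once a window has reached the end of the text, which avoids
    # emitting a trailing chunk that only repeats the previous overlap.
    last_start = max(len(text) - overlap, 1)
    return [text[i:i + size] for i in range(0, last_start, step)]

if __name__ == "__main__":
    for chunk in chunk_text("abcdefghij", size=4, overlap=2):
        print(chunk)
```

Each chunk would then be embedded and stored, and the overlap makes it less likely that a sentence straddling a boundary is lost to retrieval.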
