Adding an LLM Provider
The kit ships with a unified LLMService abstraction. Switching providers requires changing one environment variable — no route or business logic changes needed.
Built-in providers
OpenAI
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-...
LLM_DEFAULT_MODEL=gpt-4o
All OpenAI chat completion models are supported. For streaming, set stream=True in the request body.
Anthropic
LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-...
LLM_DEFAULT_MODEL=claude-3-5-sonnet-20241022
The Anthropic SDK is installed as an optional dependency. Messages API is used — system prompts, multi-turn conversations, and tool use are all supported.
Local models (Ollama / vLLM / LM Studio)
Any server that exposes an OpenAI-compatible /v1/chat/completions endpoint works:
LLM_PROVIDER=openai_compatible
LLM_BASE_URL=http://localhost:11434/v1
LLM_DEFAULT_MODEL=llama3.2
OPENAI_API_KEY=ollama # Ollama accepts any non-empty string
How the abstraction works
app/services/llm_service.py wraps the provider SDK behind a single interface:
class LLMService:
async def chat(
self,
messages: list[ChatMessage],
model: str | None = None,
stream: bool = False,
track_tokens: bool = True,
) -> ChatResponse | AsyncGenerator[str, None]:
...
Routes call llm_service.chat() and never import the OpenAI or Anthropic SDK directly. This isolation means you can add a new provider (Google Gemini, Cohere, etc.) by implementing a 50-line adapter class — nothing else changes.
Adding a custom provider
- Create
app/services/providers/my_provider.pyimplementing theBaseLLMProviderprotocol. - Register it in
app/services/llm_service.py's provider registry. - Set
LLM_PROVIDER=my_providerin.env.
The provider protocol requires three methods: chat(), stream_chat(), and count_tokens().
Token tracking
When track_tokens=True (default), the service records prompt and completion token counts to the usage table after each call. This data feeds the billing hooks and per-key dashboards.
# Token counts appear in the response
{
"reply": "...",
"model": "gpt-4o",
"usage": {
"prompt_tokens": 42,
"completion_tokens": 128,
"total_tokens": 170
}
}
