Adding an LLM Provider

The kit ships with a unified LLMService abstraction. Switching providers is a configuration change: set LLM_PROVIDER, along with that provider's API key and default model, and leave routes and business logic untouched.

Built-in providers

OpenAI

LLM_PROVIDER=openai
OPENAI_API_KEY=sk-...
LLM_DEFAULT_MODEL=gpt-4o

All OpenAI chat completion models are supported. For streaming responses, set stream to true in the request body.

Anthropic

LLM_PROVIDER=anthropic
ANTHROPIC_API_KEY=sk-ant-...
LLM_DEFAULT_MODEL=claude-3-5-sonnet-20241022

The Anthropic SDK is installed as an optional dependency. The kit uses the Messages API, so system prompts, multi-turn conversations, and tool use are all supported.

Local models (Ollama / vLLM / LM Studio)

Any server that exposes an OpenAI-compatible /v1/chat/completions endpoint works:

LLM_PROVIDER=openai_compatible
LLM_BASE_URL=http://localhost:11434/v1
LLM_DEFAULT_MODEL=llama3.2
OPENAI_API_KEY=ollama   # Ollama accepts any non-empty string
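
The three configurations above differ only in which variables they read. As a rough sketch of how such settings could be resolved, the snippet below maps each LLM_PROVIDER value to the API key variable shown in this section. The names LLMSettings, load_llm_settings, and API_KEY_VARS are hypothetical and used for illustration only; the kit's actual settings loader may look different.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Mapping

# Which API key variable each provider reads, taken from the examples above.
API_KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "openai_compatible": "OPENAI_API_KEY",
}


@dataclass(frozen=True)
class LLMSettings:  # hypothetical container, not the kit's real settings class
    provider: str
    model: str
    api_key: str
    base_url: str | None = None


def load_llm_settings(env: Mapping[str, str]) -> LLMSettings:
    """Validate and collect the LLM-related variables from an environment mapping."""
    provider = env.get("LLM_PROVIDER", "")
    if provider not in API_KEY_VARS:
        raise ValueError(f"unknown LLM_PROVIDER: {provider!r}")

    key_var = API_KEY_VARS[provider]
    api_key = env.get(key_var, "")
    if not api_key:
        raise ValueError(f"{key_var} must be set for provider {provider!r}")

    model = env.get("LLM_DEFAULT_MODEL", "")
    if not model:
        raise ValueError("LLM_DEFAULT_MODEL must be set")

    # Only the OpenAI-compatible provider needs an explicit base URL.
    base_url = None
    if provider == "openai_compatible":
        base_url = env.get("LLM_BASE_URL")
        if not base_url:
            raise ValueError("LLM_BASE_URL must be set for openai_compatible")

    return LLMSettings(provider, model, api_key, base_url)
```

Calling load_llm_settings(os.environ) at startup would surface a missing key or base URL before the first request is served.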

How the abstraction works

app/services/llm_service.py wraps the provider SDK behind a single interface:

class LLMService:
    async def chat(
        self,
        messages: list[ChatMessage],
        model: str | None = None,
        stream: bool = False,
        track_tokens: bool = True,
    ) -> ChatResponse | AsyncGenerator[str, None]:
        ...

Routes call llm_service.chat() and never import the OpenAI or Anthropic SDK directly. Because of this isolation, adding a new provider (Google Gemini, Cohere, etc.) means implementing one adapter class of roughly 50 lines; nothing else changes.
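
To illustrate the calling pattern, here is a minimal sketch of two handlers that depend only on the chat() signature shown above. StubLLMService stands in for the real service, and the fields on ChatMessage and ChatResponse (role, content, reply, model) are assumptions made for this example. The handler names are hypothetical as well.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator


@dataclass
class ChatMessage:  # assumed shape; the kit's real model may differ
    role: str
    content: str


@dataclass
class ChatResponse:  # assumed shape; the kit's real model may differ
    reply: str
    model: str


class StubLLMService:
    """Stand-in with the same chat() signature as LLMService, no network calls."""

    async def chat(
        self,
        messages: list[ChatMessage],
        model: str | None = None,
        stream: bool = False,
        track_tokens: bool = True,
    ) -> ChatResponse | AsyncGenerator[str, None]:
        text = f"You said: {messages[-1].content}"
        if stream:
            async def chunks() -> AsyncGenerator[str, None]:
                for word in text.split():
                    yield word
            return chunks()
        return ChatResponse(reply=text, model=model or "stub-model")


async def handle_chat(llm_service, user_text: str) -> str:
    """Non-streaming handler: awaits a complete ChatResponse."""
    result = await llm_service.chat([ChatMessage("user", user_text)])
    return result.reply


async def handle_chat_stream(llm_service, user_text: str) -> list[str]:
    """Streaming handler: awaits the generator, then iterates its chunks."""
    stream = await llm_service.chat([ChatMessage("user", user_text)], stream=True)
    return [chunk async for chunk in stream]
```

Neither handler imports a provider SDK, so swapping the service instance for one backed by a different provider leaves both functions unchanged.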

Adding a custom provider

  1. Create app/services/providers/my_provider.py implementing the BaseLLMProvider protocol.
  2. Register it in the provider registry in app/services/llm_service.py.
  3. Set LLM_PROVIDER=my_provider in .env.

The provider protocol requires three methods: chat(), stream_chat(), and count_tokens().
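
The steps above can be sketched with a toy provider. Only the three method names come from this page. The method parameters, the ChatMessage and ChatResponse fields, the EchoProvider class, and the PROVIDER_REGISTRY dictionary are assumptions for illustration, so check the real BaseLLMProvider definition before copying anything.

```python
from __future__ import annotations

import asyncio
from dataclasses import dataclass
from typing import AsyncGenerator, Protocol


@dataclass
class ChatMessage:  # assumed shape
    role: str
    content: str


@dataclass
class ChatResponse:  # assumed shape
    reply: str
    model: str
    prompt_tokens: int
    completion_tokens: int


class BaseLLMProvider(Protocol):
    """Assumed signatures; only the three method names are documented."""

    async def chat(self, messages: list[ChatMessage], model: str) -> ChatResponse: ...

    # An async generator function returns its generator without being awaited,
    # so the protocol declares this method with a plain def.
    def stream_chat(
        self, messages: list[ChatMessage], model: str
    ) -> AsyncGenerator[str, None]: ...

    def count_tokens(self, text: str) -> int: ...


class EchoProvider:
    """Toy provider that echoes the last message instead of calling a model."""

    def count_tokens(self, text: str) -> int:
        # Whitespace split is a crude placeholder for a real tokenizer.
        return len(text.split())

    async def chat(self, messages: list[ChatMessage], model: str) -> ChatResponse:
        reply = f"echo: {messages[-1].content}"
        prompt_tokens = sum(self.count_tokens(m.content) for m in messages)
        return ChatResponse(reply, model, prompt_tokens, self.count_tokens(reply))

    async def stream_chat(
        self, messages: list[ChatMessage], model: str
    ) -> AsyncGenerator[str, None]:
        response = await self.chat(messages, model)
        for word in response.reply.split():
            yield word + " "


# Hypothetical registry keyed by the LLM_PROVIDER value.
PROVIDER_REGISTRY: dict[str, type] = {"my_provider": EchoProvider}
```

With a registry shaped like this, LLM_PROVIDER=my_provider would resolve to EchoProvider, and a real adapter would replace the echo logic with calls to the vendor SDK.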

Token tracking

When track_tokens=True (default), the service records prompt and completion token counts to the usage table after each call. This data feeds the billing hooks and per-key dashboards.

# Token counts appear in the response
{
    "reply": "...",
    "model": "gpt-4o",
    "usage": {
        "prompt_tokens": 42,
        "completion_tokens": 128,
        "total_tokens": 170
    }
}
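
As an illustration of how usage counts like these might be persisted and summed per key, here is a small sketch using an in-memory SQLite database. The table columns, the api_key_id field, and all function names are hypothetical. The kit's actual usage table schema and billing hooks are not described on this page.

```python
import sqlite3


def init_usage_table(conn: sqlite3.Connection) -> None:
    """Create a usage table with an assumed, illustrative schema."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS usage (
            api_key_id TEXT NOT NULL,
            model TEXT NOT NULL,
            prompt_tokens INTEGER NOT NULL,
            completion_tokens INTEGER NOT NULL
        )
        """
    )


def record_usage(conn: sqlite3.Connection, api_key_id: str, model: str, usage: dict) -> None:
    """Store the prompt and completion counts from one response's usage object."""
    conn.execute(
        "INSERT INTO usage VALUES (?, ?, ?, ?)",
        (api_key_id, model, usage["prompt_tokens"], usage["completion_tokens"]),
    )
    conn.commit()


def total_tokens_for_key(conn: sqlite3.Connection, api_key_id: str) -> int:
    """Sum all recorded tokens for one key, returning 0 when there are no rows."""
    row = conn.execute(
        "SELECT COALESCE(SUM(prompt_tokens + completion_tokens), 0) "
        "FROM usage WHERE api_key_id = ?",
        (api_key_id,),
    ).fetchone()
    return row[0]
```

Storing prompt and completion counts separately, and deriving the total at query time, keeps the option of pricing the two at different rates.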