Skip to main content
All use casesUse Case

Production chatbot infrastructure — sessions, history, billing.

Multi-turn conversation management, session persistence, streaming replies, per-user rate limiting, and token-based billing — the complete backend for a customer-facing AI chatbot.

FastAPISSEPostgreSQLRedisStripeOpenAI

The usual pain points

  • Persisting conversation history across sessions
  • Streaming LLM responses in real time to the browser
  • Preventing per-user abuse with rate limits
  • Billing users based on actual token consumption

How the kit solves them

  • Session persistence model with full conversation history in Postgres
  • SSE streaming endpoint built in — no custom EventSource wiring
  • Per-key rate limiting configurable per user tier
  • Token metering per session automatically fed to Stripe webhooks

Example implementation

main.py
@router.post("/v1/chat/stream")
async def chat_stream(
    body: ChatRequest,
    key: APIKey = Depends(get_api_key),
):
    history = await session_store.get(body.session_id)

    async def event_stream():
        total_tokens = 0
        async for chunk in llm.stream(
            messages=[*history, body.message],
        ):
            total_tokens += chunk.tokens
            yield f"data: {chunk.delta}\n\n"
        await meter.record(key.id, total_tokens)
        await session_store.append(body.session_id, body.message)

    return StreamingResponse(event_stream(), media_type="text/event-stream")

Ready to build your ai chatbot backend?

FastAPI AI Kit ships with everything shown above, pre-configured and production-ready. Clone the repo and start building in minutes.

Ready to ship your AI backend this weekend?

Join developers who skipped weeks of boilerplate and went straight to building.

Read the docs
No subscriptions · One-time payment · Lifetime updates