## Quick Answer
Use any OpenAI-compatible API (OpenAI, Claude, Assisters) with the `openai` npm package. Stream responses via Server-Sent Events, store conversation history in Postgres, and add function calling for tool use.
- Streaming feels ~5x faster even at the same total latency
- Store every message for debugging and fine-tuning
- Rate-limit per user to prevent abuse
## What You'll Need
- Next.js 15+ app or any Node backend
- OpenAI-compatible API key (Assisters recommended for self-hosted)
- Postgres or Supabase for history
- Vercel AI SDK or raw `openai` client
## Steps
1. **Install dependencies.** `pnpm add openai ai @ai-sdk/openai`
2. **Configure the client.**

   ```ts
   import OpenAI from 'openai';

   const ai = new OpenAI({
     baseURL: 'https://assisters.dev/api/v1',
     apiKey: process.env.ASSISTERS_API_KEY!,
   });
   ```

3. **Create a streaming endpoint.** In `app/api/chat/route.ts`:

   ```ts
   const stream = await ai.chat.completions.create({
     model: 'assisters-chat-v1',
     messages,
     stream: true,
   });
   return new Response(stream.toReadableStream());
   ```

4. **Build the UI.** Use the Vercel AI SDK's `useChat` hook.
5. **Persist messages.** On each exchange, insert into a `messages` table keyed by `conversation_id`.
6. **Add function calling.** Define tools (search the DB, call an API); the model decides when to invoke them.
7. **Moderate input and output.** Call the `/moderate` endpoint before responding.
8. **Rate limit.** Use `@upstash/ratelimit` or self-hosted Redis: 20 messages/min per user.
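Step 8's policy can be sketched in-process before reaching for Redis. A minimal sliding-window limiter, assuming a single-server deployment (`allowMessage` is a hypothetical helper, not part of `@upstash/ratelimit`):

```typescript
// Minimal in-memory sliding-window rate limiter: a sketch of step 8's
// 20 msg/min policy for a single-process server. Use Redis/Upstash in
// production so counts survive restarts and scale across instances.
const WINDOW_MS = 60_000;
const LIMIT = 20;

const hits = new Map<string, number[]>(); // userId -> request timestamps

export function allowMessage(userId: string, now = Date.now()): boolean {
  // Keep only timestamps still inside the window.
  const recent = (hits.get(userId) ?? []).filter((t) => now - t < WINDOW_MS);
  if (recent.length >= LIMIT) {
    hits.set(userId, recent);
    return false; // over budget: respond with HTTP 429 upstream
  }
  recent.push(now);
  hits.set(userId, recent);
  return true;
}
```

Call `allowMessage(userId)` at the top of the route handler and return a 429 when it is false.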
## Common Mistakes
- **Skipping moderation.** A single jailbreak screenshot destroys trust.
- **Infinite context.** Truncate history to the last 20 messages plus a summary of older ones.
- **No retry logic.** Network blips kill UX. Use exponential backoff.
- **Exposing your API key in the client.** Always proxy through your server.
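The truncation and retry fixes above can be sketched in a few lines. `truncateHistory` and `retryWithBackoff` are hypothetical helpers under stated assumptions, not SDK APIs:

```typescript
type Msg = { role: 'system' | 'user' | 'assistant'; content: string };

// Keep the last `keep` messages and fold older ones into a summary stub.
// (In production, generate the summary with a cheap model call instead.)
export function truncateHistory(history: Msg[], keep = 20): Msg[] {
  if (history.length <= keep) return history;
  const older = history.slice(0, history.length - keep);
  const summary: Msg = {
    role: 'system',
    content: `Summary of ${older.length} earlier messages.`,
  };
  return [summary, ...history.slice(-keep)];
}

// Retry a flaky async call with exponential backoff: 100ms, 200ms, 400ms...
export async function retryWithBackoff<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseMs = 100,
): Promise<T> {
  let lastErr: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      await new Promise((r) => setTimeout(r, baseMs * 2 ** i));
    }
  }
  throw lastErr;
}
```

Wrap the completion call in `retryWithBackoff` and pass `truncateHistory(messages)` instead of the raw history.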
## Top Tools
| Tool | Use |
|------|-----|
| Vercel AI SDK | Chat UI primitives |
| Assisters | OpenAI-compatible gateway |
| Supabase | History + auth |
| Langfuse | Observability |
| Upstash / Redis | Rate limiting |
## FAQs
**Which model should I use?** Start with `assisters-chat-v1` — cheaper than GPT, comparable quality.
**How much does it cost?** $5-50/mo for a low-volume chatbot. Scales linearly with usage.
**Can I fine-tune?** Yes — see our next article on fine-tuning.
**Does it work on mobile?** Next.js PWA or React Native with EventSource polyfill.
**How do I handle long conversations?** Summarize the first half every 20 turns.
**What about function calling safety?** Always confirm destructive actions with the user before executing.
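The confirm-before-execute rule above can be sketched as a gate in front of the tool dispatcher. The tool names and the `destructive` flag here are illustrative, not an SDK convention:

```typescript
type ToolSpec = { name: string; destructive: boolean };

// Hypothetical tool registry; mark anything irreversible as destructive.
const tools: ToolSpec[] = [
  { name: 'search_docs', destructive: false },
  { name: 'delete_account', destructive: true },
];

// Returns 'execute' for safe tools, 'confirm' when the user must approve
// a destructive action first, and 'reject' for unknown tool names.
export function gateToolCall(
  name: string,
  userConfirmed = false,
): 'execute' | 'confirm' | 'reject' {
  const tool = tools.find((t) => t.name === name);
  if (!tool) return 'reject';
  if (tool.destructive && !userConfirmed) return 'confirm';
  return 'execute';
}
```

On a `'confirm'` result, surface a confirmation prompt in the chat UI and re-run the call with `userConfirmed: true` only after the user approves.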
## Conclusion
A production chatbot is a weekend project in 2026 with OpenAI-compatible APIs and the Vercel AI SDK. Self-host the model gateway (Assisters) to control costs and data. Try [Misar Dev](https://misar.dev) to generate the entire scaffold from a prompt.