Building a Streaming AI Chatbot UI with Vercel AI SDK and Next.js 15
Step-by-step guide to building a production-ready streaming AI chatbot interface using Vercel AI SDK 4 and Next.js 15 App Router. Includes real-time token streaming, message history, and error handling.
Streaming AI responses token by token isn't just a nice UX touch — it's expected in 2026. Users who see a blank screen for 5 seconds while a model generates a response will assume your app is broken. The Vercel AI SDK makes streaming surprisingly straightforward with Next.js, but there are production pitfalls the docs don't cover. Let's build it right.
Project Setup
We'll use the Vercel AI SDK (v4) with Next.js 15 App Router and the Anthropic provider. The architecture is clean: a Route Handler at app/api/chat handles the AI call, and a client component renders the streaming response.
npx create-next-app@latest ai-chatbot --typescript --tailwind --app
cd ai-chatbot
npm install ai @ai-sdk/anthropic zod
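The @ai-sdk/anthropic provider reads your API key from the ANTHROPIC_API_KEY environment variable by default, so add it to .env.local before starting the dev server (the key value below is a placeholder):

```shell
# .env.local — never commit this file
ANTHROPIC_API_KEY=sk-ant-your-key-here
```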
The Chat API Route
Create app/api/chat/route.ts. The key insight is using streamText from the AI SDK — it returns a streaming response that the client SDK consumes automatically:
import { streamText } from "ai";
import { anthropic } from "@ai-sdk/anthropic";
export const dynamic = "force-dynamic";
export async function POST(req: Request) {
const { messages } = await req.json();
const result = streamText({
model: anthropic("claude-sonnet-4-6-20250514"),
system: "You are a helpful frontend development assistant. Answer concisely with code examples when relevant.",
messages,
maxTokens: 1024,
});
return result.toDataStreamResponse();
}
The Chat UI Component
The useChat hook from Vercel AI SDK handles message state, streaming, loading states, and error handling. Here's the client component:
"use client";
import { useChat } from "ai/react";
import { Send, Loader2 } from "lucide-react";
export function ChatInterface() {
const { messages, input, handleInputChange, handleSubmit, isLoading, error } = useChat({
api: "/api/chat",
});
return (
<div className="flex flex-col h-[600px] max-w-2xl mx-auto border rounded-xl">
<div className="flex-1 overflow-y-auto p-4 space-y-4">
{messages.map((m) => (
<div key={m.id} className={m.role === "user" ? "text-right" : "text-left"}>
<div className={`inline-block rounded-2xl px-4 py-2 text-left ${m.role === "user" ? "bg-primary text-primary-foreground" : "bg-muted"}`}>
<p className="text-sm whitespace-pre-wrap">{m.content}</p>
</div>
</div>
))}
{error && (
<p className="text-sm text-destructive text-center">Something went wrong. Please try again.</p>
)}
</div>
<form onSubmit={handleSubmit} className="flex gap-2 p-4 border-t">
<input
value={input}
onChange={handleInputChange}
placeholder="Ask about React, Next.js, TypeScript..."
className="flex-1 px-4 py-2 border rounded-full text-sm focus:outline-none focus:ring-2"
disabled={isLoading}
/>
<button type="submit" disabled={isLoading || !input.trim()}
className="p-2 rounded-full bg-primary text-primary-foreground disabled:opacity-50">
{isLoading ? <Loader2 className="h-4 w-4 animate-spin" /> : <Send className="h-4 w-4" />}
</button>
</form>
</div>
);
}
Production Hardening
The basic setup works, but production needs more. Add rate limiting with a simple in-memory store or Redis. Handle token limits by truncating old messages. Add retry logic for transient API failures. The Vercel AI SDK's onError callback in useChat lets you show user-friendly error messages without breaking the chat flow.
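As a starting point, here is a minimal sketch of the in-memory rate limiter mentioned above. It uses a sliding window keyed by client ID; the limit and window values are illustrative assumptions, and a single Map only works on one server instance — swap in Redis (e.g. Upstash) for multi-node deployments:

```typescript
// Minimal in-memory sliding-window rate limiter (sketch).
// Works per server instance only; use Redis for horizontal scaling.

const hits = new Map<string, number[]>(); // clientId -> request timestamps
const LIMIT = 10; // max requests...
const WINDOW_MS = 60_000; // ...per minute, per client (illustrative values)

export function isRateLimited(clientId: string, now = Date.now()): boolean {
  // Drop timestamps that have aged out of the window.
  const recent = (hits.get(clientId) ?? []).filter((t) => now - t < WINDOW_MS);
  if (recent.length >= LIMIT) {
    hits.set(clientId, recent);
    return true; // over the limit — caller should return a 429
  }
  recent.push(now);
  hits.set(clientId, recent);
  return false;
}
```

In the route handler you would call this at the top of POST with some client identifier (e.g. a forwarded IP header) and return a 429 response when it reports true.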
Streaming Markdown Rendering
AI responses often include code blocks and formatting. Use react-markdown with remark-gfm to render streamed markdown in real time. The key trick is memoizing the markdown component to prevent re-renders on every token.
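A sketch of that memoization, assuming react-markdown and remark-gfm are installed. React.memo with a custom comparator skips re-rendering messages whose text hasn't changed, so only the actively streaming message re-parses on each token:

```typescript
// MemoizedMarkdown.tsx — sketch of per-message markdown rendering.
"use client";

import { memo } from "react";
import ReactMarkdown from "react-markdown";
import remarkGfm from "remark-gfm";

export const MemoizedMarkdown = memo(
  function MemoizedMarkdown({ content }: { content: string }) {
    return (
      <ReactMarkdown remarkPlugins={[remarkGfm]}>{content}</ReactMarkdown>
    );
  },
  // Re-render only when this message's text actually changes.
  (prev, next) => prev.content === next.content
);
```

In the chat component, render each message body with `<MemoizedMarkdown content={m.content} />` instead of a plain paragraph.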
Key Takeaways
- The Vercel AI SDK abstracts away streaming complexity — use streamText on the server and useChat on the client
- Always add error boundaries and loading states for production chatbots
- Truncate message history to stay within token limits on long conversations
- Memoize markdown rendering to avoid performance issues during streaming
- Rate limit your API route — AI API calls are expensive and easy to abuse
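The truncation point above can be sketched as a simple character-budget cut. This is a rough proxy (roughly 4 characters per token is a crude heuristic); a real tokenizer count would be more accurate, and the budget value here is an assumption:

```typescript
// Keep the most recent messages that fit a rough character budget,
// always keeping at least the latest message so the model sees the
// current question. Swap in a real tokenizer for accurate counts.
type ChatMessage = { role: "user" | "assistant"; content: string };

export function truncateHistory(
  messages: ChatMessage[],
  maxChars = 12_000 // illustrative budget, not a real token limit
): ChatMessage[] {
  const kept: ChatMessage[] = [];
  let used = 0;
  // Walk backwards from the newest message, keeping what fits.
  for (let i = messages.length - 1; i >= 0; i--) {
    used += messages[i].content.length;
    if (used > maxChars && kept.length > 0) break;
    kept.unshift(messages[i]);
  }
  return kept;
}
```

Call this on the messages array in the route handler before passing it to streamText.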