Build a RAG Chatbot for Your Documentation With Next.js and OpenAI

Retrieval Augmented Generation (RAG) lets an AI answer questions about your specific content. Build a docs chatbot that cites sources and stays grounded in your actual documentation.

What Is RAG and Why It Matters for Docs

RAG (Retrieval Augmented Generation) targets the hallucination problem: instead of asking an LLM to answer from training data alone, you first retrieve relevant chunks from your own documents, then ask the LLM to answer based on those chunks. The result is answers that are grounded in your actual content and can cite the document they came from.

Architecture Overview

The pipeline has two phases:

Indexing: Parse docs → chunk text → generate embeddings → store in vector DB
Querying: Embed question → find similar chunks → send chunks + question to LLM → stream response
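
To make the two phases concrete, here is a minimal in-memory sketch of the retrieval step. It is not the pipeline used in the rest of this post: the three-dimensional vectors are hand-written stand-ins for real embeddings, and the names IndexedChunk, cosineSimilarity, and topK are hypothetical helpers introduced only for illustration. A vector database does the same ranking, just at scale and with an index.

```typescript
// Toy stand-in for a vector DB row. Real embeddings have hundreds or
// thousands of dimensions; these 3-D vectors are made up for illustration.
type IndexedChunk = { source: string; content: string; embedding: number[] };

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for a zero vector.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Querying phase: score every chunk against the question vector, keep the best k.
function topK(index: IndexedChunk[], queryEmbedding: number[], k: number) {
  return index
    .map(chunk => ({ ...chunk, similarity: cosineSimilarity(chunk.embedding, queryEmbedding) }))
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, k);
}

// Indexing phase, reduced to its output: chunks stored alongside their vectors.
const toyIndex: IndexedChunk[] = [
  { source: 'install.md', content: 'Run npm install to set up.', embedding: [1, 0, 0] },
  { source: 'auth.md', content: 'Set the API key in .env.', embedding: [0, 1, 0] },
  { source: 'deploy.md', content: 'Deploy with the CLI.', embedding: [0, 0, 1] },
];

// A question vector that points mostly along the "auth" direction.
const hits = topK(toyIndex, [0.1, 0.9, 0], 2);
// hits[0].source is 'auth.md', hits[1].source is 'install.md'
```

The chunks in hits are what would be pasted into the prompt as context in the querying phase.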

Step 1: Chunk and Embed Your Docs

// scripts/index-docs.ts
import { embedMany } from 'ai';
import { openai } from '@ai-sdk/openai';
import { db } from '../lib/db';
import { readFileSync, readdirSync } from 'fs';

function chunkText(text: string, chunkSize = 500, overlap = 50): string[] {
  const words = text.split(' ');
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += chunkSize - overlap) {
    chunks.push(words.slice(i, i + chunkSize).join(' '));
  }
  return chunks;
}

async function indexDocuments() {
  const files = readdirSync('./docs').filter(f => f.endsWith('.md'));
  
  for (const file of files) {
    const content = readFileSync(`./docs/${file}`, 'utf-8');
    const chunks = chunkText(content);
    
    const { embeddings } = await embedMany({
      model: openai.embedding('text-embedding-3-small'),
      values: chunks,
    });
    
    for (let i = 0; i < chunks.length; i++) {
      await db.$executeRaw`
        INSERT INTO doc_chunks (source, content, embedding)
        VALUES (${file}, ${chunks[i]}, ${JSON.stringify(embeddings[i])}::vector)
      `;
    }
    console.log(`Indexed ${file}: ${chunks.length} chunks`);
  }
}

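// Surface indexing failures instead of leaving an unhandled promise rejection
indexDocuments().catch(err => {
  console.error(err);
  process.exitCode = 1;
});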
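
The overlap behaviour is easier to see on a tiny input. The sketch below is a variant of the chunker above, not a drop-in copy: chunkWords is a hypothetical name, and it adds two assumptions of its own, a guard against an overlap that is not smaller than the chunk size (which would otherwise never advance) and an early exit so the last chunk is not a fragment already covered by the previous one.

```typescript
// Sliding-window chunker over space-separated words.
// Each chunk starts (chunkSize - overlap) words after the previous one.
function chunkWords(text: string, chunkSize = 500, overlap = 50): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new RangeError('Require chunkSize > 0 and 0 <= overlap < chunkSize');
  }
  const words = text.split(' ');
  const step = chunkSize - overlap;
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += step) {
    chunks.push(words.slice(i, i + chunkSize).join(' '));
    // Stop once a window has reached the end of the text; a further window
    // would only repeat words that this chunk already contains.
    if (i + chunkSize >= words.length) break;
  }
  return chunks;
}

const sample = 'one two three four five six seven eight nine ten';
const sampleChunks = chunkWords(sample, 4, 1);
// → ['one two three four', 'four five six seven', 'seven eight nine ten']
```

Each chunk repeats the last word of the one before it, so a sentence that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.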

Step 2: The RAG Query API

// app/api/docs-chat/route.ts
import { streamText, embed } from 'ai';
import { openai } from '@ai-sdk/openai';
import { db } from '@/lib/db';

export async function POST(request: Request) {
  const { messages } = await request.json();
  const latestQuestion = messages[messages.length - 1].content;

  // 1. Embed the question
  const { embedding } = await embed({
    model: openai.embedding('text-embedding-3-small'),
    value: latestQuestion,
  });

  // 2. Retrieve relevant chunks
  const chunks = await db.$queryRaw`
    SELECT source, content,
      1 - (embedding <=> ${JSON.stringify(embedding)}::vector) AS similarity
    FROM doc_chunks
    ORDER BY embedding <=> ${JSON.stringify(embedding)}::vector
    LIMIT 5
  `;

  const context = (chunks as any[])
    .map(c => `[${c.source}]\n${c.content}`)
    .join('\n\n---\n\n');

  // 3. Generate answer grounded in retrieved context
  const result = streamText({
    model: openai('gpt-4o-mini'),
    system: `You are a documentation assistant. Answer questions using ONLY the provided context.
    If the answer is not in the context, say so. Always cite which document you found the answer in.`,
    messages: [
      { role: 'user', content: `Context:\n${context}\n\nQuestion: ${latestQuestion}` }
    ],
  });

  return result.toDataStreamResponse();
}
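
The context-assembly step in the route can be pulled into a small typed helper, which removes the any cast and makes it testable without a database. This is a sketch under assumptions: ChunkRow mirrors the three columns selected in the query above, while buildContext and its maxChars budget are hypothetical additions and the 6000-character default is an arbitrary placeholder, not a recommended value.

```typescript
// Shape of one row returned by the similarity query (source, content, similarity).
type ChunkRow = { source: string; content: string; similarity: number };

// Join chunks into one prompt context, best match first, within a character
// budget. Also returns the distinct source files that made it into the context.
function buildContext(rows: ChunkRow[], maxChars = 6000): { context: string; sources: string[] } {
  const parts: string[] = [];
  const sources: string[] = [];
  let used = 0;
  for (const row of rows) {
    const part = `[${row.source}]\n${row.content}`;
    // Rows arrive ordered by similarity, so stop at the first one that does not fit.
    if (used + part.length > maxChars) break;
    parts.push(part);
    used += part.length;
    if (sources.indexOf(row.source) === -1) sources.push(row.source);
  }
  return { context: parts.join('\n\n---\n\n'), sources };
}

const exampleRows: ChunkRow[] = [
  { source: 'auth.md', content: 'Set the API key in your environment file.', similarity: 0.82 },
  { source: 'auth.md', content: 'Keys are read at startup.', similarity: 0.74 },
  { source: 'deploy.md', content: 'A much longer chunk about deployment that exceeds the remaining budget in this example.', similarity: 0.51 },
];

// With a 120-character budget only the two auth.md chunks fit.
const built = buildContext(exampleRows, 120);
```

Returning the source list separately also gives the API something concrete to send to the client if you later want to render citations outside the answer text.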

Step 3: The Chat UI With Source Citations

'use client';
import { useChat } from 'ai/react';

export function DocsChat() {
  const { messages, input, handleInputChange, handleSubmit } = useChat({
    api: '/api/docs-chat',
  });

  return (
    <div className="flex flex-col h-screen max-w-2xl mx-auto p-4">
      <div className="flex-1 overflow-y-auto space-y-4">
        {messages.map(m => (
          <div key={m.id} className={m.role === 'user' ? 'text-right' : 'text-left'}>
            <div className={`inline-block p-3 rounded-lg text-sm max-w-lg ${
              m.role === 'user' ? 'bg-blue-600 text-white' : 'bg-gray-100'
            }`}>
              {m.content}
            </div>
          </div>
        ))}
      </div>
      <form onSubmit={handleSubmit} className="mt-4 flex gap-2">
        <input
          value={input}
          onChange={handleInputChange}
          placeholder="Ask about the docs..."
          className="flex-1 border rounded p-2"
        />
        <button type="submit" className="px-4 py-2 bg-blue-600 text-white rounded">
          Ask
        </button>
      </form>
    </div>
  );
}
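
The component above renders the answer as plain text, so any citations appear only inline. One way to surface them separately is to pull file names out of the answer. The sketch below assumes the model echoes the [file.md] labels used in the context, which the system prompt encourages but does not guarantee; extractSources is a hypothetical helper, not part of the AI SDK.

```typescript
// Collect distinct "[something.md]" references from an answer, in order of
// first appearance. Returns an empty array when the model cited nothing.
function extractSources(answer: string): string[] {
  const pattern = /\[([^\[\]\n]+\.md)\]/g;
  const sources: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(answer)) !== null) {
    if (sources.indexOf(match[1]) === -1) sources.push(match[1]);
  }
  return sources;
}

const cited = extractSources(
  'Set the key in your env file [auth.md]. See also [deploy.md] and [auth.md].'
);
// → ['auth.md', 'deploy.md']
```

In the UI, the result could be rendered as a short "Sources" list under each assistant message, with each entry linking to the matching docs page.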

Production Considerations

Re-index automatically when docs are updated (webhook or CI step)
Cache embeddings to avoid re-embedding the same content
Add a confidence threshold: only answer when the retrieved chunks score above 0.7 similarity
Log queries and low-confidence answers to identify documentation gaps
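
The threshold and logging items can be sketched as a filter over the rows returned by the similarity query. This is an illustration under assumptions: ScoredChunk mirrors the query's columns, selectConfident and gapLog are hypothetical names, and 0.7 is the starting cutoff from the list above, which will likely need tuning against real questions.

```typescript
type ScoredChunk = { source: string; content: string; similarity: number };

// Starting cutoff from the list above; tune it against real queries.
const MIN_SIMILARITY = 0.7;

// In-memory stand-in for wherever low-confidence questions get recorded
// (a table, a log drain, an analytics event).
const gapLog: string[] = [];

// Keep only chunks at or above the cutoff. If none survive, record the
// question as a likely documentation gap so it can be reviewed later.
function selectConfident(
  question: string,
  rows: ScoredChunk[],
  minSimilarity = MIN_SIMILARITY
): { kept: ScoredChunk[]; lowConfidence: boolean } {
  const kept = rows.filter(row => row.similarity >= minSimilarity);
  const lowConfidence = kept.length === 0;
  if (lowConfidence) gapLog.push(question);
  return { kept, lowConfidence };
}

const strong = selectConfident('How do I set the API key?', [
  { source: 'auth.md', content: 'Set the API key in your environment file.', similarity: 0.82 },
  { source: 'deploy.md', content: 'Deploy with the CLI.', similarity: 0.41 },
]);

const weak = selectConfident('Do you support SAML?', [
  { source: 'auth.md', content: 'Set the API key in your environment file.', similarity: 0.38 },
]);
// strong.kept has one chunk; weak.lowConfidence is true and the question is logged
```

When lowConfidence is true, the route can skip the LLM call and return a fixed "not covered in the docs" message instead of risking an ungrounded answer.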

