
Build a RAG Chatbot for Your Documentation With Next.js and OpenAI
Retrieval Augmented Generation (RAG) lets an AI answer questions about your specific content. Build a docs chatbot that cites sources and stays grounded in your actual documentation.
What Is RAG and Why It Matters for Docs
RAG reduces hallucination: instead of asking an LLM to answer from training data alone, you first retrieve relevant chunks from your own documents, then ask the LLM to answer based on those chunks. The answers are citable and grounded in your actual content, although the model can still misread or ignore the retrieved text.
Architecture Overview
The pipeline has two phases:
- Indexing: Parse docs → chunk text → generate embeddings → store in vector DB
- Querying: Embed question → find similar chunks → send chunks + question to LLM → stream response
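To make the "find similar chunks" step concrete, here is a minimal in-memory sketch of similarity search. It assumes embeddings are plain number arrays, and the names `StoredChunk`, `cosineSimilarity`, and `topK` are hypothetical helpers rather than part of this article's code. In the real pipeline the vector database performs this ranking.

```typescript
// Hypothetical shape for an indexed chunk (illustration only).
interface StoredChunk {
  source: string;
  content: string;
  embedding: number[];
}

// Cosine similarity: dot product divided by the product of the vector lengths.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  const denom = Math.sqrt(normA) * Math.sqrt(normB);
  // Treat a zero vector as unrelated to everything instead of dividing by zero.
  return denom === 0 ? 0 : dot / denom;
}

// Rank chunks by similarity to the query embedding and keep the best k.
function topK(query: number[], chunks: StoredChunk[], k: number): StoredChunk[] {
  return chunks
    .map(chunk => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map(entry => entry.chunk);
}
```

The retrieved chunks, together with the question, are what the LLM receives in the final step of the querying phase.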
Step 1: Chunk and Embed Your Docs
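// scripts/index-docs.ts
import { embedMany } from 'ai';
import { openai } from '@ai-sdk/openai';
import { db } from '../lib/db';
import { readFileSync, readdirSync } from 'fs';

// Split text into overlapping word-based chunks so context survives chunk boundaries.
function chunkText(text: string, chunkSize = 500, overlap = 50): string[] {
  // Without this guard the loop below would never advance.
  if (overlap >= chunkSize) throw new Error('overlap must be smaller than chunkSize');
  const words = text.split(' ');
  const chunks: string[] = [];
  for (let i = 0; i < words.length; i += chunkSize - overlap) {
    chunks.push(words.slice(i, i + chunkSize).join(' '));
    // Stop once a chunk reaches the end; otherwise the tail is emitted twice.
    if (i + chunkSize >= words.length) break;
  }
  return chunks;
}

async function indexDocuments() {
  const files = readdirSync('./docs').filter(f => f.endsWith('.md'));
  for (const file of files) {
    const content = readFileSync(`./docs/${file}`, 'utf-8');
    // Skip empty files: there is nothing to embed.
    if (!content.trim()) continue;
    const chunks = chunkText(content);

    // Embed all chunks of the file in one batched call.
    const { embeddings } = await embedMany({
      model: openai.embedding('text-embedding-3-small'),
      values: chunks,
    });

    // Remove rows from an earlier run so re-indexing does not create duplicates.
    await db.$executeRaw`DELETE FROM doc_chunks WHERE source = ${file}`;
    for (let i = 0; i < chunks.length; i++) {
      await db.$executeRaw`
        INSERT INTO doc_chunks (source, content, embedding)
        VALUES (${file}, ${chunks[i]}, ${JSON.stringify(embeddings[i])}::vector)
      `;
    }
    console.log(`Indexed ${file}: ${chunks.length} chunks`);
  }
}

indexDocuments().catch(err => {
  console.error(err);
  process.exit(1);
});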
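The indexing script assumes a `doc_chunks` table with a vector column already exists, but the article does not show its definition. The following is a hedged sketch of a one-time setup, assuming PostgreSQL with the pgvector extension. The column names are inferred from the INSERT statement, the 1536 dimensions match the default output size of `text-embedding-3-small`, and `SqlExecutor`, `setupStatements`, and `runSetup` are hypothetical names introduced for illustration.

```typescript
// Hypothetical executor type: anything that can run one SQL string.
type SqlExecutor = (sql: string) => Promise<unknown>;

// text-embedding-3-small returns 1536-dimensional vectors by default.
const EMBEDDING_DIMENSIONS = 1536;

// Assumed schema: column names mirror the INSERT in the indexing script.
function setupStatements(dimensions: number = EMBEDDING_DIMENSIONS): string[] {
  return [
    'CREATE EXTENSION IF NOT EXISTS vector',
    `CREATE TABLE IF NOT EXISTS doc_chunks (
      id BIGSERIAL PRIMARY KEY,
      source TEXT NOT NULL,
      content TEXT NOT NULL,
      embedding vector(${dimensions}) NOT NULL
    )`,
  ];
}

// Run each setup statement in order and report how many were executed.
async function runSetup(execute: SqlExecutor): Promise<number> {
  const statements = setupStatements();
  for (const sql of statements) {
    await execute(sql);
  }
  return statements.length;
}
```

With Prisma, the executor could be `sql => db.$executeRawUnsafe(sql)`. The statements are static strings, so no user input is interpolated into them.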
Step 2: The RAG Query API
// app/api/docs-chat/route.ts
import { streamText, embed } from 'ai';
import { openai } from '@ai-sdk/openai';
import { db } from '@/lib/db';

// Shape of the rows returned by the similarity query below.
type ChunkRow = { source: string; content: string; similarity: number };

export async function POST(request: Request) {
  const { messages } = await request.json();
  const latestQuestion = messages[messages.length - 1].content;

  // 1. Embed the question
  const { embedding } = await embed({
    model: openai.embedding('text-embedding-3-small'),
    value: latestQuestion,
  });
  // pgvector accepts the '[x,y,...]' text form that JSON.stringify produces.
  const vector = JSON.stringify(embedding);

  // 2. Retrieve relevant chunks (<=> is pgvector's cosine distance operator)
  const chunks = await db.$queryRaw<ChunkRow[]>`
    SELECT source, content,
      1 - (embedding <=> ${vector}::vector) AS similarity
    FROM doc_chunks
    ORDER BY embedding <=> ${vector}::vector
    LIMIT 5
  `;
  // Label each chunk with its file name so the model can cite it.
  const context = chunks
    .map(c => `[${c.source}]\n${c.content}`)
    .join('\n\n---\n\n');

  // 3. Generate answer grounded in retrieved context
  const result = streamText({
    model: openai('gpt-4o-mini'),
    system: `You are a documentation assistant. Answer questions using ONLY the provided context.
If the answer is not in the context, say so. Always cite which document you found the answer in.`,
    messages: [
      { role: 'user', content: `Context:\n${context}\n\nQuestion: ${latestQuestion}` }
    ],
  });
  return result.toDataStreamResponse();
}
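The route sends only the latest question to the model, so follow-up questions such as "and how do I configure that?" lose their antecedent. If that matters for your docs, one option is to forward a few earlier turns and attach the retrieved context to the final user message only. The sketch below is an assumption about how that could look: `ChatMessage` and `buildMessages` are hypothetical names, and retrieval would still be driven by the latest question alone.

```typescript
// Minimal message shape; the AI SDK's own message types carry more fields.
interface ChatMessage {
  role: 'user' | 'assistant';
  content: string;
}

// Keep up to `maxTurns` earlier messages and attach the retrieved context
// to the latest user question only.
function buildMessages(history: ChatMessage[], context: string, maxTurns = 6): ChatMessage[] {
  if (history.length === 0) throw new Error('history must contain at least one message');
  const latest = history[history.length - 1];
  if (latest.role !== 'user') throw new Error('last message must come from the user');
  // slice(-0) would return everything, so handle maxTurns <= 0 explicitly.
  const earlier = maxTurns > 0 ? history.slice(0, -1).slice(-maxTurns) : [];
  return [
    ...earlier.map(({ role, content }) => ({ role, content })),
    { role: 'user', content: `Context:\n${context}\n\nQuestion: ${latest.content}` },
  ];
}
```

In the route, the result of `buildMessages(messages, context)` would replace the single-element `messages` array passed to `streamText`.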
Step 3: The Chat UI With Source Citations
'use client';
import { useChat } from 'ai/react';

export function DocsChat() {
  // useChat manages message state and streams responses from the API route.
  const { messages, input, handleInputChange, handleSubmit } = useChat({
    api: '/api/docs-chat',
  });

  return (
    <div className="flex flex-col h-screen max-w-2xl mx-auto p-4">
      <div className="flex-1 overflow-y-auto space-y-4">
        {messages.map(m => (
          <div key={m.id} className={m.role === 'user' ? 'text-right' : 'text-left'}>
            <div className={`inline-block p-3 rounded-lg text-sm max-w-lg ${
              m.role === 'user' ? 'bg-blue-600 text-white' : 'bg-gray-100'
            }`}>
              {m.content}
            </div>
          </div>
        ))}
      </div>
      <form onSubmit={handleSubmit} className="mt-4 flex gap-2">
        <input
          value={input}
          onChange={handleInputChange}
          placeholder="Ask about the docs..."
          className="flex-1 border rounded p-2"
        />
        <button type="submit" className="px-4 py-2 bg-blue-600 text-white rounded">
          Ask
        </button>
      </form>
    </div>
  );
}
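The component renders each answer as plain text, so citations appear only where the model writes them inline. To show sources as a separate list, one possibility is to pull file names out of the answer text. This sketch assumes the model echoes the `[file.md]` labels used in the context, which is not guaranteed, and `extractSources` is a hypothetical helper.

```typescript
// Collect unique Markdown file names that appear in square brackets,
// e.g. "[setup.md]". Assumes the model repeats the labels from the context.
function extractSources(answer: string): string[] {
  const pattern = /\[([^\[\]\s]+\.md)\]/g;
  const sources: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(answer)) !== null) {
    // Keep first-seen order and skip repeats.
    if (sources.indexOf(match[1]) === -1) sources.push(match[1]);
  }
  return sources;
}
```

In the component, `extractSources(m.content)` could feed a small "Sources" list under each assistant message.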
Production Considerations
- Re-index automatically when docs are updated (webhook or CI step)
- Cache embeddings to avoid re-embedding the same content
- Add a confidence threshold: only answer when the retrieved chunks score above a similarity cutoff (for example, 0.7)
- Log queries and low-confidence answers to identify documentation gaps
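Two of these points can be sketched in a few lines. The code below assumes the row shape returned by the retrieval query (`source`, `content`, `similarity`) and uses hypothetical names (`ScoredChunk`, `filterByConfidence`, `EmbedFn`, `withEmbeddingCache`). The 0.7 cutoff is the figure from the list above and should be tuned against real queries, and the cache lives only in memory for the lifetime of one process.

```typescript
// Row shape assumed to match the retrieval query.
interface ScoredChunk {
  source: string;
  content: string;
  similarity: number;
}

// Keep only chunks at or above the cutoff. An empty result can be treated as
// "not covered by the docs", so the route can skip the LLM call and log the query.
function filterByConfidence(chunks: ScoredChunk[], minSimilarity = 0.7): ScoredChunk[] {
  return chunks.filter(c => c.similarity >= minSimilarity);
}

// Hypothetical embedding function type, e.g. a thin wrapper around embedMany.
type EmbedFn = (texts: string[]) => Promise<number[][]>;

// Wrap an embedding function with a cache keyed by chunk text,
// so unchanged chunks are not embedded again.
function withEmbeddingCache(embedFn: EmbedFn, cache = new Map<string, number[]>()): EmbedFn {
  return async texts => {
    // Unique texts that are not cached yet.
    const missing = texts.filter((t, i) => !cache.has(t) && texts.indexOf(t) === i);
    if (missing.length > 0) {
      const vectors = await embedFn(missing);
      missing.forEach((t, i) => cache.set(t, vectors[i]));
    }
    return texts.map(t => cache.get(t) as number[]);
  };
}
```

A persistent variant could store a content hash next to each row and skip files whose hash has not changed since the last indexing run.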