Building a RAG System from Scratch with Next.js: Vectors, Chunking, and Real-World Retrieval

Learn how to build a production-grade Retrieval-Augmented Generation (RAG) system using Next.js App Router and TypeScript. This deep dive covers embedding models, chunking strategies, pgvector and Pinecone setup, semantic search, hybrid retrieval, and streaming LLM responses — everything you need to ship a real-world RAG app.

What Is RAG and Why Should Frontend Engineers Care?

Retrieval-Augmented Generation (RAG) is the architecture that closes the gap between a static large language model and your actual data. Instead of fine-tuning an expensive model on your documents, RAG lets you dynamically inject relevant context at inference time — retrieving the right chunks of text, stuffing them into the prompt, and letting the LLM reason on top of fresh, private information.

As a Next.js developer, you are in a uniquely powerful position: you already control the API layer, the database, and the UI. This tutorial walks you through building a production-grade RAG pipeline entirely within the Next.js App Router ecosystem — from ingesting documents to serving semantic search results to a chat interface.

The RAG Architecture at a Glance

Before writing a single line of code, understand the two distinct phases:

Ingestion (offline): Load documents → chunk them → embed each chunk → store vectors in a vector database.
Retrieval + Generation (online): Embed the user query → find the nearest chunks → build a prompt → stream the LLM response.

These phases are independent. You might ingest documents once a day via a cron job and serve thousands of queries per minute in real time. Keeping them decoupled is critical for scalability.

Choosing Your Embedding Model

An embedding model converts text into a dense numerical vector that captures semantic meaning. Similar sentences end up close together in vector space; dissimilar ones are far apart. Your choice of model affects quality, cost, and latency.

OpenAI text-embedding-3-small: 1536 dimensions, cheap (~$0.02/1M tokens), great baseline. Use this unless you have a specific reason not to.
OpenAI text-embedding-3-large: 3072 dimensions, higher accuracy for long-tail queries, roughly 6.5× the cost (~$0.13/1M tokens).
Cohere embed-english-v3.0: strong English retrieval (use embed-multilingual-v3.0 for multilingual content), returns float or compact int8 vectors.
Local models via Ollama (nomic-embed-text): zero egress cost, runs on-prem, ideal for sensitive data.
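
The cost and dimension figures above reduce to simple arithmetic. The sketch below estimates API spend and raw vector storage for a corpus. It assumes the approximate price quoted in this article and float32 storage (4 bytes per dimension), and it ignores index and row overhead, so treat the results as rough lower bounds rather than measurements.

```typescript
// Back-of-the-envelope sizing for an embedding corpus.
// Assumptions (not measurements): price per 1M tokens as quoted in the article,
// float32 vectors (4 bytes per dimension), no index or row overhead.

export function embeddingCostUSD(totalTokens: number, pricePerMillionTokens: number): number {
  return (totalTokens / 1_000_000) * pricePerMillionTokens;
}

export function rawVectorBytes(vectorCount: number, dimensions: number): number {
  const BYTES_PER_FLOAT32 = 4;
  return vectorCount * dimensions * BYTES_PER_FLOAT32;
}

// Example: 10,000 chunks of ~512 tokens each with text-embedding-3-small.
const chunkCount = 10_000;
const totalTokens = chunkCount * 512;                 // 5,120,000 tokens
const cost = embeddingCostUSD(totalTokens, 0.02);     // roughly $0.10
const bytes = rawVectorBytes(chunkCount, 1536);       // 61,440,000 bytes (~61 MB)
console.log({ totalTokens, cost, bytes });
```

At this scale the embedding bill is negligible; storage and index memory usually become the constraint first, which is one reason to think twice before jumping to 3072 dimensions.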

For this tutorial we will use OpenAI's text-embedding-3-small. Install the OpenAI SDK together with the Vercel AI SDK packages used later for streaming:

npm install openai @ai-sdk/openai ai

Chunking Strategies: The Overlooked Bottleneck

Chunking is where most RAG implementations quietly fail. If chunks are too large, each embedding blurs several topics together, retrieval returns noisy context, and every retrieved chunk consumes more of the prompt budget. If they are too small, you lose surrounding context that the LLM needs to answer accurately.

Fixed-Size Chunking with Overlap

The simplest strategy: split every N tokens, with an M-token overlap between consecutive chunks so that text cut at a chunk boundary still appears with its surrounding context in the neighboring chunk.

// lib/chunker.ts
export interface Chunk {
  text: string;
  index: number;
  metadata: Record<string, unknown>;
}

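export function chunkByTokens(
  text: string,
  chunkSize = 512,
  overlap = 64
): Chunk[] {
  // Guard against a zero or negative step, which would loop forever
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error('Require chunkSize > 0 and 0 <= overlap < chunkSize');
  }
  // Approximate: 1 token ≈ 4 characters for English text
  const charChunk = chunkSize * 4;
  const charOverlap = overlap * 4;
  const step = charChunk - charOverlap; // how far the window advances each iteration
  const chunks: Chunk[] = [];
  let index = 0;

  for (let start = 0; start < text.length; start += step) {
    const end = Math.min(start + charChunk, text.length);
    chunks.push({ text: text.slice(start, end), index: index++, metadata: {} });
    // Stop once a chunk reaches the end of the text; otherwise the next
    // window would only repeat the tail of this one.
    if (end === text.length) break;
  }
  return chunks;
}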

Recursive Character Splitting

A smarter approach: try to split on paragraph breaks first (\n\n), then line breaks (\n), then spaces, and only as a last resort between arbitrary characters. This preserves semantic units far better than raw character counts. The LangChain RecursiveCharacterTextSplitter implements this logic, or you can roll your own in ~30 lines.
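
To make the recursion concrete, here is a minimal hand-rolled sketch. It is not LangChain's implementation: it measures size in characters rather than tokens, drops the separators it splits on when they fall between chunks, and adds no overlap. The function name and separator list are choices made for this example.

```typescript
// Recursive character splitting, sketched without any library.
// Separator order: paragraphs, then lines, then words, then a hard cut.
const DEFAULT_SEPARATORS = ['\n\n', '\n', ' '];

export function recursiveSplit(
  text: string,
  maxChars: number,
  separators: string[] = DEFAULT_SEPARATORS
): string[] {
  if (text.length === 0) return [];
  if (text.length <= maxChars) return [text];

  if (separators.length === 0) {
    // No separators left: fall back to fixed-size character cuts.
    const pieces: string[] = [];
    for (let i = 0; i < text.length; i += maxChars) {
      pieces.push(text.slice(i, i + maxChars));
    }
    return pieces;
  }

  const sep = separators[0];
  const finer = separators.slice(1);
  const chunks: string[] = [];
  let current = '';

  for (const part of text.split(sep)) {
    if (part.length === 0) continue;
    const candidate = current ? current + sep + part : part;
    if (candidate.length <= maxChars) {
      current = candidate; // keep merging small parts into one chunk
      continue;
    }
    if (current) chunks.push(current);
    current = '';
    if (part.length <= maxChars) {
      current = part;
    } else {
      // This part alone is too big: recurse with the finer separators.
      chunks.push(...recursiveSplit(part, maxChars, finer));
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Every returned chunk is at most maxChars long, and a paragraph that fits is never broken up, which is the property that fixed-size splitting cannot give you.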

Document-Aware Chunking

For structured content (Markdown, HTML, code), use document-aware splitters that understand headers and code blocks. Keep an entire function body together rather than splitting it mid-loop. Libraries like llm-chunk or @langchain/textsplitters offer Markdown-aware splitters out of the box.
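
As an illustration of what "document-aware" means for Markdown, the sketch below splits on headings while leaving fenced code blocks intact. It is a simplified stand-in for the libraries mentioned above, not their API; the Section shape and function name are invented for this example, and oversized sections would still need a second, size-based pass.

```typescript
// Header-aware splitting for Markdown: each section (heading plus body)
// becomes one chunk. Lines inside fenced code blocks are never treated as
// headings, so a code sample containing "# comment" stays in one piece.
const FENCE = '`'.repeat(3); // the three-backtick fence marker

export interface Section {
  heading: string; // '' for text that appears before the first heading
  body: string;
}

export function splitMarkdownByHeadings(markdown: string): Section[] {
  const sections: Section[] = [];
  let heading = '';
  let bodyLines: string[] = [];
  let inFence = false;

  const flush = (): void => {
    const body = bodyLines.join('\n').trim();
    if (heading || body) sections.push({ heading, body });
    bodyLines = [];
  };

  for (const line of markdown.split('\n')) {
    if (line.trim().startsWith(FENCE)) inFence = !inFence;
    const isHeading = !inFence && /^#{1,6}\s/.test(line);
    if (isHeading) {
      flush(); // close the previous section before starting a new one
      heading = line.replace(/^#+\s*/, '').trim();
    } else {
      bodyLines.push(line);
    }
  }
  flush();
  return sections;
}
```

Storing the heading next to each chunk (for instance in the metadata field) also gives the retriever a cheap, human-readable label for where the text came from.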

Rule of thumb: Start with 512-token chunks and 64-token overlap. Measure retrieval quality with a test set before optimizing further. Premature chunking optimization is a real trap.

Setting Up the Vector Database

You have two primary choices: a managed cloud service or a self-hosted Postgres extension.

Option A: pgvector (Self-Hosted / Supabase)

If you already run Postgres — or use Supabase — pgvector is the zero-friction choice. Enable it with a single migration:

-- migrations/001_enable_vector.sql
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
  id        uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  content   text NOT NULL,
  embedding vector(1536),      -- matches text-embedding-3-small
  metadata  jsonb DEFAULT '{}'
);

CREATE INDEX ON documents
  USING ivfflat (embedding vector_cosine_ops)
  WITH (lists = 100);

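The ivfflat index is an inverted file over flat (uncompressed) vectors: rows are grouped into lists around centroids, and a query scans only the nearest lists. Because those centroids are computed from existing rows, build it after the table holds data rather than in an empty-table migration. hnsw (Hierarchical Navigable Small World) generally gives a better speed/recall trade-off at query time, at the cost of a slower build and more memory, and it can be created on an empty table: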

CREATE INDEX ON documents
  USING hnsw (embedding vector_cosine_ops)
  WITH (m = 16, ef_construction = 64);

Option B: Pinecone (Managed)

Pinecone requires zero database management. Create an index in the dashboard (or via API), choose your dimension count, and start upserting vectors immediately. It scales to billions of vectors without you touching a single server.

npm install @pinecone-database/pinecone

// lib/pinecone.ts
import { Pinecone } from '@pinecone-database/pinecone';

export const pinecone = new Pinecone({
  apiKey: process.env.PINECONE_API_KEY!,
});

export const index = pinecone.index(process.env.PINECONE_INDEX_NAME!);
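
Before upserting, chunks and their embeddings have to be shaped into records. The helper below is a sketch of that step. The { id, values, metadata } shape follows Pinecone's record format, while docId, the id scheme, and the choice to keep the chunk text in metadata are assumptions made for this example.

```typescript
// Shape chunks and their embeddings into records for a vector store.
// `docId` and the `${docId}#${index}` id scheme are hypothetical choices.
export interface VectorRecord {
  id: string;
  values: number[];
  metadata: Record<string, string | number | boolean>;
}

export function toVectorRecords(
  docId: string,
  chunks: { text: string; index: number }[],
  embeddings: number[][]
): VectorRecord[] {
  if (chunks.length !== embeddings.length) {
    throw new Error(`Expected ${chunks.length} embeddings, got ${embeddings.length}`);
  }
  return chunks.map((chunk, i) => ({
    // Deterministic ids make re-ingestion overwrite records instead of duplicating them.
    id: `${docId}#${chunk.index}`,
    values: embeddings[i],
    // Keeping the text in metadata lets a query result be shown without a second lookup.
    metadata: { docId, chunkIndex: chunk.index, text: chunk.text },
  }));
}
```

The resulting array is what you would hand to the index's upsert method, and the index's query method returns the stored metadata alongside each match. Check the exact signatures against the SDK version you install, since they have changed between major releases.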

The Ingestion Pipeline: A Next.js Route Handler

Wire everything together in a single API route that accepts a document, chunks it, embeds it, and stores the vectors.

// app/api/ingest/route.ts
import { NextRequest, NextResponse } from 'next/server';
import OpenAI from 'openai';
import { chunkByTokens } from '@/lib/chunker';
import { db } from '@/lib/db'; // your Postgres client (e.g. drizzle/kysely)

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export async function POST(req: NextRequest) {
  const { content, metadata } = await req.json();

  // 1. Chunk the document
  const chunks = chunkByTokens(content, 512, 64);

  // 2. Embed all chunks in a single batched request
  const embeddingResponse = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: chunks.map((c) => c.text),
  });

  // 3. Persist to pgvector
  const rows = chunks.map((chunk, i) => ({
    content: chunk.text,
    embedding: embeddingResponse.data[i].embedding,
    metadata: { ...metadata, chunkIndex: chunk.index },
  }));

  await db.transaction(async (tx) => {
    for (const row of rows) {
      await tx.execute(
        `INSERT INTO documents (content, embedding, metadata)
         VALUES ($1, $2::vector, $3)`,
        [row.content, JSON.stringify(row.embedding), JSON.stringify(row.metadata)]
      );
    }
  });

  return NextResponse.json({ chunksIngested: rows.length });
}

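Batching is critical. The OpenAI embeddings API accepts up to 2048 inputs per request, so 100 chunks can travel in one round trip instead of 100. That cuts latency substantially and keeps you well under request-per-minute limits, although token-per-minute limits and the model's per-input token limit still apply. Documents that produce more than 2048 chunks need to be split across several requests.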
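
The route handler above sends every chunk in one request, which breaks once a document yields more chunks than the per-request cap. A minimal sketch of the batching step follows. Here embedBatch is a hypothetical callback standing in for the real embeddings call, so the helper stays independent of any SDK.

```typescript
// Split inputs into batches that respect a per-request input cap.
// 2048 is the per-request limit quoted in the article.
export function toBatches<T>(items: T[], batchSize = 2048): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error('batchSize must be a positive integer');
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// `embedBatch` is a hypothetical stand-in for the embeddings API call.
export async function embedAll(
  texts: string[],
  embedBatch: (batch: string[]) => Promise<number[][]>,
  batchSize = 2048
): Promise<number[][]> {
  const vectors: number[][] = [];
  // Sequential on purpose: one request in flight keeps rate-limit behaviour predictable.
  for (const batch of toBatches(texts, batchSize)) {
    const result = await embedBatch(batch);
    if (result.length !== batch.length) {
      throw new Error('Embedding count does not match input count');
    }
    vectors.push(...result);
  }
  return vectors; // same order as `texts`
}
```

In the ingestion route, embedBatch would wrap openai.embeddings.create and map the response to its embedding arrays.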

The Retrieval Pipeline: Semantic Search

At query time, embed the user's question and find the nearest document chunks by cosine similarity.

// lib/retrieve.ts
import OpenAI from 'openai';
import { db } from '@/lib/db';

const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });

export async function retrieveRelevantChunks(
  query: string,
  topK = 5
): Promise<{ content: string; metadata: Record<string, unknown> }[]> {
  // 1. Embed the query
  const { data } = await openai.embeddings.create({
    model: 'text-embedding-3-small',
    input: query,
  });
  const queryVector = data[0].embedding;

  // 2. pgvector cosine similarity search
  const result = await db.execute<{ content: string; metadata: unknown }>(
    `SELECT content, metadata,
            1 - (embedding <=> $1::vector) AS similarity
     FROM documents
     ORDER BY embedding <=> $1::vector
     LIMIT $2`,
    [JSON.stringify(queryVector), topK]
  );

  return result.rows.map((r) => ({
    content: r.content,
    metadata: r.metadata as Record<string, unknown>,
  }));
}

The <=> operator is pgvector's cosine distance operator. It returns a value between 0 (same direction) and 2 (opposite direction). Subtracting it from 1 gives cosine similarity, which is handy for display or thresholding. The ORDER BY clause only needs ascending distance, so it sorts on the raw <=> expression.
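
The 0-to-2 range is easy to verify by hand. The function below computes the same quantity in plain TypeScript, as an illustration of the definition rather than of how pgvector implements it.

```typescript
// Cosine distance: 1 - cosine similarity.
export function cosineDistance(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error('Vectors must have the same length');
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) {
    throw new Error('Cosine distance is undefined for zero vectors');
  }
  return 1 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineDistance([1, 0], [2, 0]));  // 0: same direction, length is ignored
console.log(cosineDistance([1, 0], [0, 3]));  // 1: orthogonal
console.log(cosineDistance([1, 0], [-1, 0])); // 2: opposite direction
```

Because only direction matters, two chunks of very different length can still be near neighbors if they discuss the same thing.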

Streaming the RAG Response

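Now connect retrieval to generation. Use the Vercel AI SDK's streamText for a first-class streaming experience in Next.js. The snippets below follow the AI SDK 4.x API (toDataStreamResponse on the server, useChat from ai/react on the client); later major versions rename several of these helpers, so pin the version or adapt the calls accordingly.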

// app/api/chat/route.ts
import { streamText } from 'ai';
import { openai } from '@ai-sdk/openai';
import { NextRequest } from 'next/server';
import { retrieveRelevantChunks } from '@/lib/retrieve';

export const runtime = 'nodejs';

export async function POST(req: NextRequest) {
  const { messages } = await req.json();
  const lastMessage = messages[messages.length - 1].content as string;

  // 1. Retrieve relevant context
  const chunks = await retrieveRelevantChunks(lastMessage, 5);
  const context = chunks.map((c) => c.content).join('\n\n---\n\n');

  // 2. Inject context into the system prompt
  const systemPrompt = `You are a helpful assistant. Answer the user's question
based ONLY on the following context. If the context does not contain enough
information to answer, say so honestly.

Context:
${context}`;

  // 3. Stream the response
  const result = streamText({
    model: openai('gpt-4o-mini'),
    system: systemPrompt,
    messages,
  });

  return result.toDataStreamResponse();
}
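
Prompt assembly is worth pulling into its own function so it can be unit-tested without calling a model. The variant below is a sketch that numbers each passage so the model can cite it and caps the total context size. The source metadata key, the 12,000-character budget, and the fallback text are assumptions for this example, not part of the route above.

```typescript
// Build the system prompt from retrieved chunks, numbering each passage.
export interface RetrievedChunk {
  content: string;
  metadata: Record<string, unknown>;
}

export function buildSystemPrompt(chunks: RetrievedChunk[], maxContextChars = 12_000): string {
  const blocks: string[] = [];
  let used = 0;
  for (let i = 0; i < chunks.length; i++) {
    const rawSource = chunks[i].metadata.source;
    const source = typeof rawSource === 'string' ? rawSource : 'unknown';
    const block = `[${i + 1}] (source: ${source})\n${chunks[i].content}`;
    // Chunks arrive ranked best-first, so stop once the budget is spent.
    if (used + block.length > maxContextChars) break;
    blocks.push(block);
    used += block.length;
  }
  const context = blocks.length > 0 ? blocks.join('\n\n---\n\n') : '(no relevant context found)';
  return [
    "You are a helpful assistant. Answer the user's question based ONLY on the context below.",
    'Cite the bracketed numbers of the passages you used.',
    'If the context does not contain enough information to answer, say so.',
    '',
    'Context:',
    context,
  ].join('\n');
}
```

An explicit "no relevant context" marker makes the empty-retrieval case visible to the model instead of leaving a blank section it might paper over.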

On the client side, the Vercel AI SDK's useChat hook handles streaming automatically:

// app/chat/page.tsx
'use client';
import { useChat } from 'ai/react';

export default function ChatPage() {
  const { messages, input, handleInputChange, handleSubmit, isLoading } = useChat({
    api: '/api/chat',
  });

  return (
    <div className="flex flex-col h-screen max-w-2xl mx-auto p-4">
      <div className="flex-1 overflow-y-auto space-y-4">
        {messages.map((m) => (
          <div key={m.id} className={m.role === 'user' ? 'text-right' : 'text-left'}>
            <span className="inline-block bg-muted rounded-lg px-4 py-2">
              {m.content}
            </span>
          </div>
        ))}
      </div>
      <form onSubmit={handleSubmit} className="flex gap-2 mt-4">
        <input
          value={input}
          onChange={handleInputChange}
          placeholder="Ask anything..."
          className="flex-1 border rounded-lg px-4 py-2"
        />
        <button type="submit" disabled={isLoading}>Send</button>
      </form>
    </div>
  );
}

Advanced Retrieval Techniques

Hybrid Search (Keyword + Semantic)

Pure vector search struggles with exact matches — product codes, names, IDs. Combine it with full-text search using a Reciprocal Rank Fusion (RRF) merge:

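-- Hybrid search with RRF: $1 = query embedding, $2 = raw query text
WITH vec AS (
  SELECT id, ROW_NUMBER() OVER (ORDER BY embedding <=> $1::vector) AS rank
  FROM documents
  ORDER BY embedding <=> $1::vector
  LIMIT 20
),
fts AS (
  -- Only rows that actually match the keyword query receive a rank
  SELECT id, ROW_NUMBER() OVER (ORDER BY ts_rank(to_tsvector('english', content), query) DESC) AS rank
  FROM documents, plainto_tsquery('english', $2) query
  WHERE to_tsvector('english', content) @@ query
  ORDER BY ts_rank(to_tsvector('english', content), query) DESC
  LIMIT 20
)
SELECT d.id, d.content,
       COALESCE(1.0 / (60 + vec.rank), 0) + COALESCE(1.0 / (60 + fts.rank), 0) AS rrf_score
FROM vec
FULL OUTER JOIN fts USING (id)
JOIN documents d USING (id)
ORDER BY rrf_score DESC
LIMIT 5;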
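
The same fusion can be done in application code, which is useful when the keyword and vector results come from different systems. A minimal sketch, assuming each input list holds unique ids ordered best-first:

```typescript
// Reciprocal Rank Fusion over any number of ranked id lists.
// score(id) = sum over lists of 1 / (k + rank), with rank starting at 1.
// k = 60 matches the constant used in the SQL query.
export function reciprocalRankFusion(
  rankedLists: string[][],
  k = 60
): { id: string; score: number }[] {
  const scores = new Map<string, number>();
  for (const list of rankedLists) {
    list.forEach((id, position) => {
      const rank = position + 1;
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank));
    });
  }
  return Array.from(scores, ([id, score]) => ({ id, score }))
    .sort((a, b) => b.score - a.score);
}

const keywordHits = ['a', 'b', 'c'];
const vectorHits = ['b', 'd', 'a'];
console.log(reciprocalRankFusion([keywordHits, vectorHits]).map((r) => r.id));
// [ 'b', 'a', 'd', 'c' ]
```

Documents that appear in both lists ("a" and "b") rise to the top, while RRF never has to compare a ts_rank score with a cosine distance directly. That scale independence is the reason to fuse ranks rather than raw scores.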

Re-ranking

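Retrieve a larger candidate set (top-20) with fast vector search, then re-rank using a more accurate cross-encoder model (e.g., Cohere Rerank or a local BAAI/bge-reranker-base). The cross-encoder scores each query-passage pair jointly, which is too slow to run over the whole corpus but affordable for 20 candidates. This two-stage approach keeps the speed of ANN search while moving the final ordering closer to the cross-encoder's precision.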
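
The second stage is a small piece of code once the scorer is abstracted away. In the sketch below, scorePair is a hypothetical callback standing in for whatever cross-encoder or rerank API you choose; no real provider call is shown.

```typescript
// Two-stage retrieval, stage two: rescore a candidate set with a slower,
// more accurate scorer and keep the best few.
export interface Candidate {
  id: string;
  content: string;
}

export async function rerank(
  query: string,
  candidates: Candidate[],
  scorePair: (query: string, passage: string) => Promise<number>, // hypothetical scorer
  topN = 5
): Promise<(Candidate & { score: number })[]> {
  // Score all candidates concurrently; the candidate set is small by design.
  const scored = await Promise.all(
    candidates.map(async (c) => ({ ...c, score: await scorePair(query, c.content) }))
  );
  return scored.sort((a, b) => b.score - a.score).slice(0, topN);
}
```

A toy scorer (say, counting shared words) is enough to test the plumbing before wiring in a real model.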

Metadata Filtering

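Always store metadata (source URL, document type, date, author) alongside your vectors. Filtering on metadata keeps results from irrelevant sources out of the prompt and can shrink the search space considerably. One caveat with pgvector's approximate indexes: the filter is applied after the index scan, so a selective WHERE clause can return fewer than LIMIT rows. For highly selective filters, consider a B-tree index on the filtered expression, a partial vector index, or the iterative index scans added in pgvector 0.8.0: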

SELECT content FROM documents
WHERE metadata->>'source' = 'docs.myapp.com'
  AND (metadata->>'updatedAt')::date > NOW() - INTERVAL '30 days'
ORDER BY embedding <=> $1::vector
LIMIT 5;
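
When filters are optional, the query has to be assembled dynamically, and that is where SQL injection tends to creep in. The sketch below builds the statement and its bind parameters together. The SearchFilters shape and the function name are invented for this example, and the output is meant for any client that accepts positional $n parameters.

```typescript
// Build a parameterized pgvector query with optional metadata filters.
// Values always travel as bind parameters, never via string concatenation.
export interface SearchFilters {
  source?: string;
  updatedAfter?: string; // ISO date, e.g. '2025-01-31'
}

export function buildFilteredSearch(
  queryVector: number[],
  filters: SearchFilters,
  topK = 5
): { text: string; values: unknown[] } {
  const values: unknown[] = [JSON.stringify(queryVector)]; // $1
  const where: string[] = [];

  if (filters.source !== undefined) {
    values.push(filters.source);
    where.push(`metadata->>'source' = $${values.length}`);
  }
  if (filters.updatedAfter !== undefined) {
    values.push(filters.updatedAfter);
    where.push(`(metadata->>'updatedAt')::date > $${values.length}::date`);
  }

  values.push(topK);
  const text = [
    'SELECT content, metadata FROM documents',
    where.length > 0 ? `WHERE ${where.join(' AND ')}` : '',
    'ORDER BY embedding <=> $1::vector',
    `LIMIT $${values.length}`,
  ]
    .filter((line) => line.length > 0)
    .join('\n');

  return { text, values };
}
```

Only the fixed SQL fragments are concatenated; every user-supplied value stays in the values array.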

Production Checklist

Rate limiting: Wrap your ingestion endpoint with a job queue (BullMQ, Inngest) to avoid hammering the embeddings API.
Embedding cache: Cache embeddings keyed by a hash of the input text (for example in Redis). Re-ingesting unchanged documents and repeated queries then cost nothing extra.
Index maintenance: Run VACUUM ANALYZE documents periodically. Rebuild an ivfflat index after bulk ingestion, because its list centroids are fixed at build time; an hnsw index is updated incrementally and usually does not need a rebuild.
Observability: Log query, retrieved chunks, and final LLM answer to a traces table. Use LangSmith or Langfuse to spot retrieval failures.
Chunking evaluation: Build a small golden dataset of query→expected-chunk pairs and measure recall@5 before shipping to production.
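
The chunking-evaluation item is the easiest one to automate. Below is a minimal sketch of recall@k over a golden set, using the "hit rate" definition: a query counts as a hit if at least one expected chunk id appears in the top k. Here retrieveIds is a hypothetical function, and the retriever shown earlier would need to select the id column as well to be used this way.

```typescript
// recall@k over a golden dataset of query -> expected chunk ids.
export interface GoldenCase {
  query: string;
  expectedIds: string[];
}

export async function recallAtK(
  cases: GoldenCase[],
  retrieveIds: (query: string, k: number) => Promise<string[]>, // hypothetical retriever
  k = 5
): Promise<number> {
  if (cases.length === 0) return 0;
  let hits = 0;
  for (const c of cases) {
    // Trim defensively in case the retriever returns more than k ids.
    const retrieved = new Set((await retrieveIds(c.query, k)).slice(0, k));
    if (c.expectedIds.some((id) => retrieved.has(id))) hits++;
  }
  return hits / cases.length;
}
```

Run it before and after every change to chunk size, overlap, or index parameters; a drop in the number is a far earlier warning than a user complaint.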

Conclusion

RAG is not magic — it is disciplined engineering. The quality of your system depends far more on how you chunk and retrieve than on which LLM you call at the end. Start simple: fixed-size chunks, text-embedding-3-small, pgvector on Supabase. Ship it. Measure retrieval quality against real queries. Then layer in hybrid search, re-ranking, and metadata filtering as your use case demands.

Next.js App Router gives you the perfect foundation: server components for data-heavy ingestion UIs, Route Handlers for streaming API endpoints, and first-class TypeScript throughout. The stack is approachable, the pieces are composable, and — with the patterns above — you can go from zero to a production RAG system in a weekend.

