Streaming HTML with React Suspense: Reduce TTFB by 40%
React Suspense with streaming SSR lets you send HTML progressively instead of waiting for all data. Here's how to implement it in Next.js for dramatically faster page loads.
Traditional server-side rendering has a fundamental bottleneck: the server must fetch all data before sending any HTML. If your page has a fast header query (5ms) and a slow recommendations query (800ms), the user waits 800ms for everything — including the header they could have seen 795ms earlier.
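The blocking pattern is easy to see with simulated latencies. This is a hypothetical sketch, not Next.js internals: `getHeader` and `getRecommendations` are stand-ins with the timings from the example above.

```typescript
// Sketch of the traditional SSR bottleneck: no HTML leaves the server
// until the slowest query resolves. Helpers are hypothetical stand-ins
// with the latencies from the example simulated via timers.
const wait = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function getHeader(): Promise<string> {
  await wait(5); // fast query
  return "<header>Site header</header>";
}

async function getRecommendations(): Promise<string> {
  await wait(800); // slow query
  return "<section>Recommendations</section>";
}

async function renderPageBlocking(): Promise<string> {
  // Even fetched in parallel, the full response waits ~800ms on the slow query
  const [header, recs] = await Promise.all([getHeader(), getRecommendations()]);
  return `<html><body>${header}${recs}</body></html>`;
}
```

The header is ready after 5ms, but the response cannot begin until the recommendations resolve.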
Streaming HTML with React Suspense eliminates this bottleneck. The server sends HTML progressively as data becomes available. The header renders immediately. The recommendations stream in when ready. Time to First Byte drops dramatically.
How Streaming Works in Next.js
In the App Router, streaming is built into Server Components. When you wrap a component in <Suspense>, Next.js streams the fallback immediately and replaces it with the real content when the async operation completes.
// app/blog/[slug]/page.tsx
import { Suspense } from "react";
import { ArticleContent } from "@/components/blog/article-content";
import { CommentSection } from "@/components/blog/comment-section";
import { RelatedPosts } from "@/components/blog/related-posts";
import { ArticleSkeleton, CommentSkeleton, RelatedSkeleton } from "@/components/skeletons";

export default async function BlogPostPage({
  params,
}: {
  params: Promise<{ slug: string }>;
}) {
  const { slug } = await params;
  return (
    <article>
      {/* This streams first — fast DB query */}
      <Suspense fallback={<ArticleSkeleton />}>
        <ArticleContent slug={slug} />
      </Suspense>

      {/* Comments stream independently — may be slow */}
      <Suspense fallback={<CommentSkeleton />}>
        <CommentSection slug={slug} />
      </Suspense>

      {/* Related posts stream independently */}
      <Suspense fallback={<RelatedSkeleton />}>
        <RelatedPosts slug={slug} />
      </Suspense>
    </article>
  );
}
Each <Suspense> boundary is an independent streaming unit. The server sends the shell and fast sections first, then progressively fills in slower sections. The user sees content appearing in stages instead of a blank screen followed by everything at once.
The loading.tsx Convention
Next.js provides a file-based way to add Suspense boundaries at the route level. Creating a loading.tsx file wraps the entire page in Suspense automatically:
// app/blog/[slug]/loading.tsx
export default function BlogPostLoading() {
  return (
    <div className="max-w-3xl mx-auto px-4 py-8">
      {/* Skeleton matches the layout of the actual page */}
      <div className="h-8 w-3/4 bg-muted animate-pulse rounded mb-4" />
      <div className="flex items-center gap-3 mb-8">
        <div className="h-10 w-10 bg-muted animate-pulse rounded-full" />
        <div className="h-4 w-32 bg-muted animate-pulse rounded" />
      </div>
      <div className="space-y-3">
        {/* Varied line widths (values are illustrative) read more like real text */}
        {Array.from({ length: 8 }).map((_, i) => (
          <div
            key={i}
            className="h-4 bg-muted animate-pulse rounded"
            style={{ width: `${100 - (i % 4) * 8}%` }}
          />
        ))}
      </div>
    </div>
  );
}
But for optimal streaming, use granular <Suspense> boundaries inside the page instead of a single loading.tsx. This way, fast sections render immediately rather than waiting behind a page-level skeleton.
Measuring the Impact
The metrics that improve with streaming:
- Time to First Byte (TTFB): The server sends the initial HTML shell immediately instead of waiting for all data. We measured a 40% TTFB reduction on pages with mixed-speed data sources.
- First Contentful Paint (FCP): The browser can start rendering the shell and fast components while slower ones are still loading.
- Largest Contentful Paint (LCP): If your LCP element (hero image, main heading) is in a fast data path, it renders much earlier.
When Not to Stream
Streaming adds complexity. Skip it when:
- All data sources are fast (<50ms) — the overhead isn't worth it
- The page is fully static — use ISR or static generation instead
- SEO requires complete HTML on initial response — most crawlers handle streaming now, but verify for your target search engines
Advanced: Parallel Data Fetching
Combine streaming with parallel data fetching for maximum performance. Instead of sequential awaits, kick off all fetches simultaneously and let Suspense handle the resolution order:
// Each component fetches its own data independently;
// wrapped in separate <Suspense> boundaries, they resolve in parallel.
async function ArticleContent({ slug }: { slug: string }) {
  const article = await getArticleBySlug(slug); // 20ms
  return <div>{article.content}</div>;
}

async function CommentSection({ slug }: { slug: string }) {
  const comments = await getCommentsBySlug(slug); // 200ms
  return <div>{comments.map(c => <Comment key={c.id} {...c} />)}</div>;
}

async function RelatedPosts({ slug }: { slug: string }) {
  const related = await getRelatedPosts(slug); // 150ms
  return <div>{related.map(p => <PostCard key={p.id} {...p} />)}</div>;
}

// Total time: ~200ms (parallel) instead of ~370ms (sequential)
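The timing claim can be checked with a small simulation. The fetchers are stand-ins using the delay values from the comments above; only the resolution pattern matters.

```typescript
// Simulated fetchers with the latencies noted above (illustrative values).
const simulate = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

async function sequentialTotal(): Promise<number> {
  const start = Date.now();
  await simulate(20);  // article
  await simulate(200); // comments
  await simulate(150); // related
  return Date.now() - start; // ~370ms: latencies add up
}

async function parallelTotal(): Promise<number> {
  const start = Date.now();
  // Independent Suspense-wrapped components resolve like Promise.all:
  // total time is the slowest fetch, not the sum.
  await Promise.all([simulate(20), simulate(200), simulate(150)]);
  return Date.now() - start; // ~200ms: bounded by the slowest fetch
}
```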
Takeaways
- Wrap slow data-fetching components in <Suspense> for progressive streaming
- Use granular Suspense boundaries instead of page-level loading.tsx for optimal UX
- Design skeletons that match the actual layout to prevent layout shift
- Combine streaming with parallel data fetching for maximum throughput
- Measure TTFB and FCP to quantify the improvement — expect 30–50% gains on data-heavy pages