
gpt-image-2 API: ship 2K AI images in Next.js for $0.21 (2026)
OpenAI's new gpt-image-2 (Apr 21, 2026) ships 2K images with thinking mode and 8-image batches for ~$0.21 each — wire it into a Next.js server action.
What's new this week
OpenAI shipped ChatGPT Images 2.0 on April 21, 2026, exposing the new gpt-image-2 model in the API, Codex, and ChatGPT on the same day. The model renders up to 2,000 pixels on the long edge, supports seven aspect ratios from 3:1 to 1:3, and produces up to 8 coherent images per call with the same characters and objects preserved across the batch. A new thinking mode reasons about layout and typography before rendering, which is why gpt-image-2 now handles multilingual text, infographics, slides, and maps that gpt-image-1 used to mangle. TechCrunch called the text rendering "surprisingly good," and the Image Arena leaderboard currently ranks it #1 across every category. The production-tracked alias chatgpt-image-latest rolls updates forward automatically; pin to gpt-image-2 if you want a fixed version.
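One practical wrinkle from the aspect-ratio support: you have to pick a concrete size string per request. A minimal sketch of a helper that snaps an arbitrary ratio to the nearest supported size; note that the exact size strings below are assumptions for illustration — only the 3:1-to-1:3 range and the 2K long edge come from the announcement:

```typescript
// Hypothetical supported-size list. The 3:1..1:3 range and 2K long edge are
// from the announcement; these exact "WxH" strings are assumptions.
const SUPPORTED_SIZES = [
  "2048x683",  // ~3:1
  "2048x1024", // 2:1
  "1536x1024", // 3:2
  "1024x1024", // 1:1
  "1024x1536", // 2:3
  "1024x2048", // 1:2
  "683x2048",  // ~1:3
] as const;

export type ImageSize = (typeof SUPPORTED_SIZES)[number];

// Snap an arbitrary width:height ratio to the closest supported size string.
export function pickSize(width: number, height: number): ImageSize {
  const target = width / height;
  let best: ImageSize = SUPPORTED_SIZES[0];
  let bestDiff = Infinity;
  for (const size of SUPPORTED_SIZES) {
    const [w, h] = size.split("x").map(Number);
    const diff = Math.abs(w / h - target);
    if (diff < bestDiff) {
      bestDiff = diff;
      best = size;
    }
  }
  return best;
}
```

A 16:9 og:image request, for instance, snaps to the 2:1 landscape option rather than failing on an unsupported size.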
Why it matters for builders
Indie makers: you can skip the Midjourney → Figma dance for launch assets. Before: generate a square hero in Midjourney, hand-edit typography in Figma, upscale. After: one gpt-image-2 call returns an on-brand landscape hero with legible headline text at 2K, ready to paste into your marketing page. Eight-image batches turn A/B testing your hero copy into a single API call instead of eight prompt iterations.
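The A/B-batch idea is easy to package as a pure prompt builder. A sketch, with one big caveat: the "image N:" addressing convention below is an assumption about how to describe per-image variants in a batch, not a documented prompt format, and the helper name is made up:

```typescript
// Illustrative helper: fold up to 8 headline variants into one batched
// prompt. The "image N:" convention is an assumed way to address individual
// images in a gpt-image-2 batch, not a documented format.
export function buildHeroBatchPrompt(
  basePrompt: string,
  headlines: string[],
): { prompt: string; n: number } {
  if (headlines.length < 1 || headlines.length > 8) {
    throw new Error("gpt-image-2 batches are 1-8 images per call");
  }
  const variants = headlines
    .map((h, i) => `image ${i + 1}: headline text "${h}"`)
    .join("; ");
  return {
    prompt: `${basePrompt}. Same layout and style across all images. ${variants}.`,
    n: headlines.length,
  };
}
```

Spread the result into the `images.generate` call, e.g. `client.images.generate({ model: "gpt-image-2", size: "1536x1024", ...buildHeroBatchPrompt(base, headlines) })`.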
Web engineers: product visuals no longer need a CMS upload flow. Before: designer exports PNG, uploads to S3, copy-pastes the URL into a CMS field. After: a Next.js server action takes the product title, calls images.generate, streams the base64 PNG straight into a next/image tag or Vercel Blob. You get on-demand blog covers, og:image defaults, and placeholder product photos from one endpoint.
AI engineers: demos that need synthetic screenshots or diagrams stop blocking on design tickets. Before: "let's Photoshop a fake dashboard for the pitch deck." After: one prompt — "a SaaS dashboard showing churn dropping from 8% to 3% over six months, labels in English and Vietnamese, dark theme" — returns a usable PNG in roughly 7 seconds. RAG and eval pipelines that need grounded visual artifacts can now generate them deterministically with a fixed seed.
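For the deterministic-pipeline case, the useful trick is building the request object as a pure function so every eval run is reproducible and diffable. A sketch, assuming a `seed` parameter exists alongside `thinking` as a new 2026 param (like `thinking`, it may not be in the SDK types yet, and both are assumptions here):

```typescript
// Build a reproducible gpt-image-2 request for eval pipelines. `seed` and
// `thinking` are assumed new 2026 params that SDK types may lag behind.
export interface EvalImageRequest {
  model: "gpt-image-2";
  prompt: string;
  size: string;
  quality: "low" | "medium" | "high";
  seed: number;
  thinking: "auto" | "off";
}

export function buildEvalImageRequest(
  prompt: string,
  seed: number,
): EvalImageRequest {
  return {
    model: "gpt-image-2",
    prompt,
    size: "1024x1024",
    quality: "medium", // keep eval artifacts cheap
    seed,              // same seed + same prompt => same image
    thinking: "auto",
  };
}
```

Logging the full request object next to each generated artifact gives you an audit trail: rerun the same object, get the same image.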
Hands-on: try it in under 15 minutes
Requirements: Node 20+, the OpenAI Node SDK and Vercel Blob client (npm i openai@^4 @vercel/blob), and an API key with image generation enabled. Drop this into a Next.js 16 server action at app/actions/image.ts:
"use server";
import OpenAI from "openai";
import { put } from "@vercel/blob";
const client = new OpenAI();
export async function generateCover(prompt: string) {
const res = await client.images.generate({
model: "gpt-image-2",
prompt,
size: "1536x1024", // landscape; up to 2K long-edge supported
quality: "high", // "low" | "medium" | "high"
n: 1, // bump to 8 for a coherent batch
// @ts-expect-error — new 2026 param, SDK types lag
thinking: "auto",
});
const b64 = res.data[0].b64_json!;
const { url } = await put(
`covers/${Date.now()}.png`,
Buffer.from(b64, "base64"),
{ access: "public", contentType: "image/png" },
);
return url;
}
Call it from an RSC page: const url = await generateCover("Dark hero for a Next.js tutorial, laptop with glowing keyboard, title 'Ship faster'").

Costs: OpenAI bills images as tokens: $5/M input text, $10/M output text, $8/M input image, $30/M output image. A 1024×1024 high-quality render lands at ~$0.21; a batch of four is ~$0.84. Thinking mode bills extra reasoning tokens, so a strict layout brief (four-column infographic, Vietnamese headings, exact pricing) costs more than a loose scene; budget for it.

Tiers: free ChatGPT users only get instant mode; thinking, 8-image batches, and web-search grounding require Plus, Pro, Business, or any paid API tier.

Batching: for subject continuity across a batch (four angles of a product, a four-panel comic), set n: 8 and describe each variant inline; the model keeps subjects stable across the batch, which gpt-image-1 could not.
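The pricing arithmetic above is worth encoding so you can budget batches before calling the API. A rough estimator: the dollar rates are the ones quoted above, but the per-image output token counts are back-solved assumptions (e.g. ~7,000 output tokens for a 1024×1024 high render, since 7,000 × $30/1M ≈ $0.21), not published numbers:

```typescript
// Rough cost estimator. Rates come from the quoted pricing ($30/M output
// image tokens, $5/M input text tokens); the per-image token counts are
// back-solved assumptions, not published figures.
const OUTPUT_IMAGE_RATE = 30 / 1_000_000; // USD per output image token
const INPUT_TEXT_RATE = 5 / 1_000_000;    // USD per input text token

const ASSUMED_OUTPUT_TOKENS: Record<"low" | "medium" | "high", number> = {
  low: 1_000,
  medium: 3_500,
  high: 7_000, // ~7,000 tokens => ~$0.21 per 1024x1024 high render
};

export function estimateBatchCostUSD(
  n: number,
  quality: "low" | "medium" | "high",
  promptTokens = 100,
): number {
  const imageCost = n * ASSUMED_OUTPUT_TOKENS[quality] * OUTPUT_IMAGE_RATE;
  const textCost = promptTokens * INPUT_TEXT_RATE;
  return Number((imageCost + textCost).toFixed(4));
}
```

Under these assumptions a full 8-image high-quality batch runs roughly $1.68, before any extra thinking-mode reasoning tokens, which this sketch deliberately ignores since they vary with the brief.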
How it compares to alternatives
| | gpt-image-2 | Gemini 2.5 Flash Image | Flux 1.1 Pro |
|---|---|---|---|
| Starts at | ~$0.21 / 1024² high-quality render | $0.039 / image | $0.055 / image |
| Best for | Text-heavy infographics, slides, multilingual signage | Conversational edits, cheap iteration inside Gemini API | Photoreal hero shots, stylistic control |
| Key limit | 2K max on long edge; thinking mode billed extra | Weaker at small-font text rendering | No reasoning step; legibility weak on dense UI copy |
| Integration | openai SDK, one endpoint, base64 or URL response | @google/genai SDK, same call path as text | Replicate / Fal / BFL REST APIs |
Try it this week
Pick one piece of marketing art on your site (a blog cover, a pricing-page illustration, an empty-state screenshot) and regenerate it with gpt-image-2 in a Next.js server action tonight. Measure three things: total cost in USD, first-render latency, and whether the text stays legible at 2×. If the answer is "cheaper than an hour of Figma," wire it into your publish pipeline as an auto-cover generator. For the audio side of the same UX pattern, see how Gemini 3.1 Flash TTS ships voice UX in 15 minutes; if you want the coding agent that now calls this endpoint natively, pair it with the OpenAI Codex April 2026 update.