ContextD is a lightweight macOS application that monitors your screen in real time, extracts text from what you're viewing through optical character recognition (OCR), and makes that contextual information available to AI language models via a local API. Capture, OCR, and storage all run on your machine; the only data that leaves your computer is what is sent in API calls to OpenRouter.

How It Works

The application takes a methodical approach to capturing and organizing screen activity. Every two seconds, ContextD snapshots your display and compares it to the previous capture. Rather than processing the entire screen, it uses SIMD-accelerated pixel diffing to identify what's changed. The system then runs OCR exclusively on those modified regions, storing the extracted text in a local SQLite database. Screenshots themselves are processed in memory and immediately discarded—no images are retained on disk.
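The diff-then-store idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ContextD's implementation: the real application is written in Swift and uses SIMD-accelerated diffing, while this sketch uses a plain loop over small tiles. The tile size, the table layout, and the names changed_tiles, open_store, and store_text are all hypothetical, and the OCR step is replaced by a placeholder string.

```python
import sqlite3
import time

def changed_tiles(prev, curr, tile=2):
    """Return (top, left) origins of tiles whose pixels differ between two frames.

    Frames are equal-sized 2D lists of grayscale values. The tile size is an
    arbitrary choice for this sketch.
    """
    height, width = len(curr), len(curr[0])
    dirty = []
    for top in range(0, height, tile):
        for left in range(0, width, tile):
            rows = range(top, min(top + tile, height))
            if any(prev[y][left:left + tile] != curr[y][left:left + tile] for y in rows):
                dirty.append((top, left))
    return dirty

def open_store(path=":memory:"):
    """Open a SQLite database with a hypothetical table for extracted text."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS captures "
        "(captured_at REAL, tile_top INTEGER, tile_left INTEGER, text TEXT)"
    )
    return conn

def store_text(conn, captured_at, tile_origin, text):
    """Persist the text for one changed region; no image data is written."""
    conn.execute(
        "INSERT INTO captures VALUES (?, ?, ?, ?)",
        (captured_at, tile_origin[0], tile_origin[1], text),
    )

# Two 4x4 frames that differ in a single pixel.
prev_frame = [[0] * 4 for _ in range(4)]
curr_frame = [row[:] for row in prev_frame]
curr_frame[3][2] = 255

dirty = changed_tiles(prev_frame, curr_frame)

conn = open_store()
for origin in dirty:
    # A real pipeline would run OCR on the changed region here.
    store_text(conn, time.time(), origin, "placeholder OCR text")
row_count = conn.execute("SELECT COUNT(*) FROM captures").fetchone()[0]
```

Only the tile containing the changed pixel is reported, so only that region would be handed to OCR, and only its text reaches the database.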

A background process continuously summarizes your activity using Claude Haiku via OpenRouter, keeping costs minimal (roughly $2 per day). This summary data becomes queryable through a local HTTP API, enabling other applications and AI agents to understand what you've been working on without storing raw visual data.

Getting Started

Installation requires macOS 14 or later and Swift 5.9. After cloning the repository and running make build, the binary needs to be packaged as an app bundle so that macOS shows the proper permission dialogs. Once launched, you'll grant access to screen recording and accessibility features, then enter your OpenRouter API key in settings.

ContextD serves an interactive API on localhost at port 21890, complete with Swagger documentation. You can perform full-text searches across activity summaries, retrieve recent activity spanning specific time windows, or browse captures around particular timestamps.
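As a rough sketch of what a client for this API might look like, the Python below builds a query URL and fetches JSON from it. Only the host and port come from the description above. The /search path and the q and limit parameters are assumptions made up for illustration; the Swagger documentation served by the application is the authority on the real routes.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:21890"  # port taken from the description above

def build_url(path, **params):
    """Join the base URL, a path, and query parameters (None values dropped)."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}{path}?{query}" if query else f"{BASE_URL}{path}"

def fetch_json(url, timeout=5.0):
    """GET a URL and decode the JSON body; requires ContextD to be running."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)

# Hypothetical route and parameter names; check the Swagger page for the real ones.
search_url = build_url("/search", q="quarterly report", limit=5)
```

Calling fetch_json(search_url) would then return the decoded response, provided the application is running and the route exists under that name.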

Enriching Prompts with Context

A standout feature is the prompt enrichment workflow. Press Cmd+Shift+Space to open the enrichment panel, paste your question or request, specify a lookback window, and ContextD automatically appends relevant context from your recent activity. The enriched prompt includes footnoted references to specific moments—showing exactly when and where relevant information appeared on your screen. This lets you feed AI assistants highly contextual requests without manually copying and pasting details.
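The shape of an enriched prompt can be illustrated with a small Python sketch. It makes simplifying assumptions: relevance is reduced to a plain time-window filter, whereas ContextD selects context with a language model, and the footnote layout and the enrich_prompt function are hypothetical rather than the application's actual output format.

```python
from datetime import datetime, timedelta

def enrich_prompt(prompt, snippets, now, lookback_minutes):
    """Append numbered references for snippets seen inside the lookback window.

    snippets is a list of (timestamp, app_name, text) tuples.
    """
    cutoff = now - timedelta(minutes=lookback_minutes)
    recent = sorted(s for s in snippets if cutoff <= s[0] <= now)
    if not recent:
        return prompt
    lines = [prompt, "", "Context from recent screen activity:"]
    for index, (seen_at, app, text) in enumerate(recent, start=1):
        lines.append(f"[{index}] {seen_at:%H:%M} in {app}: {text}")
    return "\n".join(lines)

now = datetime(2024, 1, 1, 12, 0)
snippets = [
    (datetime(2024, 1, 1, 11, 50), "Terminal", "make build failed: missing module"),
    (datetime(2024, 1, 1, 9, 0), "Safari", "Unrelated page from this morning"),
]
enriched = enrich_prompt("Why did my build fail?", snippets, now, lookback_minutes=30)
```

With a 30-minute window, only the Terminal snippet is attached; the Safari entry from three hours earlier is dropped.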

Customization and Control

The settings panel lets you adjust the API key, the capture frequency, the models used for summarization and for enrichment, token limits, and data retention policies. The capture pipeline is configurable as well: the system classifies each frame as either a full keyframe (capturing the entire screen) or a delta update (capturing only changed regions), based on the percentage of pixels that changed since the previous capture. Developers can inspect the database directly using the provided make targets to review statistics, recent captures, and search results.
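The keyframe-or-delta decision amounts to comparing a changed-pixel ratio against a threshold. The Python sketch below shows that comparison; the 50% default and the classify_frame name are assumptions for illustration, since the actual threshold is a configurable setting whose default is not stated here.

```python
def classify_frame(changed_pixels, total_pixels, keyframe_threshold=0.5):
    """Return "keyframe" when enough of the screen changed, otherwise "delta".

    keyframe_threshold is the fraction of changed pixels at or above which the
    whole screen is captured again. The 0.5 default is an assumption.
    """
    if total_pixels <= 0:
        raise ValueError("total_pixels must be positive")
    ratio = changed_pixels / total_pixels
    return "keyframe" if ratio >= keyframe_threshold else "delta"

big_change = classify_frame(900, 1000)   # 90% of pixels changed
small_change = classify_frame(50, 1000)  # 5% of pixels changed
```

A lower threshold produces more keyframes and more OCR work per capture; a higher one leans on delta updates and processes less of the screen each time.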

Source: Hacker News Show HN