AWS CloudShell gives users a free Linux terminal environment with roughly 1 GB of persistent storage in each AWS region. One developer set out to test an intriguing idea: could CloudShell environments in different regions be combined into a single, fault-tolerant storage system?

The resulting project uses a distributed architecture. Data is divided into segments and protected with Reed-Solomon erasure coding, which produces 6 data shards and 3 parity shards. Everything is encrypted with AES-256-GCM before being distributed across different AWS regions. Any 6 of the 9 shards are enough to recover a file, so up to 3 shards can be lost, for example to outages of entire regions, without losing data. The cost is a storage overhead of 1.5x (9 shards stored for every 6 shards of data).
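To make the "any 6 of 9" property concrete, here is a minimal pure-Python sketch of a Reed-Solomon style erasure code. It is not the project's code: the choice of GF(2^8) with polynomial 0x11D, the evaluation points 0 to 8, the zero padding, and the function names `encode` and `decode` are all assumptions made for illustration, and a real system would more likely use an optimized library. Encryption is left out here.

```python
DATA_SHARDS, PARITY_SHARDS = 6, 3
TOTAL_SHARDS = DATA_SHARDS + PARITY_SHARDS

# Log/antilog tables for GF(2^8), primitive polynomial 0x11D, generator 2.
EXP = [0] * 510
LOG = [0] * 256
_v = 1
for _i in range(255):
    EXP[_i] = EXP[_i + 255] = _v
    LOG[_v] = _i
    _v <<= 1
    if _v & 0x100:
        _v ^= 0x11D


def gf_mul(a, b):
    """Multiply two field elements (addition in this field is XOR)."""
    return 0 if a == 0 or b == 0 else EXP[LOG[a] + LOG[b]]


def gf_inv(a):
    """Multiplicative inverse of a nonzero field element."""
    return EXP[255 - LOG[a]]


def lagrange_weights(xs, target):
    """Weights w_i with p(target) = XOR_i w_i * p(xs[i]) for deg(p) < len(xs)."""
    weights = []
    for i, xi in enumerate(xs):
        num, den = 1, 1
        for j, xj in enumerate(xs):
            if j != i:
                num = gf_mul(num, target ^ xj)
                den = gf_mul(den, xi ^ xj)
        weights.append(gf_mul(num, gf_inv(den)))
    return weights


def combine(weights, shards):
    """Byte-wise weighted sum of equally sized shards."""
    out = bytearray(len(shards[0]))
    for w, shard in zip(weights, shards):
        for pos, byte in enumerate(shard):
            out[pos] ^= gf_mul(w, byte)
    return bytes(out)


def encode(data):
    """Split data into 6 data shards and append 3 parity shards."""
    size = -(-len(data) // DATA_SHARDS)  # ceiling division
    padded = data.ljust(size * DATA_SHARDS, b"\0")
    shards = [padded[i * size:(i + 1) * size] for i in range(DATA_SHARDS)]
    xs = list(range(DATA_SHARDS))
    for target in range(DATA_SHARDS, TOTAL_SHARDS):
        shards.append(combine(lagrange_weights(xs, target), shards[:DATA_SHARDS]))
    return shards


def decode(available, length):
    """Rebuild the data from a dict {shard_index: bytes} with >= 6 entries."""
    if len(available) < DATA_SHARDS:
        raise ValueError("need at least 6 shards to recover the data")
    xs = sorted(available)[:DATA_SHARDS]
    known = [available[x] for x in xs]
    data = [
        available[t] if t in available
        else combine(lagrange_weights(xs, t), known)
        for t in range(DATA_SHARDS)
    ]
    return b"".join(data)[:length]
```

Each byte position is treated as a polynomial of degree at most 5 whose values at points 0 to 5 are the data bytes. The parity shards are its values at points 6 to 8, so any 6 surviving points determine the polynomial and therefore the missing data.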

Overcoming Technical Obstacles

The implementation required solving several non-trivial engineering challenges. CloudShell lacks a documented public API, so the developer reverse-engineered the interface used by the browser console to enable programmatic environment creation, session management, and periodic heartbeat signals. In addition, CloudShell instances sit behind NAT with no inbound network access. The developer worked around this limitation by using STUN for endpoint discovery and UDP hole punching to establish peer connections. QUIC, a modern transport protocol built on UDP, then runs over these punched holes to provide reliable communication.
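The STUN step can be sketched with the standard library alone. The example below builds a Binding Request and extracts the public IPv4 address and port from the XOR-MAPPED-ADDRESS attribute of the response, following the message layout in RFC 5389. The function names, the `stun_server` parameter, and the `punch_hole` helper are hypothetical; how the project actually performs discovery and punching is not described in the source.

```python
import os
import socket
import struct
import time

MAGIC_COOKIE = 0x2112A442
BINDING_REQUEST = 0x0001
BINDING_SUCCESS = 0x0101
XOR_MAPPED_ADDRESS = 0x0020


def build_binding_request(transaction_id=None):
    """Return (packet, transaction_id) for a STUN Binding Request."""
    tid = transaction_id or os.urandom(12)
    header = struct.pack("!HHI", BINDING_REQUEST, 0, MAGIC_COOKIE)
    return header + tid, tid


def parse_binding_response(packet, transaction_id):
    """Return the (ip, port) that the STUN server saw for our socket."""
    msg_type, length, cookie = struct.unpack("!HHI", packet[:8])
    if (msg_type != BINDING_SUCCESS or cookie != MAGIC_COOKIE
            or packet[8:20] != transaction_id):
        raise ValueError("not a matching STUN Binding success response")
    offset, end = 20, 20 + length
    while offset + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", packet[offset:offset + 4])
        value = packet[offset + 4:offset + 4 + attr_len]
        if attr_type == XOR_MAPPED_ADDRESS and value[1] == 0x01:  # IPv4
            xport, xaddr = struct.unpack("!HI", value[2:8])
            port = xport ^ (MAGIC_COOKIE >> 16)
            ip = socket.inet_ntoa(struct.pack("!I", xaddr ^ MAGIC_COOKIE))
            return ip, port
        offset += 4 + attr_len + (-attr_len % 4)  # attributes pad to 4 bytes
    raise ValueError("no IPv4 XOR-MAPPED-ADDRESS attribute found")


def discover_public_endpoint(sock, stun_server, timeout=2.0):
    """Ask a STUN server (host, port) for this socket's public endpoint."""
    request, tid = build_binding_request()
    sock.settimeout(timeout)
    sock.sendto(request, stun_server)
    packet, _ = sock.recvfrom(2048)
    return parse_binding_response(packet, tid)


def punch_hole(sock, peer, attempts=5, interval=0.2):
    """Send a few datagrams to the peer's public endpoint to open a NAT mapping."""
    for _ in range(attempts):
        sock.sendto(b"punch", peer)
        time.sleep(interval)
```

In a setup like this, both peers would call `discover_public_endpoint` on the same UDP socket they later use for traffic, exchange the results through some out-of-band channel, and then each call `punch_hole` toward the other. Whether the punch succeeds depends on the NAT behavior on both sides.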

A Python-based agent deployed to each CloudShell environment stores the shards and runs the QUIC server, completing the distributed system.
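The source does not describe how the agent lays out shards on disk, so the following is only a hypothetical sketch of the storage half of such an agent. The `ShardStore` class, its file naming, the SHA-256 checksum prefix, and the default capacity of about 1 GB are all assumptions chosen for illustration. The QUIC server is omitted.

```python
import hashlib
import os
import tempfile
from pathlib import Path


class ShardStore:
    """Hypothetical on-disk shard store; not the project's actual agent code."""

    def __init__(self, root, capacity_bytes=1_000_000_000):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)
        self.capacity_bytes = capacity_bytes

    def used_bytes(self):
        """Total size of all stored shard files."""
        return sum(p.stat().st_size for p in self.root.glob("*.shard"))

    def put(self, shard_id, data):
        """Store a shard with a checksum prefix, replacing any older copy."""
        if self.used_bytes() + len(data) + 32 > self.capacity_bytes:
            raise OSError("shard store is full")
        digest = hashlib.sha256(data).digest()
        # Write to a temporary file first so a crash never leaves a half-written shard.
        fd, tmp = tempfile.mkstemp(dir=self.root)
        with os.fdopen(fd, "wb") as f:
            f.write(digest + data)
        os.replace(tmp, self._path(shard_id))

    def get(self, shard_id):
        """Return the shard bytes, or raise ValueError if they are corrupted."""
        blob = self._path(shard_id).read_bytes()
        digest, data = blob[:32], blob[32:]
        if hashlib.sha256(data).digest() != digest:
            raise ValueError("shard failed its integrity check")
        return data

    def _path(self, shard_id):
        # Hash the id so arbitrary ids cannot escape the storage directory.
        name = hashlib.sha256(shard_id.encode()).hexdigest()
        return self.root / f"{name}.shard"
```

A corrupted or missing shard surfaces as an exception, which a client could treat the same way as an unreachable region and fall back to the remaining shards.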

The complete implementation is available on GitHub at github.com/dan-v/cloudshell-store.

Source: Hacker News Show HN