Amazon S3 at 20: How a Simple Storage Service Became the Silent Engine of Modern Data

In March 2006, Amazon announced a new service with little fanfare. A single paragraph on a "What's New" page introduced Amazon Simple Storage Service, promising "storage for the Internet." The launch was so understated that the lead blog post was written hastily before a flight. Few could have predicted this service would fundamentally rewire how the world stores and uses data.

Two decades later, S3's original premise—simple PUT and GET commands—has scaled to almost incomprehensible levels. It now holds over 500 trillion objects across hundreds of exabytes. The cost, however, has moved in the opposite direction, dropping roughly 85% from its original 15 cents per gigabyte. The most telling statistic? Code written for S3 in 2006 still runs without modification today, a testament to relentless backward compatibility amid total internal reinvention.
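That original premise is worth making concrete: S3 presents a flat namespace of keys mapping to immutable objects, where PUT overwrites a whole object and GET retrieves it. The sketch below illustrates those semantics with a plain in-memory class; it is not the AWS SDK (in practice you would call something like boto3's `put_object`/`get_object`), and the `Bucket` class and sample keys are purely illustrative.

```python
class Bucket:
    """Illustrative sketch of S3's object model: a flat key -> bytes store."""

    def __init__(self, name):
        self.name = name
        self._objects = {}  # keys are opaque strings; "/" implies no real directories

    def put(self, key, body):
        # Like S3 PUT: a full-object overwrite; there is no partial update
        self._objects[key] = bytes(body)

    def get(self, key):
        # Like S3 GET: returns the stored object, or fails (akin to 404 NoSuchKey)
        return self._objects[key]


b = Bucket("demo")
b.put("logs/2006/03/launch.txt", b"storage for the internet")
assert b.get("logs/2006/03/launch.txt") == b"storage for the internet"
```

Because the interface is this small and the keyspace is flat, the 2006 API could stay frozen while everything beneath it was rebuilt, which is what makes the backward-compatibility record possible.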

For data and machine learning engineers, S3's evolution is particularly significant. It has shifted from a passive repository to an active data foundation. Recent additions like S3 Tables for managed Apache Iceberg, S3 Vectors for semantic search, and S3 Metadata for instant discovery are reshaping architecture. These features allow teams to work with data directly in storage, avoiding costly movement and duplication between specialized systems. S3 Vectors alone ingested 40 billion vectors in the first five months after its 2025 launch.

Underpinning this is engineering rigor suited to planetary scale. Critical performance code is progressively rewritten in Rust for safety and speed. Formal methods mathematically prove system correctness. A fleet of microservices continuously audits every stored byte, triggering repairs at the first sign of degradation. The design philosophy turns scale into an asset: as the system grows, workloads become more decorrelated, improving reliability for all users.

As S3 enters its third decade, its role is crystallizing: to be the single, universal layer for all data and AI workloads. The goal is to store data once and enable every possible use case from that location. For engineers building the next generation of models and applications, this isn't just about cheaper storage. It's about a foundational layer that removes friction, letting them focus on what they build rather than how they manage the bits.

Source: AWS