

Amazon S3 Files

The first and only cloud object store that provides fully featured, high-performance file system access to your data

What is S3 Files?

S3 Files is a shared file system that connects any AWS compute directly with your data in Amazon S3. It provides fast, direct access to all of your S3 data as files, with full file system semantics and low-latency performance, without your data ever leaving S3. File-based applications, agents, and teams can work with S3 data as a file system from any compute instance, container, or function, using the tools they already depend on. You no longer need to duplicate your data or cycle it between object storage and file system storage.

With S3 Files, Amazon S3 is the first and only cloud object store that provides fully featured, high-performance file system access to your data. S3 Files gives you the performance and simplicity of a file system with the scalability, durability, and cost-effectiveness of S3. There are no data silos, no synchronization complexities, and no tradeoffs. File and object storage, together in one place without compromise.

Benefits

    Your applications, agents, and users expect to work with data as files and folders. With S3 Files, everything that needs to work with your S3 data as files can do so directly. Python libraries, ML frameworks, CLI utilities, and shell scripts all work with your S3 data with no custom connectors and no new APIs to learn.

    S3 Files delivers low latency and up to multiple terabytes per second of aggregate read throughput. Performance is optimized for file workloads through intelligent caching: S3 Files loads your active working set onto high-performance storage, delivering low-latency access while keeping costs proportional to what you're actively using.

    Organizations duplicate data across storage systems when workflows require both object and file access. S3 Files eliminates duplicate storage and the pipelines required to keep separate systems in sync. Your data lives in S3 and stays in S3, accessible through both file and object interfaces simultaneously. One copy, one place to manage, zero synchronization overhead.

    S3 Files is built on S3's pay-as-you-go model with no provisioned capacity and no minimum commitments. Storage scales automatically as your data grows, from a single dataset to petabytes, with costs proportional to your active working set and no infrastructure to manage. S3 Files delivers up to 90% lower costs compared to cycling data between S3 and separate file systems.
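To make the first benefit concrete, here is a minimal Python sketch of working with S3 data through standard file operations. The mount path is hypothetical: a temporary directory stands in for the S3 Files mount so the sketch runs anywhere.

```python
from pathlib import Path
import tempfile

# Hypothetical mount point for an S3 Files file system. A temporary
# directory stands in here so the sketch runs anywhere; on a real mount
# this might be something like Path("/mnt/s3-data").
mount = Path(tempfile.mkdtemp())

# Standard file operations only -- no S3 SDK, no custom connector.
dataset = mount / "experiments" / "run-001"
dataset.mkdir(parents=True, exist_ok=True)
(dataset / "metrics.csv").write_text("epoch,loss\n1,0.42\n")

# Any file-based tool can read the data back the same way.
print((dataset / "metrics.csv").read_text())
```

The point of the sketch is that nothing in it is S3-specific: any library or script that takes a path would work unchanged against the mount.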

Data Points

up to 90% lower costs compared to cycling data between S3 and separate file systems
10M+ file system IOPS per bucket
4+ TB/s of aggregate read throughput
25K compute resources can simultaneously access the same S3 bucket as files

How S3 Files works

S3 Files works like a traditional high-performance file system that can be accessed by any Linux-based compute resource, but its view of files and folders reflects what's in your S3 bucket. S3 Files is built using Amazon EFS, which intelligently loads your active working set onto high-performance storage. This delivers low latencies for frequently accessed data while keeping costs proportional to what you're actively using.

When you read files, S3 Files lazily loads portions of file metadata and contents onto high-performance storage. Data that doesn't meet your configured file size threshold is read directly from S3, with no file system storage involved. When you write data, your writes are sent to the highly durable high-performance storage and then synced back to S3 to keep your bucket consistent.

Data that hasn't been accessed within a configurable window (1 to 365 days, defaulting to 30) automatically expires from this storage, so you pay only for what you're actively using while your authoritative data always remains in S3.
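The read path described above can be illustrated with a toy cache. This is a conceptual sketch only, not the S3 Files implementation: the class, the threshold direction (large reads streamed straight from S3, small reads cached), and the seconds-based TTL are assumptions chosen to make the idea runnable.

```python
import time

# Toy illustration of lazy loading with a size threshold and an expiry
# window -- NOT the actual S3 Files implementation.
class LazyFileCache:
    def __init__(self, size_threshold: int, ttl_seconds: float):
        self.size_threshold = size_threshold  # reads at/above this stream straight from S3
        self.ttl = ttl_seconds                # stand-in for the 1-365 day expiry window
        self._cache = {}                      # key -> [data, last_access_time]

    def read(self, key: str, size: int, fetch_from_s3):
        if size >= self.size_threshold:
            return fetch_from_s3(key)         # large reads bypass cache storage entirely
        if key not in self._cache:
            self._cache[key] = [fetch_from_s3(key), time.time()]  # lazy load on first access
        entry = self._cache[key]
        entry[1] = time.time()                # touch on every access
        return entry[0]

    def expire(self):
        now = time.time()
        for key in [k for k, (_, t) in self._cache.items() if now - t > self.ttl]:
            del self._cache[key]              # cold data falls back to S3-only

# A small file is fetched from S3 once, then served from the cache.
calls = []
cache = LazyFileCache(size_threshold=1 << 20, ttl_seconds=0.05)
fetch = lambda key: calls.append(key) or f"data:{key}"
cache.read("small.txt", 100, fetch)
cache.read("small.txt", 100, fetch)
print(len(calls))  # -> 1: the second read was a cache hit
```

After the TTL elapses, `expire()` drops the entry and the next read would lazily load it from S3 again, which mirrors the pay-for-what-you-use behavior described above.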

Features and capabilities

    Any new or existing S3 bucket can be accessed as a file system, with no data migration required. Your existing file-based applications and tools work exactly as they do today. No code changes, no custom connectors, and no new APIs to learn. Use a prefix to scope exactly what your file-based workloads need, so only your active dataset is placed onto the file system’s high-performance storage. Changes sync automatically back to your S3 bucket. Your full dataset stays in S3, available on demand, while your workloads work with precisely the data they need. With S3 Files, you can perform all file operations through a standard NFS file interface with low latency.

    Access the same S3 bucket from multiple compute resources simultaneously through a shared file system. Built for collaboration, S3 Files supports up to 25,000 active connections across any AWS compute resource, including Lambda, EC2, ECS, EKS, Fargate, and Batch. All of them can read from and write to the same data in real time, with NFS close-to-open consistency. This makes S3 Files ideal for collaborative workloads where multiple users, applications, or agents need shared access to data at the same time.

    S3 Files delivers up to 90% lower costs compared to cycling data between S3 and separate file systems. It keeps costs proportional to your active working set by intelligently loading only the data you’re actively using onto high-performance storage. You configure the file size threshold and retention window, and S3 Files automatically manages what stays on high-performance storage versus what’s read directly from S3. For large reads (1 MiB or larger), data is streamed directly from S3 even if it resides on the high-performance storage. Since S3 is optimized for high throughput, these reads incur only standard S3 GET request costs with no file system charge. When you write data, S3 Files batches your writes on the high-performance storage before committing them to S3, reducing S3 request costs. Data that hasn’t been accessed within your configured window automatically expires, so you pay only for what you’re actively using while maintaining complete access to your entire dataset in S3.

    S3 Files gives you complete access to your entire S3 bucket through both file and object interfaces simultaneously. Your file-based applications work exactly as they do today. Your object-based applications and workflows continue unchanged. S3 Files intelligently caches the data your file workloads need for high performance, while the full scale, durability, and economics of S3 remain the foundation for your data store. No modifications, no tradeoffs, and nothing to reconcile between two separate systems.
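The close-to-open consistency mentioned in the features above can be sketched with plain file operations. A local temporary file stands in for a path on a shared S3 Files mount; the guarantee illustrated is standard NFS behavior: a write becomes visible to other clients once the writer closes the file and a reader opens it afterwards.

```python
from pathlib import Path
import tempfile

# Local stand-in for a file on a shared S3 Files mount.
shared = Path(tempfile.mkdtemp()) / "status.json"

# Writer (e.g. a Lambda function): write, then close.
with shared.open("w") as f:
    f.write('{"stage": "preprocess", "done": true}')
# File closed here -> under close-to-open semantics, the update is now
# guaranteed visible to any client that opens the file after this point.

# Reader (e.g. an EC2 instance): opens after the writer's close, so it
# sees the completed write.
with shared.open() as f:
    print(f.read())
```

The practical implication is a simple coordination pattern: finish a file and close it before signaling other workers to open it, rather than relying on readers seeing partial writes.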

Use cases

    AI agents don’t just read data, they act on it, generate new data, coordinate with other agents, and maintain state across tasks. Agents depend on file-based tools like Python libraries, CLI utilities, and shell scripts, and they need a place to read inputs, write outputs, persist memory and logs, and share intermediate state with other agents in a pipeline. S3 Files gives agents all of that in one place. Each agent accesses the same S3 file system and works with shared data using standard file operations, with no service-specific APIs to learn and no coordination layer to build. S3 becomes the shared operating environment for your entire agent fleet, not just the place data ends up when the work is done.

    ML workloads have a problem that neither object storage nor traditional file systems solve well on their own. Training datasets live in S3 because of its scale and cost, but preprocessing pipelines, feature engineering scripts, and training frameworks all expect to work with files. The result is a copy-and-stage step before every run: pull data from S3, move it to a file system, process it, then push results back. That step adds time, cost, complexity, and another failure point to every iteration. S3 Files eliminates this overhead. Data scientists and ML engineers can work with their S3 training data directly through a file system and run their pipelines in place. Preprocessing writes back to the same bucket automatically. Multiple training jobs can read from the same dataset simultaneously without duplication. Iterations get faster and insights arrive sooner, because the data is always current, always in one place, and always ready to use.

    Modern workflows that span applications, users, pipelines, and tools are built to read and write files. They browse directories, open datasets, append results, and save outputs using standard file operations. Connecting these applications to S3 previously required copying data out of S3 first, because S3's API-based model was incompatible with software that expects a file system. S3 Files removes that barrier. Your existing applications mount your S3 file system and work with your data exactly as they do today, with no code changes, no custom connectors, and no data movement. The application does not know it is talking to S3. It just works.

    Make your S3 data natively available on any compute resource. With S3 Files, your bucket now appears as a mount to your compute resources, giving researchers, analysts, and data scientists the ability to browse directories, open datasets, run simulations, and save results using the tools they already work with. Your teams access S3 data directly from any compute resource using standard file operations, with all members reading from and writing to the same bucket simultaneously.
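As a concrete sketch of the shared agent workspace described in the use cases above, the following Python uses only standard file operations. The directory layout and agent roles are hypothetical, and a temporary directory stands in for the S3 Files mount.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical shared workspace layout on an S3 Files mount; a temporary
# directory stands in so the sketch is runnable anywhere.
workspace = Path(tempfile.mkdtemp())
(workspace / "state").mkdir()
(workspace / "outputs").mkdir()

def agent_extract(ws: Path) -> None:
    # First agent writes an intermediate result and persists its own state.
    (ws / "outputs" / "entities.json").write_text(json.dumps(["Acme Corp", "2024-10-01"]))
    (ws / "state" / "extract.json").write_text(json.dumps({"step": "extract", "status": "done"}))

def agent_summarize(ws: Path) -> None:
    # The next agent in the pipeline picks up the handoff with plain file reads.
    entities = json.loads((ws / "outputs" / "entities.json").read_text())
    (ws / "outputs" / "summary.txt").write_text(f"Found {len(entities)} entities")

agent_extract(workspace)
agent_summarize(workspace)
print((workspace / "outputs" / "summary.txt").read_text())  # -> Found 2 entities
```

Because the handoff is just files in a shared tree, there is no coordination service to build: each agent reads what the previous one wrote and leaves its own state behind for the next.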
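The in-place ML preprocessing workflow can likewise be sketched with the standard library alone. Paths and the cleaning rule are illustrative, and a temporary directory stands in for a dataset tree under an S3 Files mount, so there is no copy-out/copy-back step to model.

```python
import csv
import tempfile
from pathlib import Path

# Stand-in for a dataset tree that would live under an S3 Files mount.
data_root = Path(tempfile.mkdtemp())
raw = data_root / "raw" / "samples.csv"
raw.parent.mkdir()
raw.write_text("id,value\n1,10\n2,\n3,30\n")  # row 2 is missing its value

# Preprocess in place: read the raw files, write cleaned output back into
# the same tree, where training jobs can read it immediately.
processed = data_root / "processed"
processed.mkdir()
with raw.open(newline="") as src, (processed / "samples.csv").open("w", newline="") as dst:
    writer = csv.writer(dst)
    for row in csv.reader(src):
        if all(field.strip() for field in row):  # drop incomplete rows
            writer.writerow(row)

print((processed / "samples.csv").read_text().splitlines())  # -> ['id,value', '1,10', '3,30']
```

In the staged workflow this section describes, the same cleaning step would be bracketed by a download from S3 and an upload back; here both disappear because the reads and writes already target the bucket.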

Customer quotes

Bayer

Bayer is a global leader in healthcare and nutrition with a mission of "Health for All, Hunger for None." The company develops pharmaceutical treatments and agricultural products intended to address challenges associated with a growing global population. Bayer invests in life sciences research and data-driven approaches with the goal of advancing human health outcomes and supporting sustainable food production practices.

“Bayer's research teams work with enormous volumes of scientific data stored in S3. Before S3 Files, accessing this data meant downloading entire datasets to local storage before analysis could begin. This created delays and required us to maintain additional copies across our research infrastructure. S3 Files simplifies this workflow by letting our data scientists mount S3 data directly as a file system, in accordance with our existing security and access control protocols. Researchers can now open datasets, run analyses, and collaborate on shared results without waiting for downloads or managing data movement. This shift has streamlined our R&D operations and reduced infrastructure overhead, freeing our teams to focus on research and innovation.”

Ben Gonzalez, Senior Cloud Engineer, Bayer


Deloitte

Deloitte is a global leader in professional services, providing consulting, audit, tax, and advisory services across industries with approximately 470,000 professionals across 150 countries.

“As we build AI agent solutions for clients, effective coordination is imperative. Amazon S3 Files is a differentiator that provides our agentic architectures with a shared workspace where agents can retain context, exchange intermediate results, and organize across complex analytics pipelines, all while using standard file operations. With the ability to treat S3 as a local drive for our agents’ logs, state, and checkpoints, this can simplify multi-agent design, sophisticate automation, and accelerate the pace of delivery.”

Chris Jangareddy, Managing Director, AWS Alliance AI and Data Lead, Deloitte Consulting LLP


Cloudsmith

Cloudsmith secures the AI-powered software supply chain. Manage binary artifacts and protect your enterprise from malicious packages and vulnerabilities. Agentic AI development means more software, more dependencies, and more risk. Cloudsmith gives your teams curated repositories of pre-verified artifacts they can build with safely, at AI scale and speed.

"Our platform processes hundreds of thousands of artifacts daily, relying on S3 and file systems to handle complex package processing and synchronization workflows that keep customers’ software supply chains secure. This introduces significant operational overhead, latency, and maintenance complexity. We’re excited about using S3 Files, where our Fargate containers and Lambda functions can access S3 data directly as a file system, eliminating the need for custom synchronization logic. This simplifies our architecture, reduces package time-to-availability for developers, systems, and agents, and preserves compatibility with existing file-based tooling and artifact processing workflows that are critical to our global operations."

Ronan O'Dulaing, VP of Engineering, Cloudsmith


Presidio

Presidio is a premier IT services and solutions provider helping organizations connect today's technology with tomorrow's innovations. As an AWS Premier Consulting Partner, Presidio guides enterprises across financial services, healthcare, and media and entertainment through cloud infrastructure design, deployment, and management.

“Amazon S3 Files advances migration and modernization by connecting object storage with file-based workloads, enabling customers to access S3 data through familiar file interfaces while preserving S3's durability and scalability. For organizations transitioning to AWS, this provides a flexible path to modernizing legacy applications without disruptive rewrites or redundant storage.

At Presidio, we view Amazon S3 Files as a key enabler for AI-driven workloads by consolidating file and object storage into a single foundation that eliminates duplicate systems and operational complexity. This becomes especially powerful as customers advance generative AI, advanced analytics, and agentic use cases, enabling low-latency access to shared datasets and helping teams move efficiently from development to production.”

Ken Ordini, VP of Engineering, Presidio


Snorkel AI

Snorkel AI is a frontier AI data lab that helps teams build the data and environments behind high-performing frontier and agentic AI. The company combines platform technology with research-driven data development to create datasets, benchmarks, evals, and custom solutions for real-world AI systems.

"At Snorkel AI, we are focused on building frontier datasets. Our work spans the full AI lifecycle, from curated datasets like Snorkel Data Series for the most challenging agent tasks to evaluation, reinforcement learning, and agent simulation at production scale. With S3 Files, our teams, agents, and evaluators can access data in S3 through a file interface without copying data between systems or waiting for sync jobs. This is especially valuable for reinforcement learning and agentic workloads, where we run many environments and simulations in parallel, support shared workspaces across multiple agents, and execute long-horizon task sequences at scale."

Rustem Feyzkhanov, AI Foundations Engineering Manager, Snorkel AI


Torc Robotics

Torc Robotics is a leading autonomous vehicle software company revolutionizing freight with level 4 autonomous class 8 trucks. With its TorcDrive virtual driver software, the company is commercializing self-driving trucks for safe, sustained innovation in the freight industry.

“Amazon S3 Files will enable our data scientists to access machine learning training data directly in S3 using standard file operations, effectively treating S3 as a local drive for our autonomous vehicle workflows. Our developers can work with petabytes of sensor data, simulation results, and model checkpoints directly in S3 as our foundational data layer. This simplifies data access and processing to reduce the time and complexity in building our Physical AI pipeline, enabling Torc to deliver our AV 3.0 TorcDrive verifiable AI stack, data loop, and world simulation for training and validation on the path to safe, scalable autonomous trucking.”

Fiete Botschen, Director of Engineering, Machine Learning Model Development, Torc Robotics


Qube Research & Technologies (QRT)

Qube Research & Technologies (QRT) is a global quantitative and systematic investment manager that leverages advanced mathematical models and high-performance computing to identify trading opportunities across global markets. The firm's researchers analyze vast datasets in real time, combining historical market data with live feeds to develop and refine sophisticated trading algorithms that drive investment decisions.

“At QRT, our research cycle depends on rapid, iterative experimentation. Our researchers explore petabytes of tick-level market data, alternative datasets, and proprietary signals to build and backtest systematic strategies across asset classes. Historically, this required maintaining separate high-performance file storage alongside S3, forcing teams to constantly synchronize data between our central data lake and compute workstations. S3 Files changes that equation. Our researchers can now work with data directly in S3 from their workstations, running backtests, analyzing results, and iterating on strategies across larger datasets, all without copying data or managing ETL pipelines. This collapses a multi-step workflow into a seamless research loop. For us, that means faster time-to-insight, fewer redundant storage costs, and more engineering effort focused where it belongs: building better systematic strategies.”

Jon Fautley, Head of Cloud Infrastructure, QRT

