Amazon File Cache FAQs
General
What is Amazon File Cache?
Amazon File Cache is a fully managed high-speed cache on AWS that makes it easier to process file data, regardless of where the data is stored.
Amazon File Cache serves as temporary, high-performance storage for data in on-premises file systems, or in file systems or object stores on AWS. The service allows you to make dispersed datasets available to file-based applications on AWS with a unified view and high speeds.
You can link the cache to multiple NFS file systems—including on-premises and in-cloud file systems—or Amazon Simple Storage Service (S3) buckets, providing a unified view of and fast access to your data spanning on-premises and multiple AWS Regions. The cache provides read and write data access to compute workloads on AWS with sub-millisecond latencies, up to hundreds of GB/s of throughput, and up to millions of IOPS.
When should I use Amazon File Cache?
Amazon File Cache makes it easier for you to use the agility, performance, and cost efficiency of AWS compute resources for processing data no matter where the data is stored.
Use Amazon File Cache to run compute-intensive workloads on AWS such as visual effects (VFX) rendering, chip design simulation, and genomic analysis when your data is not in the same AWS Region as your compute or when it spans multiple locations.
You can also use Amazon File Cache to process Amazon S3 datasets when you need fast, low-latency access to your Amazon S3 data using a file interface. Amazon File Cache also supports NFSv3 datasets from Amazon FSx for OpenZFS and Amazon FSx for NetApp ONTAP.
How do I get started with Amazon File Cache?
You can create a cache from the AWS Management Console, the AWS Command Line Interface (CLI), or the AWS API and various language-specific SDKs. Your cache can be running and accessible to your compute instances within minutes. Learn more about getting started with Amazon File Cache.
When you mount the cache, the data in your NFS file systems or Amazon S3 buckets appears as directory and file listings. To minimize storage consumption, the contents of the files and objects are imported only when a file or object is first accessed in the cache.
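As an illustration, the following is a minimal sketch of creating a cache with the AWS SDK for Python (Boto3), which exposes Amazon File Cache operations through the FSx API namespace. The subnet, security group, and capacity values are placeholders, and the exact parameters should be confirmed against the CreateFileCache API reference.
```python
import time

import boto3

fsx = boto3.client("fsx", region_name="us-east-1")

# Placeholder network and capacity values; replace them with your own.
response = fsx.create_file_cache(
    FileCacheType="LUSTRE",
    FileCacheTypeVersion="2.12",
    StorageCapacity=1200,  # GiB; 1,200 GiB is a 1.2 TiB cache
    SubnetIds=["subnet-0123456789abcdef0"],
    SecurityGroupIds=["sg-0123456789abcdef0"],
    LustreConfiguration={
        "DeploymentType": "CACHE_1",
        "PerUnitStorageThroughput": 1000,                    # MB/s per TiB baseline
        "MetadataConfiguration": {"StorageCapacity": 2400},  # GiB of metadata storage
    },
)
cache_id = response["FileCache"]["FileCacheId"]

# Poll until the cache reaches the AVAILABLE lifecycle state before mounting it.
while True:
    cache = fsx.describe_file_caches(FileCacheIds=[cache_id])["FileCaches"][0]
    if cache["Lifecycle"] == "AVAILABLE":
        break
    time.sleep(30)
```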
What instance types and AMIs work with Amazon File Cache?
Amazon File Cache works with the most popular Linux-based AMIs, including Amazon Linux, Amazon Linux 2, RHEL, CentOS, SUSE Linux, and Ubuntu. The service is also compatible with x86-based Amazon Elastic Compute Cloud (EC2) instances, Arm-based Amazon EC2 instances powered by the AWS Graviton2 processor, Amazon Elastic Container Service (ECS) container instances, and Amazon Elastic Kubernetes Service (EKS) container instances. With Amazon File Cache, you can mix and match the instance types and Linux AMIs that are connected to a single cache.
How do I access a cache from a compute instance?
To access your cache from a Linux instance, first install the open-source Lustre client, then mount your cache using standard Linux commands. Once mounted, you can work with the files and directories in your cache just as you would with a local file system.
The AWS Lustre client repository provides client modules that are compatible with Amazon Linux, Amazon Linux 2, Red Hat Enterprise Linux (RHEL), CentOS, and Ubuntu operating systems. See the Amazon File Cache documentation for more details.
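For illustration, here is a short Python sketch that mounts a cache on a Linux instance by shelling out to the Lustre mount command once the client is installed. The DNS name and mount name are placeholders taken from your cache's details, and the mount options shown are assumptions to confirm against the mounting instructions in the documentation.
```python
import subprocess

# Placeholder values; copy the DNS name and mount name from your cache's
# details in the console or from the DescribeFileCaches output.
CACHE_DNS = "fc-0123456789abcdef0.fsx.us-east-1.amazonaws.com"
MOUNT_NAME = "mountname"
MOUNT_POINT = "/mnt/cache"

subprocess.run(["sudo", "mkdir", "-p", MOUNT_POINT], check=True)

# Mount the cache with the open-source Lustre client (installed separately).
subprocess.run(
    ["sudo", "mount", "-t", "lustre", "-o", "relatime,flock",
     f"{CACHE_DNS}@tcp:/{MOUNT_NAME}", MOUNT_POINT],
    check=True,
)
```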
How do I manage a cache?
Amazon File Cache is a fully managed service, which means that file storage infrastructure is managed for you. When you use Amazon File Cache, you avoid the cost and complexity of deploying and maintaining complex caching infrastructure.
You can administer a cache through the AWS Management Console, the AWS CLI, or the AWS API and various language-specific SDKs. The console, API, and SDK provide the ability to create and delete caches, create and edit cache tags, and display detailed information about caches.
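As a brief sketch of day-to-day administration with the AWS SDK for Python (Boto3), where the cache ID and tag values are placeholders:
```python
import boto3

fsx = boto3.client("fsx")

# List the caches in the current Region with their lifecycle state and size.
for cache in fsx.describe_file_caches()["FileCaches"]:
    print(cache["FileCacheId"], cache["Lifecycle"], cache["StorageCapacity"])

# Add or update a tag on a specific cache, identified by its ARN.
cache = fsx.describe_file_caches(FileCacheIds=["fc-0123456789abcdef0"])["FileCaches"][0]
fsx.tag_resource(
    ResourceARN=cache["ResourceARN"],
    Tags=[{"Key": "team", "Value": "rendering"}],
)

# Delete a cache that is no longer needed (uncomment to run).
# fsx.delete_file_cache(FileCacheId="fc-0123456789abcdef0")
```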
Can I concurrently access data from my cache and linked NFS file systems or Amazon S3 buckets?
Yes, you can concurrently access data from both your cache and its linked data repositories (NFS file systems or Amazon S3 buckets).
What are the prerequisites for linking my on-premises file servers to Amazon File Cache?
To link your on-premises file server to Amazon File Cache, the file server must support the NFSv3 protocol. Before using Amazon File Cache with an on-premises data source, you must set up an AWS Direct Connect or virtual private network (VPN) connection between your on-premises network and the Amazon Virtual Private Cloud (VPC) in which your cache resides. You must also ensure that traffic is permitted between the cache and the on-premises file server in your VPC security groups, your on-premises firewall, and the NFSv3 file server itself.
How do I link NFS file systems or an Amazon S3 bucket to my cache?
You can link one or more (up to eight) NFS file systems or Amazon S3 buckets (data repositories) to a cache at any time by creating data repository associations through the AWS Management Console, SDKs, or CLI. Once the NFS file system or Amazon S3 bucket is linked, you can access it as a directory in your cache. For each NFS data repository association, you can link multiple NFS exports.
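As an example, the sketch below shows what the data repository association definitions might look like when you link repositories at cache creation time with Boto3. The cache paths, bucket name, and DNS/IP values are placeholders, and the field names should be confirmed against the CreateFileCache API reference.
```python
# Hypothetical associations for one S3 bucket and one NFSv3 file system,
# passed as the DataRepositoryAssociations parameter of create_file_cache.
data_repository_associations = [
    {
        # The S3 bucket appears under /s3-data in the cache.
        "FileCachePath": "/s3-data",
        "DataRepositoryPath": "s3://amzn-s3-demo-bucket",
    },
    {
        # The NFS file system appears under /nfs-data; multiple exports of
        # the same server can be listed as subdirectories.
        "FileCachePath": "/nfs-data",
        "DataRepositoryPath": "nfs://198.51.100.10",
        "DataRepositorySubdirectories": ["/vol1", "/vol2"],
        "NFS": {"Version": "NFS3", "DnsIps": ["198.51.100.10"]},
    },
]
```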
What options do I have for importing data to my cache from linked data repositories?
Amazon File Cache has two options for importing data from your data repositories to the cache: lazy-load (the default) and preload. Lazy-load imports data on demand upon first access, and preload imports data before you start your workload. You can specify lazy-loading and preloading preferences for metadata (file/object names and attributes) and data (the file/object contents). Lazy-loading is preferable for most workloads because it allows your workload to start without waiting for metadata and data to be imported to the cache. Preloading is preferable when your access pattern is sensitive to first-byte latencies. See the documentation for best practices on preloading data into your cache.
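If you do choose to preload, a common pattern is to walk the cached directory tree from a client that has the cache mounted and request each file's contents. The sketch below assumes the Lustre lfs hsm_restore command described in the preloading documentation, and the mount path is a placeholder.
```python
import pathlib
import subprocess

# Preload every file under a cached directory by asking the Lustre client
# to restore (import) its contents. Run on a client with the cache mounted.
CACHE_DIR = pathlib.Path("/mnt/cache/s3-data")  # placeholder mount path

for path in CACHE_DIR.rglob("*"):
    if path.is_file():
        subprocess.run(["sudo", "lfs", "hsm_restore", str(path)], check=True)
```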
Are newly added or modified files in my linked data repository accessible in my cache?
Files added to a data repository after a cache is linked will be available in the cache automatically. If a file is modified in a data repository, and the file is not loaded in the cache, the modified version will be imported to the cache via lazy-loading. Files that are already loaded in the cache will not be automatically updated when the file is updated in the data repository.
Can I export data from my cache to linked data repositories?
Yes, Amazon File Cache allows you to export new or changed files back to your linked data repository using file system commands (see Exporting changes to the data repository for more details). You can modify files in either your linked data repository or the cache and each will persist updates in the order they receive them. If you modify the same file in both the linked data repository and the cache, you should coordinate updates at the application level to prevent conflicts. Amazon File Cache will not prevent conflicting writes in multiple locations.
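As a sketch, and assuming the Lustre lfs hsm_archive and hsm_state commands described in the exporting documentation, you can push a changed file back to its linked repository from a client (the path below is a placeholder):
```python
import subprocess

# Export (archive) a changed file back to its linked data repository.
# Run on a client with the cache mounted; the path is a placeholder.
target = "/mnt/cache/s3-data/results/output.dat"
subprocess.run(["sudo", "lfs", "hsm_archive", target], check=True)

# Optionally check the file's archive state afterward.
subprocess.run(["sudo", "lfs", "hsm_state", target], check=True)
```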
Can I link multiple caches to the same set of NFS file systems or Amazon S3 buckets?
Yes, you can link multiple caches to the same set of data repositories. You can modify files in any of the linked caches. Your linked data repository and each cache will persist updates in the order they receive them. If you modify the same file in multiple caches, you should coordinate updates at the application level to prevent conflicts. Amazon File Cache will not prevent conflicting writes in multiple locations.
How are directories, symbolic links, portable operating system interface (POSIX) metadata, and POSIX permissions imported from and exported to Amazon S3?
Amazon File Cache stores directories and symbolic links (symlinks) as separate objects in your Amazon S3 bucket. For example, a directory is stored as an Amazon S3 object with a key name that ends with a slash.
Amazon File Cache also automatically transfers POSIX metadata and permissions for files and directories when importing data from and exporting data to Amazon S3. The POSIX metadata is stored as Amazon S3 object metadata using a standard format shared across AWS file services. See POSIX metadata support for data repositories for more details.
How do I monitor my cache’s activity?
Amazon File Cache integrates with Amazon CloudWatch, allowing you to monitor cache health and performance metrics in real time. Example metrics include storage consumed, throughput, and number of file operations per second. You can log Amazon File Cache API calls using AWS CloudTrail.
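As a sketch of pulling a metric with Boto3, where the namespace, metric name, and dimension shown are assumptions to verify against the Amazon File Cache monitoring documentation:
```python
import datetime

import boto3

cloudwatch = boto3.client("cloudwatch")

# Assumed namespace, metric name, and dimension; confirm the exact names
# in the Amazon File Cache monitoring documentation before relying on them.
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/FSx",
    MetricName="DataReadBytes",
    Dimensions=[{"Name": "FileCacheId", "Value": "fc-0123456789abcdef0"}],
    StartTime=datetime.datetime.utcnow() - datetime.timedelta(hours=1),
    EndTime=datetime.datetime.utcnow(),
    Period=300,
    Statistics=["Sum"],
)
for point in sorted(stats["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Sum"])
```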
What Regions is Amazon File Cache available in?
See Regional Products and Services for details on Amazon File Cache service availability by Region.
Scale and performance
What performance can I expect from Amazon File Cache?
Once a file is cached, it is served directly out of the cache to your compute instances or containers with consistent sub-millisecond latencies, up to hundreds of GB/s of throughput, and up to millions of IOPS. Your cache has a baseline throughput capacity of 1,000 MB/s per TiB of cache storage. If the requested data is not cached, it is copied to the cache from the linked data repository at speeds up to the baseline throughput capacity.
The cache’s throughput capacity is shared between clients accessing the cache and data movement between the cache and data repositories. For example, a 4.8 TiB cache with 4.8 GB/s of throughput can load files from on-premises file servers over a 10 Gbps (1.25 GB/s) connection while simultaneously supporting 3.55 GB/s of I/O from clients.
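A worked version of that sizing arithmetic:
```python
# Baseline throughput scales at 1,000 MB/s per TiB of cache storage.
cache_size_tib = 4.8
baseline_gb_per_s = 1000 * cache_size_tib / 1000  # 4.8 GB/s

# A 10 Gbps Direct Connect or VPN link carries at most 1.25 GB/s.
repository_load_gb_per_s = 1.25

# Headroom left for client I/O while the cache loads data from on premises.
client_io_gb_per_s = baseline_gb_per_s - repository_load_gb_per_s
print(client_io_gb_per_s)  # ~3.55 GB/s, matching the example above
```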
The rate at which Amazon File Cache copies files from your on-premises file server also depends on the bandwidth and roundtrip latency of your AWS Direct Connect/VPN link and the throughput supported by your on-premises file server. For the best performance, AWS Direct Connect is recommended.
See the Amazon File Cache Performance documentation for more details.
How many instances can connect to a cache?
A cache can be concurrently accessed by thousands of compute instances.
What cache sizes are supported by Amazon File Cache and what is the increment granularity?
Caches can be created with a storage capacity of 1.2 TiB or in increments of 2.4 TiB (2.4 TiB, 4.8 TiB, and so on). In addition to cache storage, each cache also requires 2.4 TiB of metadata storage capacity.
How many caches can I create?
There is a limit of 100 caches per AWS account, which can be increased on request.
Security and availability
How does Amazon File Cache secure my data?
Amazon File Cache encrypts data at rest and in transit.
Your data is always encrypted at rest using keys managed through AWS Key Management Service (KMS). You can use either service-owned keys or your own keys.
Amazon File Cache automatically encrypts data in transit when the cache is accessed from select Amazon EC2 client instance types. Amazon File Cache encrypts traffic between the cache and your S3 data repositories using HTTPS (TLS).
For in-transit encryption between your on-premises file server and Amazon File Cache, you can use a VPN to ensure encrypted data transfers between your VPC and your on-premises network. You can also use MAC Security (MACsec) to encrypt data from your corporate data center to the AWS Direct Connect location.
What access control capabilities does Amazon File Cache provide?
Every Amazon File Cache resource is owned by an AWS account, and permissions to create or access a resource are governed by permissions policies. You specify the VPC in which your cache is made accessible, and you control which resources have access to your cache using VPC security groups. You control who can administer your cache resources (such as create and delete) using AWS Identity and Access Management (IAM).
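For example, here is a minimal sketch of an identity-based policy that grants read-only administration of caches, assuming that Amazon File Cache actions live under the fsx namespace as its API does; confirm the exact action names in the IAM documentation.
```python
import json

import boto3

iam = boto3.client("iam")

# Hypothetical read-only policy for Amazon File Cache administration.
# The action names are assumptions based on the FSx API namespace.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["fsx:DescribeFileCaches", "fsx:ListTagsForResource"],
            "Resource": "*",
        }
    ],
}

iam.create_policy(
    PolicyName="FileCacheReadOnly",
    PolicyDocument=json.dumps(policy_document),
)
```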
Does Amazon File Cache support shared VPCs?
Yes, with Amazon File Cache, you can create and use caches in shared Amazon VPCs from both owner accounts and participant accounts with which the VPC has been shared. VPC sharing allows you to reduce the number of VPCs that you need to create and manage, while you still benefit from using separate accounts for billing and access control.
What are the availability characteristics of Amazon File Cache?
Amazon File Cache uses a parallel file system for caching your data. In parallel file systems, data is stored across multiple network file servers to maximize performance and reduce bottlenecks, and each server has multiple disks. Larger caches have more file servers and disks than smaller caches. If a file server becomes unavailable, it is replaced automatically within minutes. In the meantime, client requests for data on that server transparently retry and eventually succeed after the file server is replaced. Data is replicated across disks, and any failed disks are automatically and transparently replaced behind the scenes.
Does Amazon File Cache offer a Service Level Agreement (SLA)?
Yes. The Amazon File Cache SLA provides a service credit if your monthly uptime percentage is below the service commitment in any billing cycle.
Pricing and billing
How will I be charged and billed for my use of Amazon File Cache?
With Amazon File Cache, you pay only for the resources you use. See the Amazon File Cache pricing page for details.
Do your prices include taxes?
Except as otherwise noted, our prices are exclusive of applicable taxes and duties, including value-added tax (VAT) and applicable sales tax. If you have a Japanese billing address, use of AWS services is subject to Japanese Consumption Tax. Learn more.