AWS for Industries
Building Digitally Connected Labs with AWS
Many Life Sciences organizations struggle to achieve lab digitalization that can scale, automate data tasks, enable AI/ML, and create collaborative environments that support diverse R&D efforts. The AWS Digital Lab Strategy is a set of services, architectures, and partners that helps organizations take advantage of cloud scale and agility.
Efforts to build digital lab capabilities often seem stuck at the starting line or in the pilot phase. A recent study found that some 37% of labs are conducting digital lab pilots, while 40% of labs still have not attempted a pilot.
This is not a failure of innovation, creativity, or know-how. The slow shift to scale stems from the absence of a modern data strategy in these digitalization efforts. In the area of digital labs, AWS Cloud allows you to:
- Automate data collection from instrument to cloud
- Create data products from your lab data
- Conduct high-throughput data analysis
- Optimize your laboratory through AI and machine learning (ML)
- Search your data
- Improve automation between lab systems
Along with these capabilities are standard tools, best practices, and partners helping life sciences companies take full advantage of the virtually unlimited scale and performance of AWS cloud infrastructure.
AWS Digital Labs Strategy
The AWS Digital Labs Strategy is an extension of the AWS Modern Data Strategy. In this approach, AWS starts with a scalable data mesh, which is a self-service, decentralized data system and set of processes, to drive governance across multiple data lakes.
Around the data mesh sit purpose-built services, which are tailored to the R&D personas you see in Figure 1 in green and their responsibilities.
Figure 1 – The AWS Digital Labs Strategy is a framework for building a digital lab with AWS. Here, the AWS Modern Data Strategy is extended to encompass the lab personas in green, with purpose-built solutions that enable the capability pillars below.
A key tenet is the data-as-a-product principle, in which data producers and consumers drive productization of data coming from single or multiple data lakes. This results in a self-serve data platform in which data consumers (wet-lab and dry-lab scientists) can discover, access, and use data, while data producers (instruments, simulations, high-performance computing outputs, ML algorithms) can create and maintain data products.
Another key tenet is unified policy management (centralized governance and audit) with federated fine-grained access control to enable organization-wide settings.
A final key tenet is seamless communication (through events and notifications) to and from the data mesh and between systems.
The Journey of the Data
One of the benefits of the AWS Digital Labs Strategy is that it lets you consider the journey of the data, which can create valuable feedback loops between different R&D data consumers and producers, connected through the cloud.
One example is in lead optimization, as part of the drug discovery process. This journey often begins when computational modeling results pass to a wet-lab experimentation group for validation. Wet-lab scientists incorporate those results into experimental design and perform confirmatory wet-lab tests. When instrument data is acquired, the results pass to an ML model for quality control and interpretation, which sends insights back to the computational model for improvement, closing the feedback loop.
This journey includes steps for operational oversight to optimize lab processes and steps where external contract research organizations (CROs) or real-world data providers can be data producers in this loop.
Variations of the data journey exist for diagnostic laboratories and quality control (QC) labs conducting high-throughput sample processing. These labs can benefit from having data stored as an immutable data product in the data mesh, which can be consumed by primary systems like a LIMS, as well as secondary consumers like data science and operations teams. Within this loop, these labs can also take advantage of AWS high-performance computing (HPC), for efficient batch processing of imaging, genomics, and multi-omics data.
The AWS Digital Lab Strategy enables this complex choreography between R&D producers and consumers. It automatically:
- Creates data provenance and preserves data lineage
- Notifies systems and people when new data is present in the data lake
- Provides findability for all eligible consumers
- Minimizes unnecessary data movement
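As a sketch of what that provenance might look like, the snippet below builds an illustrative lineage record for a new data product. The field names and identifiers are hypothetical, not an AWS schema; the idea is that each product carries an immutable fingerprint and pointers to the upstream products it was derived from.

```python
import hashlib
import json
from datetime import datetime, timezone

def make_lineage_record(product_id, parent_ids, payload: bytes, producer):
    """Build an illustrative provenance record for a new data product.

    The SHA-256 digest gives an immutable fingerprint of the payload, and
    parent_ids preserve lineage back to the inputs it was derived from.
    """
    return {
        "product_id": product_id,
        "parents": list(parent_ids),   # upstream data products
        "producer": producer,          # instrument, pipeline, or model
        "sha256": hashlib.sha256(payload).hexdigest(),
        "created_at": datetime.now(timezone.utc).isoformat(),
    }

record = make_lineage_record(
    "assay-42/plate-7", ["raw/plate-7.fcs"], b"processed readings", "qc-pipeline-v2"
)
print(json.dumps(record, indent=2))
```

Storing records like this alongside each object is what lets downstream consumers verify integrity and trace any result back to its source data.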
Figure 2 – An example of a data journey in drug discovery. Orange lines illustrate the data journey in a lead discovery lab, which overlays the AWS Digital Labs Strategy.
Core Digital Lab Capabilities Create a Foundation for Digital R&D
Another benefit of the AWS Digital Lab Strategy is that labs don’t need to adopt it all at once, and can integrate it with existing equipment and lab systems. A fully realized digital lab includes several core capability areas that can be adopted over time.
Instrument to Cloud
A foundational step is to bring instrument data to the data mesh to make it accessible for review and analytics by scientists. With a broad variety of AWS services and reference architectures, you can automate data transfer from any instrument in the laboratory to the data mesh on an extensible platform. This makes lab data discoverable and adds context through metadata that describes the origin of the data.
Some examples include:
- AWS DataSync provides performant data transfer to the data mesh for multi-terabyte instrument files, helping scientists focus on high-value science instead of data movement
- Amazon S3 File Gateway lets scientists access cloud-backed files using their laptops, with little change to their existing behaviors
- AWS Transfer Family lets partnering entities, like CROs, upload study results, to lower the friction of secure collaborations
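To illustrate the kind of automation involved, here is a minimal Python sketch that plans an instrument-to-cloud transfer: it maps a raw instrument file to a date-partitioned destination key plus origin metadata. The naming scheme is an assumption for illustration; the actual movement of bytes would be handled by a service such as AWS DataSync or an S3 File Gateway.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath

def plan_upload(instrument_id, run_id, local_path, acquired_at=None):
    """Map an instrument file to a partitioned destination key plus metadata.

    A date-partitioned prefix keeps raw data organized, and the metadata
    captures the origin of the data for later discovery in the data mesh.
    """
    acquired_at = acquired_at or datetime.now(timezone.utc)
    name = PurePosixPath(local_path).name
    key = f"raw/{instrument_id}/{acquired_at:%Y/%m/%d}/{run_id}/{name}"
    metadata = {
        "instrument-id": instrument_id,   # which instrument produced it
        "run-id": run_id,
        "acquired-at": acquired_at.isoformat(),
    }
    return key, metadata

key, meta = plan_upload(
    "plate-reader-01", "run-0042", "/data/exports/results.csv",
    acquired_at=datetime(2024, 5, 1, tzinfo=timezone.utc),
)
print(key)   # raw/plate-reader-01/2024/05/01/run-0042/results.csv
```

A consistent, partitioned key scheme like this is what makes the resulting data products easy to find and filter later.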
Lab operations teams may also want a real-time operational view. These teams can configure IoT agents at the edge, process streamed data, and report on status in near real-time. Instruments can be organized as a fleet to monitor status and utilization. AWS services can help produce 3D digital twins of lab facilities for intuitive user interfaces.
For customers looking to get started quickly, we offer the AWS Guidance for Life Science Data Transfer, an architectural best practice that is in production with multiple Life Sciences customers. For an out-of-the-box data collection and harmonization solution, we also have AWS Partner solutions in this space that can simplify data ingestion from instruments, such as TetraScience. The solution additionally harmonizes the data through conversion to unified, open schemas—adding levels of interoperability and reusability.
Governing Access to Lab Data
Data integrity is a bedrock of science. Labs want to store data as immutable artifacts to provide provenance and aid in reproducibility. This need reflects some of the original purposes of the lab notebook.
To assist here, AWS provides tools to create a unified governance approach to your lab data products. These set access controls on the data coming from the lab, to make sure that only the right divisions within your company have access, and to monitor access to the data. They also set data integrity controls through versioning tools and checksum features.
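A minimal sketch of that governance pattern, assuming a hypothetical in-memory catalog: each data product lists the divisions allowed to consume it, and every access decision is written to an audit trail. In a real deployment this metadata would live in the data mesh's central catalog rather than in process memory.

```python
from datetime import datetime, timezone

# Illustrative in-memory catalog; in practice this governance metadata
# would be held centrally by the data mesh.
CATALOG = {
    "wet-lab/assay-results": {"allowed_divisions": {"discovery", "qc"}},
}

AUDIT_LOG = []

def can_access(product, division):
    """Fine-grained access check that records every decision for audit."""
    allowed = division in CATALOG.get(product, {}).get("allowed_divisions", set())
    AUDIT_LOG.append({
        "product": product,
        "division": division,
        "allowed": allowed,
        "at": datetime.now(timezone.utc).isoformat(),
    })
    return allowed

print(can_access("wet-lab/assay-results", "discovery"))  # True
print(can_access("wet-lab/assay-results", "finance"))    # False
```

The audit trail is the point: centralized governance means every grant and denial is visible in one place, organization-wide.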
Additionally, we see customers producing data from different scientific domains (for example, wet-labs, dry-labs, and QC labs) and governing it centrally as data products under a data mesh.
While some organizations may wish to build the R&D data governance themselves with AWS best practices, others may benefit from AWS Partner solutions. Quilt Data and Collibra can help in this space, accelerating time to benefit for customers looking for out-of-the-box data mesh and data catalog solutions.
High-Performance Computing (HPC)
Large life sciences datasets like imaging, multi-omics, and cryo-electron microscopy (cryo-EM) require high-performance, low-latency computing power, which often goes beyond the limits of on-premises infrastructure. Flexible configuration of the more than 500 Amazon Elastic Compute Cloud (Amazon EC2) instance types means scientific researchers can get the computing performance they need, at the lowest cost, for a variety of research workloads.
Beyond CPU, GPU, and field-programmable gate array (FPGA) configurability, AWS services can help set up and configure HPC applications requiring high-bandwidth, low-latency networking and high-performance file systems. With hybrid compute and storage tools, AWS also enables labs that need to retain some storage on-site due to data volumes or low-latency requirements.
AWS has tested and trusted architectural patterns to stand up your own genomics, imaging, and modeling pipelines faster. With the help of these architectures, computational researchers can imagine their ideal computing system for a project, test it, optimize it, run it for the duration of the project, and then tear it down with no ongoing costs.
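As an illustration of that workflow, the sketch below assembles a submit_job-style request for an AWS Batch genomics step. The queue, job definition, and command names are hypothetical; the payload shape mirrors what would be sent to AWS Batch, which provisions compute only while jobs run, so nothing is left billing after teardown.

```python
import json

def build_alignment_job(sample_id, queue, job_definition,
                        vcpus=16, memory_mib=65536):
    """Assemble an AWS Batch-style submit_job request for a genomics step.

    Names here are illustrative; resourceRequirements lets each step
    request exactly the vCPUs and memory it needs.
    """
    return {
        "jobName": f"align-{sample_id}",
        "jobQueue": queue,
        "jobDefinition": job_definition,
        "containerOverrides": {
            "command": ["align.sh", sample_id],
            "resourceRequirements": [
                {"type": "VCPU", "value": str(vcpus)},
                {"type": "MEMORY", "value": str(memory_mib)},
            ],
        },
    }

request = build_alignment_job("sample-0042", "hpc-spot-queue", "bwa-mem:3")
print(json.dumps(request, indent=2))
```

Generating one request per sample in a loop is how a batch of thousands of genomes becomes an elastic, pay-per-use pipeline.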
AI and ML
On top of all the data that is collected and processed, labs are using AI and ML to discover therapeutic targets and new therapeutic interventions.
For expert ML practitioners, we offer optimized cloud images for popular deep learning frameworks such as PyTorch, TensorFlow, and MXNet. These are being used for:
- Predicting biomolecular structure, function, and interactions
- Conducting classifications on large medical imaging datasets
- Making disease predictions from raw, disparate health data
For everyday ML practitioners, Amazon SageMaker (SageMaker) makes it straightforward to build, train, tune, and deploy ML models. It is being used by life science companies like AstraZeneca as the workbench for ML. For novice ML practitioners, SageMaker Canvas is a no-code platform for model training and development. SageMaker helps companies optimize workflows and critical equipment throughput by using predictive capabilities to speed time to clinic.
With these “everyday” ML tools, labs can easily develop machine learning models for tasks that help with lab productivity, including:
- Identifying assay drift and predicting performance issues
- Getting recommendations for optimizing assays
- Providing insights on reagent quality
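As a toy example of the first task, the snippet below flags assay drift when a control reading deviates too far from its historical distribution. In practice a trained model in SageMaker would learn richer patterns, but the z-score logic conveys the idea; the readings are synthetic.

```python
from statistics import mean, stdev

def detect_drift(history, latest, z_threshold=3.0):
    """Flag assay drift when the latest control reading sits more than
    z_threshold standard deviations from the historical controls."""
    mu, sigma = mean(history), stdev(history)
    z = (latest - mu) / sigma
    return abs(z) > z_threshold, z

# Synthetic control-well readings from previous runs
history = [100.2, 99.8, 100.5, 99.9, 100.1, 100.3, 99.7, 100.0]
drifted, z = detect_drift(history, 103.5)
print(drifted)   # True: 103.5 is far outside the historical spread
```

A check like this, run automatically as each plate's data lands in the mesh, is how drift gets caught before it invalidates a batch of results.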
AWS is continuously delivering AI and ML capabilities to builders of all levels, and the AWS Digital Lab Strategy brings AI and ML closer to the lab.
Search and Exploration
Research is rarely linear, and sometimes the most helpful tool is a search bar. Natural language search is a powerful tool to discover historical data within your R&D organization, to uncover connections, and to generate new hypotheses.
To address this, R&D organizations are creating high-performance search indexes for their research, including indexes of experiment types, catalogs of structured data, and chemical compound inventories.
Another approach is to build knowledge graphs of existing data that may be located in silos, in different formats, from different sources, and of different relevance. Analysts who aren’t disease experts can now use natural language processing (NLP) to scour that data, create connections between data, and then navigate and browse the knowledge graph.
Knowledge graphs let your data self-organize and allow you to search with natural language keywords to explore areas. The approach of combining NLP with knowledge graphs has been used in drug repurposing, to create a genealogy of R&D assets for regulatory submission, and for prediction of absorption, distribution, metabolism, and excretion (ADME).
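A toy version of the idea, using in-memory triples: real deployments would extract triples with NLP and store them in a graph database such as Amazon Neptune, but the navigation pattern is the same. The example facts are illustrative.

```python
from collections import defaultdict

# A toy knowledge graph of (subject, relation, object) triples
TRIPLES = [
    ("aspirin", "inhibits", "COX-1"),
    ("aspirin", "inhibits", "COX-2"),
    ("COX-2", "implicated_in", "inflammation"),
    ("celecoxib", "inhibits", "COX-2"),
]

graph = defaultdict(list)
for s, r, o in TRIPLES:
    graph[s].append((r, o))
    graph[o].append((f"inverse_{r}", s))   # keep edges navigable both ways

def neighbors(entity):
    """Return everything directly connected to an entity."""
    return graph[entity]

# A keyword query reduces to an entity lookup, and browsing follows edges:
print(neighbors("COX-2"))
```

Starting from "inflammation" and walking two hops lands on both aspirin and celecoxib, which is exactly the kind of connection drug-repurposing searches exploit, without the analyst needing to be a disease expert.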
Automation of Lab Systems
Many labs rely on electronic lab notebooks (ELNs) and lab information management systems (LIMS) as the central user interface for scientists. These applications help scientists design experiments, plan inventory, and manage samples.
However, digital labs require coordination between the ELN, LIMS, and other systems interacting with the data mesh, such as AI/ML, HPC, and instrument/robotics control systems. Together, these create the full data journey.
Automating these systems is accelerated by using a cloud-enabled ELN or LIMS. Thermo Fisher Scientific, Benchling, PerkinElmer, and other partners have innovative products in this space with interactive user workflows, including voice and hands-free interfaces.
Once these systems are on the AWS cloud, data orchestration and software automation become possible with event-driven architecture. An event-driven architecture is a modern architecture pattern built from small, decoupled services that publish, consume, or route events. Using low-code tools, lab administrators can create workflows that operate around the data mesh. These can help:
- Notify lab users of updates to the data mesh
- Ingest data from lab software to the data mesh
- Automate API calls from the data mesh to lab software
- Initiate life sciences HPC pipelines
- Deliver events between different lab systems
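The pattern can be sketched in a few lines of Python: producers publish small events, and decoupled handlers subscribe by a (source, detail-type) pattern. The source and detail-type names below are illustrative; on AWS this routing would typically be handled by Amazon EventBridge rather than an in-process dictionary.

```python
import json

HANDLERS = {}  # (source, detail_type) -> list of subscribed handlers

def subscribe(source, detail_type):
    """Register a handler for events matching a (source, detail-type) pattern."""
    def register(fn):
        HANDLERS.setdefault((source, detail_type), []).append(fn)
        return fn
    return register

def publish(source, detail_type, detail):
    """Publish an EventBridge-style event entry and fan out to subscribers."""
    event = {"Source": source, "DetailType": detail_type,
             "Detail": json.dumps(detail)}
    for fn in HANDLERS.get((source, detail_type), []):
        fn(event)
    return event

NOTIFIED = []

@subscribe("lab.datamesh", "DataProductCreated")
def notify_users(event):
    # One decoupled consumer: record which data product users should see
    NOTIFIED.append(json.loads(event["Detail"])["product"])

publish("lab.datamesh", "DataProductCreated", {"product": "assay-42/plate-7"})
print(NOTIFIED)   # ['assay-42/plate-7']
```

Because producer and consumer only share the event shape, new consumers (an HPC pipeline trigger, a LIMS API call) can be added without touching the publisher, which is the core benefit of the pattern.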
Build Quickly with AWS Digital Labs Strategy
To help Life Sciences teams get started quickly, we pair many of these services with AWS Solutions for Life Sciences and our architectural library for common Life Sciences use cases.
For customers who prefer to bring on these capabilities out-of-the-box, or want to build on top of existing systems in place, AWS Partners offer solutions that can be readily deployed. Several modern LIMS and ELNs can be cloud-connected to start to take advantage of the AWS Digital Labs Strategy.
Finally, on the journey to a digital lab, you don't have to do it alone. AWS Professional Services can help design and accelerate your projects. For companies looking to leverage a consulting partner, there are teams of validated consulting partners including BioTeam, Clovertex, Accenture, and Deloitte that specialize in implementing life sciences workloads on AWS. To learn what AWS can do for you, contact your AWS Representative.