University of Adelaide Analyzes 3 TB of Genomics Data in Hours Using AWS
2020
Researchers at the University of Adelaide are pioneering scientific and bioinformatics research into subjects such as crop genomes, autoimmune diseases, and cancer. As part of one of Australia’s leading research institutions, bioinformaticians at Adelaide are committed to driving research across the university.
“We support researchers across different faculties and schools,” says Dr. Nathan Watson-Haigh, research fellow in bioinformatics at the university’s School of Biological Sciences. To enable this research, the bioinformaticians take advantage of the university’s high-performance computing (HPC) environment, which is used by hundreds of researchers at the university.
Recently, the School of Biological Sciences collaborated with researchers from the Plant Breeding and Acclimatization Institute (IHAR) in Poland on a project to study the diversity of wheat. “Our aim was to analyze wheat exome data to identify genes that could contribute to increased crop yield,” says Watson-Haigh.
As part of the project, researchers needed to analyze 48 wheat exomes and 18 whole barley genomes, totaling almost 3 terabytes of data. However, analyzing the data on time required more HPC compute capacity than the university had readily available.
“We needed more capacity, and we also had to deal with job queuing on our local infrastructure,” says Watson-Haigh. “Due to delays in receiving the raw data, we had less than four weeks to complete the analysis. We knew it would take us months of CPU hours to complete the analysis and given the demands on our internal HPC, this would have been weeks of wall time.”
“Using our AWS-based HPC cluster, we easily analyzed 3 terabytes of wheat exome data in 6 hours. Typically, this kind of project would take at least two weeks to complete on our local infrastructure.”
Dr. Nathan Watson-Haigh
Research Fellow in Bioinformatics, University of Adelaide, School of Biological Sciences
Building a New HPC Cluster on AWS
To increase compute capacity and scalability, the University of Adelaide worked with technology partner RONIN to create a new HPC cluster based on the Amazon Web Services (AWS) Cloud. RONIN is a Select Consulting Partner in the AWS Partner Network (APN).
“We trusted RONIN from our previous work with the company, and we also trusted that AWS was the right technology for our needs,” says Watson-Haigh. “We didn’t have another solution that could have enabled this. Otherwise, we would have just been waiting and then missing our deadlines as we tried to set something up.”
The university spent several weeks collaborating with RONIN to build and implement an HPC environment on AWS. The new cluster runs on Amazon Elastic Compute Cloud (Amazon EC2) instances and uses Amazon Simple Storage Service (Amazon S3) buckets to store research data. The cluster relies on AWS Auto Scaling to automatically adjust compute capacity, and it uses the Slurm open-source cluster management system. Additionally, the cluster supports the Snakemake genomics workflow engine, which manages a genomics research workload pipeline to perform highly parallelized data processing.
Using this solution, researchers analyzed wheat exome and barley genome data. “We pushed a total of about 3 terabytes of wheat exome data through the pipeline, utilizing 2,400 CPUs,” says Watson-Haigh.
Analyzing Data in 6 Hours Instead of Two Weeks
With the AWS cluster, researchers analyzed the wheat exome data far faster than they could have on the university’s internal HPC environment. “Using our AWS-based HPC cluster, we easily analyzed 3 terabytes of wheat exome data in 6 hours,” says Watson-Haigh. “Typically, this kind of project would take at least two weeks to complete on our local infrastructure. This solution definitely increased the research capabilities we had at the university.”
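As a rough check on the figures quoted in this case study, the short Python sketch below converts them into CPU hours and a wall-time ratio. It assumes that all 2,400 CPUs were busy for the full 6 hours, which the source does not state, so the CPU-hour figure is an upper bound rather than a measured value. The function and constant names are illustrative only.

```python
# Back-of-the-envelope arithmetic using figures quoted in the case study.
CPUS = 2_400        # CPUs used, as quoted by Watson-Haigh
CLOUD_HOURS = 6     # wall time on the AWS-based cluster
LOCAL_WEEKS = 2     # "at least two weeks" on local infrastructure

HOURS_PER_DAY = 24
DAYS_PER_WEEK = 7


def cpu_hours(cpus: int, wall_hours: float) -> float:
    """Upper bound on CPU hours, ASSUMING every CPU is busy for the whole run."""
    return cpus * wall_hours


def wall_time_ratio(local_weeks: float, cloud_hours: float) -> float:
    """How many times longer the local estimate is than the cloud run."""
    local_hours = local_weeks * DAYS_PER_WEEK * HOURS_PER_DAY
    return local_hours / cloud_hours


total_cpu_hours = cpu_hours(CPUS, CLOUD_HOURS)
single_cpu_days = total_cpu_hours / HOURS_PER_DAY
ratio = wall_time_ratio(LOCAL_WEEKS, CLOUD_HOURS)

print(f"CPU hours (upper bound): {total_cpu_hours:,.0f}")
print(f"Equivalent days on one CPU: {single_cpu_days:,.0f}")
print(f"Wall-time ratio, local vs. cloud: {ratio:.0f}x")
```

Under that assumption, the run used at most 14,400 CPU hours, or about 600 days on a single CPU, which is in line with the estimate of “months of CPU hours.” Six hours against a baseline of at least two weeks is a wall-time reduction of at least 56 times.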
In addition to benefiting from the performance of the cluster, researchers did not have to wait in a queue for their jobs to run. “With the shared on-premises platform, researchers required an entire quarter of the cluster to support the workload,” says Lyall Weir, business relationship manager in the Information Technology & Digital Services Department at the University of Adelaide. “Queue times for jobs of this size typically range from a few days up to a few weeks, given the volume of jobs from across the institution constantly running on the cluster. Using a dedicated AWS cluster through RONIN removes the bottleneck and gives valuable time back to the researcher.”
Driving Innovative Genomic Research
Working with the Polish researchers, the University of Adelaide used its AWS-based cluster to help uncover genes which might lead to increased crop yield. “By analyzing diverse wheat, we are able to identify genes that confer resistance to diseases, so we could bring that genetic material into modern bread wheat varieties to improve disease resistance or even resistance to heat and drought,” says Watson-Haigh. As a result, breeding scientists can use the data output to determine the right genetic material. “Scientists could potentially use our data analysis to make decisions about how to more effectively cross different wheat accessions.”
Scaling Up or Down to Control Costs
Because of the elasticity of AWS, Watson-Haigh can scale his dedicated 2,400-CPU cluster up or down on demand to save money during research projects. “With our genomics workflow, there are portions that make use of many cores. There are also other portions that require larger amounts of RAM, but are single-core,” says Watson-Haigh. “The actual requirements change through the entire workflow, so being able to scale down a cluster to save money is critical. We can do that on AWS.”
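To illustrate why requirements that change through a workflow make scaling worthwhile, the Python sketch below estimates how many compute nodes each stage would need. Every number and name in it is hypothetical (the node shape, the stage names, the task counts, and the per-task core and memory figures); none of them come from the university’s actual pipeline or cluster configuration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One workflow stage; all values used below are hypothetical."""
    name: str
    tasks: int
    cores_per_task: int
    mem_gb_per_task: int


@dataclass(frozen=True)
class NodeType:
    """Shape of a single compute node (hypothetical)."""
    cores: int
    mem_gb: int


def nodes_needed(stage: Stage, node: NodeType) -> int:
    """Nodes required to run all tasks of a stage at the same time."""
    # A node can host only as many tasks as BOTH its cores and its memory allow.
    tasks_per_node = min(
        node.cores // stage.cores_per_task,
        node.mem_gb // stage.mem_gb_per_task,
    )
    if tasks_per_node == 0:
        raise ValueError(f"stage {stage.name!r} does not fit on one node")
    return math.ceil(stage.tasks / tasks_per_node)


# Hypothetical node shape and stages, chosen only to show the contrast.
NODE = NodeType(cores=48, mem_gb=192)
STAGES = [
    Stage("many_core_stage", tasks=66, cores_per_task=16, mem_gb_per_task=32),
    Stage("high_mem_stage", tasks=4, cores_per_task=1, mem_gb_per_task=128),
]

for stage in STAGES:
    print(f"{stage.name}: {nodes_needed(stage, NODE)} nodes")
```

With these made-up figures, the many-core stage needs 22 nodes while the single-core, memory-heavy stage needs only 4, so a cluster sized for the first stage would sit largely idle during the second unless it is scaled down.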
During its research collaboration using the AWS Auto Scaling HPC cluster, the university managed costs and optimized savings by using the on-demand cost model. “This is a very different model to what the university has used before for research computing,” Weir says. “AWS and RONIN give us the ability to burst to the cloud when necessary, without having to purchase additional hardware.”
The university is currently expanding its use of the HPC cluster to drive new research. “I’m in discussion with a colleague who wants to do something similar for a different plant species, looking at 223 whole genome datasets,” says Watson-Haigh. “We know we will have the scalability to support that because of RONIN and AWS.”
To learn more, visit aws.amazon.com/hpc.
About the University of Adelaide
The University of Adelaide is a public university located in Adelaide, South Australia. Established in 1874, it is the third-oldest university in Australia and has more than 22,000 students and 3,400 faculty and staff.
Benefits
- Analyzes genomics data in 6 hours instead of 2 weeks
- Powers innovative research into wheat genomics
- Helps scientists more effectively cross wheat
- Reduces research costs
AWS Services Used
Amazon EC2
Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides secure, resizable compute capacity in the cloud. It is designed to make web-scale cloud computing easier for developers.
Amazon S3
Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance.
AWS Auto Scaling
AWS Auto Scaling monitors your applications and automatically adjusts capacity to maintain steady, predictable performance at the lowest possible cost.