AWS Perfects Cloud Service for Supercomputing Customers (2024)

Amazon’s AWS believes it has finally created a cloud service that will break through with HPC and supercomputing customers. The cloud provider announced the commercial availability of Parallel Computing Service (PCS), which it hopes will bring skeptical high-performance computing and supercomputing customers into the cloud.

PCS is a managed service that lets customers set up and manage high-performance computing (HPC) clusters.

“We can now feel that we’ve increased the level of ease of use to ensure that customers can more easily migrate their HPC workloads onto AWS,” Ian Colle, general manager for advanced computing and solutions at AWS, told HPCwire.

Before PCS, orchestrating high-performance computing on AWS wasn’t easy and required a lot of do-it-yourself work with tools like the open-source ParallelCluster.

PCS takes away that friction, and customers can manage AWS HPC clusters the same way they manage on-premises environments. Colle said that makes it easier for them to migrate workloads to the cloud.

The key addition here is the Slurm (Simple Linux Utility for Resource Management) scheduler, which manages the workloads. It also allows customers to orchestrate their own storage, networking, and other components in an HPC cluster.

Colle described Slurm as “the most popular scheduler out there” globally for HPC workloads.

“A number of customers said, ‘If we just had a fully managed Slurm offering on AWS, that would make our lives so much easier.’ We took the learnings from ParallelCluster, and the customer feedback, and that’s what we’ve created,” Colle said.

HPC customers can replicate Slurm scripts from on-premises environments, which should make the transition to cloud-based HPC easier.
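As a rough illustration of what carries over, a Slurm batch script is a shell script whose header holds `#SBATCH` directives. The Python sketch below renders such a script as text. The function name, the partition name, and the `./simulate` command are hypothetical and not taken from AWS or the article; the directive names themselves are standard Slurm options.

```python
def render_sbatch_script(job_name, nodes, tasks_per_node, time_limit, partition, command):
    """Return the text of a minimal Slurm batch script.

    All argument values are placeholders chosen by the caller; nothing here
    is specific to AWS PCS.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        f"#SBATCH --time={time_limit}",
        f"#SBATCH --partition={partition}",
        "",
        # srun launches the command across the allocated tasks.
        f"srun {command}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical job: 4 nodes, 32 tasks per node, one-hour limit.
script = render_sbatch_script(
    job_name="demo",
    nodes=4,
    tasks_per_node=32,
    time_limit="01:00:00",
    partition="compute",  # assumed queue name
    command="./simulate",  # assumed application binary
)
print(script)
```

In principle, a script like this would be submitted with `sbatch` whether the cluster sits on premises or in the cloud; only site-specific values such as the partition name would be expected to change.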

“We’re going to help you set up a compute queue and scheduling queue, and you’re going to go off and be able to start running HPC jobs within minutes with AWS,” Colle said.

Breakthrough For Cloud

Colle joined AWS in 2017, and his goal was to make the cloud more accessible to HPC.

“This service … takes away so much of the friction that a customer would have trying to instantiate AWS resources to run their HPC workloads,” Colle said.

HPC customers have been slow to move to the cloud for many reasons. However, Colle said that the managed service would simplify the use of AWS’s hardware and software resources.

Amazon started with a series of HPC offerings, including Elastic Fabric Adapter, a low-latency interconnect, and FSx for the Lustre file system.

It introduced its own CPU called Graviton, now in the fourth generation. A chip called Nitro facilitates data movement and security in the AWS infrastructure.

HPC Forced to Cloud?

At ISC, keynote speaker Kathy Yelick said that most accelerators are locked down by hyperscalers, which are not making their chips commercially available.

She also recommended working closely with hyperscalers “to make sure we’re building systems that are of interest to their market as well as our market.”

Colle said PCS could be one such offering as it allows data scientists and researchers to run applications in minutes.

Some HPC customers are determined not to move to the cloud for reasons that include security, bandwidth concerns, and hardware optimizations.

Amazon isn’t selling its ARM Graviton CPUs and is gobbling limited stocks of Nvidia GPUs. AWS also offers its own homegrown inferencing and training chips through its cloud.

Fewer supercomputers are being built, and computing speeds are flattening out. The biggest supercomputers are being built by cloud providers.

That may ultimately force HPC customers to move applications to the cloud.

Rule of the ARM

AWS is thinking big picture regarding power efficiency, and ARM is central to its plans.

“We’re now on our fourth-generation ARM chip, and now with, especially Virtual Fugaku, … we’re enabling customers to run the exact same ARM-based workloads that they were running on an actual supercomputer in Japan,” Colle said.

HPC customers typically optimize their acceleration for Nvidia and AMD GPUs, not for Graviton and its companion accelerators. However, AWS is investing heavily in ARM, much like Nvidia, which pairs its ARM-based Grace CPU with a Blackwell GPU in the upcoming Grace Blackwell superchip.

“In a similar manner, we could craft, similar to the Grace Blackwell, a Graviton Blackwell because of that similarity in the ARM ecosystem,” Colle said.

For now, Colle said, “I’m sure you’re going to hear more from Nvidia about the things that we’re doing with their Blackwell, with their B100s … and B200s.”

However, the ultimate goal is to eliminate the hardware complications so HPC customers can focus on the results.

“We’re excited about SageMaker because it helps customers use resources efficiently without needing to worry about the underlying hardware details,” Colle said.

Storage Trends

Colle described himself as a Lustre guy and said AWS offers customers a family of file systems to ensure they can find a performant POSIX-compliant file system on AWS.

However, Colle noticed a significant shift among customers, who are moving some of their workloads to object storage.

This trend is driven by the realization that many workloads don’t require the full POSIX-compliant stack. Customers are finding that traditional file systems can sometimes slow performance considerably.

Additionally, there’s a cost benefit, as customers can save a lot of money by moving to object storage. Many customers are adopting a hybrid approach by “doing a combination of smart tiering,” Colle said.

This strategy involves using Lustre “when the portion of their workload requires the full POSIX semantics” while “aging it off to S3 for more long-term storage at a lower cost.”

Colle said this approach allows customers to optimize performance and cost-efficiency in their storage solutions.
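The tiering idea can be sketched as a simple rule. This is an illustration only: the `Dataset` type, the `choose_tier` function, the tier labels, and the 30-day threshold are assumptions made for the example, not AWS behavior or guidance from Colle.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    name: str
    needs_posix: bool            # does the workload need full POSIX semantics?
    days_since_last_access: int  # how long since the data was last used


def choose_tier(dataset, cold_after_days=30):
    """Pick a storage tier for a dataset.

    Recently used data that needs POSIX semantics stays on Lustre;
    everything else is placed in (or aged off to) object storage.
    The 30-day default is an arbitrary, assumed threshold.
    """
    if dataset.needs_posix and dataset.days_since_last_access < cold_after_days:
        return "lustre"
    return "s3"


# Hypothetical datasets to show the three cases.
datasets = [
    Dataset("active-simulation-mesh", needs_posix=True, days_since_last_access=2),
    Dataset("last-year-results", needs_posix=True, days_since_last_access=200),
    Dataset("training-images", needs_posix=False, days_since_last_access=1),
]
placement = {d.name: choose_tier(d) for d in datasets}
print(placement)
```

A real policy would likely weigh more factors, such as data size and retrieval cost, but the split between a POSIX file system for active work and object storage for the rest is the pattern described above.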

SLURMing It

Slurm environments can be deployed in PCS via tools including the AWS Management Console, the CLI, and standard API calls.

Parallel computing on AWS has become easy enough that data scientists can use it directly. Cost controls and budgeting are available through the standard AWS tools.

“Customers can actually put that decision-making down to the individual scientists and engineer level and allow them to make that decision on how to best migrate their workloads and which workloads can best benefit from HPC,” Colle said.

RONIN, an AWS partner, previously built its infrastructure on the AWS ParallelCluster open-source toolkit.

“Now they can use more of that standard AWS API-driven development, and they’re going to retool their offerings to use this AWS PCS because of the ease of use and the simplification of being integrated across AWS services,” Colle said.
