Aurora Supercomputer: Capacity and Performance Beyond Expectations
Introduction
Intel, HPE Load Aurora Supercomputer With Over 63,000 GPUs, 21,000 CPUs
Semiconductor giant Intel, working with Hewlett Packard Enterprise, has completed the installation of more than 10,000 compute blades in the U.S. Department of Energy’s long-delayed Aurora supercomputer, which, powered by Intel’s latest CPUs and GPUs, could become the world’s fastest when it goes online later this year as expected. The system incorporates more than 1,024 storage nodes (using DAOS, Intel’s distributed asynchronous object storage), providing 220 petabytes (PB) of capacity at 31 terabytes per second (TB/s) of total bandwidth, and leverages the HPE Slingshot high-performance fabric. Later this year, Aurora is expected to be the world’s first supercomputer to achieve a theoretical peak performance of more than 2 exaflops (an exaflop is 10^18, or a billion billion, operations per second).
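To put those storage figures in perspective, the short sketch below divides the quoted totals evenly across the 1,024 DAOS nodes. The per-node values are simple averages derived from the article’s numbers (assuming decimal unit conversions), not measured DAOS performance.

```cpp
// Averages per DAOS storage node, computed from the totals quoted above.
// Assumes decimal units (1 PB = 1,000 TB; 1 TB/s = 1,000 GB/s).
#include <iostream>

int main() {
    const double nodes          = 1024;   // DAOS storage nodes
    const double capacity_pb    = 220.0;  // total capacity, petabytes
    const double bandwidth_tbps = 31.0;   // total bandwidth, TB/s

    std::cout << "Capacity per node:  "
              << capacity_pb * 1000.0 / nodes << " TB\n";      // ~215 TB
    std::cout << "Bandwidth per node: "
              << bandwidth_tbps * 1000.0 / nodes << " GB/s\n"; // ~30 GB/s
}
```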
Aurora will harness the full power of the Intel Max Series GPU and CPU product family. Designed to meet the demands of dynamic and emerging HPC and AI workloads, the Max Series GPUs are already demonstrating leading performance on real-world science and engineering workloads in early results, delivering up to two times the performance of AMD MI250X GPUs on OpenMC and near-linear scaling up to hundreds of nodes.[1] The Intel Xeon Max Series CPU drives a 40% performance advantage over the competition in many real-world HPC workloads, such as earth systems modeling, energy, and manufacturing.[2]
Why Aurora Matters: From tackling climate change to finding cures for deadly diseases, researchers face monumental challenges that demand advanced computing technologies at scale. Aurora is poised to address the needs of the HPC and AI communities, providing the necessary tools to push the boundaries of scientific exploration.
“While we work toward acceptance testing, we’re going to be using Aurora to train some large-scale open source generative AI models for science,” said Rick Stevens, Argonne National Laboratory associate laboratory director. “Aurora, with over 60,000 Intel Max GPUs, a very fast I/O system, and an all-solid-state mass storage system, is the perfect environment to train these models.”
The Aurora supercomputer program is an ongoing project by the United States Department of Energy (DOE) to develop and deploy a series of exascale supercomputers. Exascale computing refers to computing systems that can perform a quintillion (10¹⁸) calculations per second. The goal of the Aurora program is to advance scientific research and innovation by providing unprecedented computational power for solving complex problems.
Here are some key points about the Aurora supercomputer program:
- Objective: The primary objective of the Aurora program is to develop exascale supercomputers capable of handling data-intensive scientific simulations, modeling, and machine learning applications. These systems are expected to enable breakthroughs in various fields, such as energy, climate modeling, advanced materials, genomics, and more.
- Collaboration: The Aurora program involves collaboration between several entities. The DOE’s Office of Science leads the program, and it works in partnership with the Argonne National Laboratory (ANL), Intel, and Cray (now part of Hewlett Packard Enterprise) for the development of the supercomputer.
- Aurora Systems: The Aurora program envisions the deployment of multiple exascale systems. The first planned system, called Aurora, is being developed at the Argonne National Laboratory and, after repeated delays, is now expected to come online later this year. It is based on future-generation Intel technology, including Intel’s Xe GPU architecture, and the Cray “Shasta” supercomputer infrastructure (now HPE Cray EX).
- Technical Specifications: The Aurora supercomputer is expected to have a theoretical peak performance of more than two exaflops, meaning it can perform over two quintillion floating-point operations per second. It incorporates advanced technologies such as novel processor architectures, high-bandwidth memory, advanced storage systems, and improved interconnects to deliver unprecedented computational capabilities.
- Applications: The Aurora supercomputer is anticipated to significantly advance research in a wide range of domains. It will enable scientists and researchers to simulate and analyze complex phenomena with high accuracy and resolution, leading to breakthroughs in areas such as climate modeling, energy production and consumption optimization, drug discovery, cosmology, and more.
- Impact: The availability of exascale computing power through the Aurora program is expected to accelerate scientific discoveries, drive innovation, and address some of the most pressing challenges facing society. It will empower researchers to tackle complex problems at an unprecedented scale and speed, leading to advancements in various scientific disciplines and potential improvements in areas like healthcare, energy efficiency, and environmental sustainability.
How Aurora Works: At the heart of this state-of-the-art system are Aurora’s sleek rectangular blades, which house its processors, memory, networking, and cooling technologies. Each blade consists of two Intel Xeon Max Series CPUs and six Intel Max Series GPUs. The Max Series product family is already demonstrating strong early performance on Sunspot, the test bed and development system that shares Aurora’s architecture. Developers are using oneAPI and AI tools to accelerate HPC and AI workloads and enhance code portability across multiple architectures.
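As a concrete taste of the oneAPI programming model mentioned above, here is a minimal SYCL 2020 sketch (generic code, not Aurora source) that enumerates the accelerators visible on a node and offloads a trivial kernel to the default device:

```cpp
// Minimal SYCL sketch: list visible devices, then run a trivial kernel.
#include <sycl/sycl.hpp>
#include <iostream>

int main() {
    // Enumerate every platform and device the SYCL runtime can see; on a
    // multi-GPU node this would include each GPU (or GPU tile).
    for (const auto& platform : sycl::platform::get_platforms()) {
        std::cout << "Platform: "
                  << platform.get_info<sycl::info::platform::name>() << "\n";
        for (const auto& device : platform.get_devices()) {
            std::cout << "  Device: "
                      << device.get_info<sycl::info::device::name>()
                      << " (compute units: "
                      << device.get_info<sycl::info::device::max_compute_units>()
                      << ")\n";
        }
    }

    // Submit a trivial kernel to the default device to confirm offload works.
    sycl::queue q{sycl::default_selector_v};
    int result = 0;
    {
        sycl::buffer<int, 1> buf{&result, sycl::range<1>{1}};
        q.submit([&](sycl::handler& h) {
            sycl::accessor acc{buf, h, sycl::write_only};
            h.single_task([=]() { acc[0] = 42; });
        });
    } // buffer destructor copies the result back into `result`
    std::cout << "Kernel ran on "
              << q.get_device().get_info<sycl::info::device::name>()
              << ", result = " << result << "\n";
}
```

The same kernel code runs unchanged on CPUs or GPUs from different vendors, which is the cross-architecture portability point oneAPI is designed around.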
Installing these blades has been a delicate operation: each 70-pound blade requires specialized machinery to be inserted vertically into Aurora’s refrigerator-sized racks. The system’s 166 racks accommodate 64 blades each and span eight rows, occupying a space equivalent to two professional basketball courts in the Argonne Leadership Computing Facility (ALCF) data center.
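The published figures are internally consistent, as this back-of-the-envelope sketch (using only numbers quoted in this article) shows:

```cpp
// Cross-check of Aurora's component counts from the figures in this article.
#include <iostream>

int main() {
    const long racks           = 166;
    const long blades_per_rack = 64;
    const long cpus_per_blade  = 2;   // Intel Xeon Max Series CPUs
    const long gpus_per_blade  = 6;   // Intel Max Series GPUs

    const long blades = racks * blades_per_rack;
    std::cout << "Blades: " << blades << "\n";                  // 10,624 ("more than 10,000")
    std::cout << "CPUs:   " << blades * cpus_per_blade << "\n"; // 21,248 ("21,000 CPUs")
    std::cout << "GPUs:   " << blades * gpus_per_blade << "\n"; // 63,744 ("over 63,000 GPUs")
}
```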
Researchers from the ALCF’s Aurora Early Science Program (ESP) and DOE’s Exascale Computing Project will migrate their work from the Sunspot test bed to the fully installed Aurora. This transition will allow them to scale their applications on the full system. Early users will stress test the supercomputer and identify potential bugs that need to be resolved before deployment. This includes efforts to develop generative AI models for science, recently announced at the ISC’23 conference.