KAUST Empowers Researchers to Think Big

Datetime: 2016-08-23 05:28:26          Topic: HPC

In this special guest feature, Jane Glasser writes that Saudi Arabia has moved into the global supercomputing top ten with Shaheen II, a nearly 200,000-core behemoth that is helping researchers tackle climate change, earthquakes, and more.

The Shaheen II supercomputer at KAUST. Photo credit: Anastasia Khrenova.

With Earth’s oceans warming, scientists struggle to get their arms around the enormous amount of data these vast bodies of water produce. Some are studying the Red Sea, which is more contained than an open ocean but still vast, to learn more about ocean currents, biology, and temperatures.

With “mega” quakes threatening the Pacific Northwest of the United States and other areas of the world, scientists can’t process seismic data fast enough.

Solar energy is huge in the Middle East. But so are dust storms. There’s got to be a science behind solar cell placement to optimize solar energy capture.

These and other scientific inquiries received a big boost when King Abdullah University of Science and Technology (KAUST) acquired a Cray® XC40™ supercomputer in 2015. The new system, called Shaheen II, is 25 times more powerful than KAUST’s previous IBM Blue Gene/P system. With this investment, KAUST has significantly augmented its ability to tackle scientific quests like those mentioned above.

Based in Thuwal, Saudi Arabia, KAUST is a global graduate-level university that is rapidly establishing its reputation as a top-performing research university. Founded in 2009, it provides some of the world’s best-equipped laboratories and an unmatched range of instrumentation under one roof.

A simulation of seismic wave propagation highlights particle motion for an earthquake scenario at the Qadimah fault, located about 5 km north of KAUST, incorporating 3D Earth structure with topography and bathymetry. Courtesy of Daniel Peter.

From its start, KAUST has offered high-performance computing (HPC), recognizing it as a key enabler of discovery across all fields of science. “About 20 percent of the faculty here use computing as the main way of driving discovery, so having a state-of-the-art supercomputer at our disposal is indispensable to our mission,” says David Keyes, Director of the Extreme Computing Research Center at KAUST.

Adds KAUST President, Jean-Lou Chameau, “Our goal is to empower faculty and students with the freedom to think big, aim high, and explore some of the world’s most difficult challenges. Shaheen II accelerates our supercomputing capabilities in both the laboratory and in learning environments, so that our people can collaborate on discoveries that will benefit Saudi Arabia and the world.”

200,000 cores—and more

When KAUST opened its doors in 2009, its Supercomputing Lab boasted an IBM Blue Gene/P system (named Shaheen, the Arabic word for peregrine falcon). At the time, it was the 14th fastest supercomputer in the world, capable of 222 teraflops.

By 2014, Shaheen was ready for an upgrade, and the university put out tenders for a system that would take it into petaflop (Pflop) territory. Cray answered the call. The Seattle-based firm assembled a system with nearly 200,000 Intel® Xeon® processor cores and the capacity to deliver more than 7.2 Pflops of theoretical peak performance. This represents an approximately 25-fold boost over the first Shaheen, and Keyes notes that the new supercomputer is also six times more power efficient than its predecessor.

With 5.536 Pflops of sustained LINPACK performance, Shaheen II is the largest and most powerful supercomputer in the Middle East and the tenth fastest supercomputer in the world, according to the June 2016 TOP500 list.

Cray also supplied workflow-integrated application acceleration capabilities, built on flash storage and the Cray XC system high-speed interconnect. According to Cray, the Cray DataWarp™ I/O accelerator provides five times the performance of disk-based systems at the same cost, and it offloads bursty, I/O-intensive workloads from disk-based parallel file systems. This lets customers speed up I/O-hungry applications without modifying them, use compute resources more efficiently, and reduce the size of the underlying disk-based file system. Cray DataWarp also integrates with a variety of workload management tools to automate storage management and data movement.

“We gathered a half dozen of our leading research applications and designed benchmarks to test their performance on the proposed systems,” says Rooh Khurram, Computational Scientist at the KAUST Supercomputing Lab. “The major determinant was the scalability of our applications, such as WRF and WRF-Chem, used in weather modeling and prediction; NGA, a complex combustion modeling code; and CFD/MHD, used to analyze problems that involve electrically conducting fluids. Our workloads are very well suited to Intel Architecture.”

The system has 6,174 dual-socket compute nodes based on 16-core Intel Haswell processors running at 2.3 GHz. Each node has 128 GB of DDR4 memory running at 2,300 MHz. Overall, the system has a total of 197,568 processor cores and 790 TB of aggregate memory.
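
These figures hang together arithmetically. The short Python calculation below rederives the core count, aggregate memory, and theoretical peak from the per-node numbers. The figure of 16 double-precision FLOPs per cycle per core is an assumption not stated in the article (it is the usual figure for a Haswell core with two AVX2 fused multiply-add units), and the peak uses the nominal 2.3 GHz clock even though sustained AVX clocks can be lower.

```python
# Per-node specifications as reported in the article.
NODES = 6_174
SOCKETS_PER_NODE = 2
CORES_PER_SOCKET = 16
CLOCK_HZ = 2.3e9
MEM_PER_NODE_GB = 128

# Assumption (not from the article): each Haswell core retires up to
# 16 double-precision FLOPs per cycle (2 FMA units x 4 doubles x 2 ops).
DP_FLOPS_PER_CYCLE = 16


def system_totals():
    """Return (total cores, aggregate memory in decimal TB, peak in Pflops)."""
    cores = NODES * SOCKETS_PER_NODE * CORES_PER_SOCKET
    memory_tb = NODES * MEM_PER_NODE_GB / 1_000
    peak_pflops = cores * CLOCK_HZ * DP_FLOPS_PER_CYCLE / 1e15
    return cores, memory_tb, peak_pflops


cores, memory_tb, peak = system_totals()
print(f"{cores:,} cores, {memory_tb:.0f} TB memory, {peak:.2f} Pflops peak")

# Sustained LINPACK (5.536 Pflops) relative to this full-machine peak.
print(f"LINPACK / peak = {5.536 / peak:.0%}")
```

It prints 197,568 cores, 790 TB, and about 7.27 Pflops, in line with the "more than 7.2 Pflops" quoted earlier. The efficiency line is only a rough indicator, since a benchmark run need not use every node.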

The compute nodes are housed in 36 water-cooled Cray XC40 cabinets and connected via the Cray Aries interconnect. Aries uses a high-bandwidth, low-diameter network topology called Dragonfly, which improves the network performance metrics that matter most for HPC, including bandwidth, latency, and message rate.
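
To see why Dragonfly keeps the network diameter low, consider the idealized form of the topology: routers are organized into groups, every router links directly to every other router in its group, and every pair of groups shares exactly one global link. Any router can then reach any other in at most three hops (local, global, local). The Python sketch below builds such a network and measures its diameter with breadth-first search. The group size `a`, the global-links-per-router count `h`, and the wiring rule are illustrative assumptions; Cray's Aries implementation differs in detail, and real networks are far larger.

```python
from collections import deque
from itertools import combinations


def build_dragonfly(a, h):
    """Idealized dragonfly: g = a*h + 1 groups of a routers each.

    Router `index` in group `grp` is numbered grp * a + index.
    Returns an adjacency map {router: set(neighbors)}.
    """
    g = a * h + 1
    adj = {r: set() for r in range(g * a)}
    # Local links: all-to-all within each group.
    for grp in range(g):
        for i, j in combinations(range(a), 2):
            u, v = grp * a + i, grp * a + j
            adj[u].add(v)
            adj[v].add(u)
    # Global links: exactly one per pair of groups. Seen from group gi, the
    # d-th other group (in cyclic order) is served by router d // h, so every
    # router ends up with exactly h global links.
    for gi, gj in combinations(range(g), 2):
        ri = (gj - gi - 1) % g // h
        rj = (gi - gj - 1) % g // h
        u, v = gi * a + ri, gj * a + rj
        adj[u].add(v)
        adj[v].add(u)
    return adj


def diameter(adj):
    """Longest shortest path, in router hops, via BFS from every router."""
    worst = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            node = queue.popleft()
            for nbr in adj[node]:
                if nbr not in dist:
                    dist[nbr] = dist[node] + 1
                    queue.append(nbr)
        worst = max(worst, max(dist.values()))
    return worst


a, h = 4, 2  # hypothetical sizes, far smaller than a real Aries network
net = build_dragonfly(a, h)
print(len(net), "routers, diameter", diameter(net), "hops")
```

With these parameters the sketch builds 36 routers in 9 groups with a diameter of 3 hops. Increasing `h` adds groups (g = a*h + 1) without lengthening the worst-case route, which is the property the "low-diameter" label refers to.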

Shaheen II has a richly layered data storage architecture. The main data storage solution is a Lustre* parallel file system based on the Cray® Sonexion® 2000 Scale-Out Lustre Storage System, with a usable storage capacity of 17.2 PB, delivering around 500 GB/s of I/O throughput. The DataWarp burst buffer, consisting of nearly 500 Intel® solid-state drives, provides more than 1.5 TB/s of bandwidth.

The Intel SSDs are fully integrated with the Cray Aries interconnect. The data still resides on the Lustre file system, but computation does not begin until all data is staged in the SSD pool.

Burst buffer with Cray DataWarp and 500 Intel® Solid-State Drives P3608 Series

Key to the warp-speed performance of Shaheen II is the Cray DataWarp I/O accelerator. It gives I/O-intensive applications an additional storage layer between the compute nodes’ main memory and the Lustre parallel file system. The Shaheen II burst buffer delivers 1.5 TB/s of I/O bandwidth and provides 1.5 PB of storage capacity through nearly 500 Intel® SSD P3608 Series drives.

With a system as core-heavy as Shaheen II, processing power isn’t the problem. The performance bottleneck is moving data into and out of storage. A burst buffer is an intermediate, high-speed storage layer positioned between the application and the parallel file system (PFS). It absorbs the bulk data produced by the application at a far higher rate than the PFS can sustain (on Shaheen II, more than 1.5 TB/s versus around 500 GB/s), while seamlessly draining the data to the PFS in the background.
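
A back-of-the-envelope model makes the trade-off concrete. The Python sketch below uses the Shaheen II bandwidth figures given in this article (about 1.5 TB/s for the burst buffer, about 500 GB/s for Lustre). The checkpoint size and compute interval are hypothetical workload parameters, and the model ignores contention and metadata costs and conservatively assumes draining starts only after a burst completes. It is an illustration, not a description of how DataWarp actually schedules data movement.

```python
# Bandwidths reported for Shaheen II, in TB/s.
BURST_BUFFER_TBPS = 1.5  # DataWarp SSD tier
LUSTRE_TBPS = 0.5        # Lustre parallel file system


def checkpoint_costs(size_tb, compute_interval_s):
    """Model one checkpoint write (hypothetical workload parameters).

    Returns (stall without burst buffer, stall with burst buffer,
    background drain time, whether the drain finishes before the next
    checkpoint). Times are in seconds.
    """
    # Without a burst buffer, the application waits for the PFS itself.
    stall_pfs = size_tb / LUSTRE_TBPS
    # With one, it waits only while the SSD tier absorbs the burst.
    stall_bb = size_tb / BURST_BUFFER_TBPS
    # The copy to the PFS then overlaps with the next compute phase.
    drain = size_tb / LUSTRE_TBPS
    return stall_pfs, stall_bb, drain, drain <= compute_interval_s


stall_pfs, stall_bb, drain, fits = checkpoint_costs(size_tb=50, compute_interval_s=600)
print(f"stall without BB: {stall_pfs:.0f} s, with BB: {stall_bb:.1f} s")
print(f"drain: {drain:.0f} s, hidden behind compute: {fits}")
```

For a hypothetical 50 TB checkpoint, the application stalls for about 33 seconds instead of 100, and the 100-second drain fits inside a 600-second compute phase. If checkpoints come faster than the PFS can drain them, the buffer eventually fills and the advantage disappears, which is why burst buffers suit bursty rather than continuous I/O.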

Less waiting, more learning

KAUST researchers have dug into the new system and love what they’ve seen so far.

Geoscientist Georgiy Stenchikov leverages supercomputing for modeling the atmosphere and regional climate over the Arabian Peninsula. Among many applications, he is studying dust storms and ways to mitigate their effect on the efficient harvesting of solar energy. Stenchikov is excited about the potential to accelerate his climate modeling investigations. “Our group was one of the most active users of Shaheen I,” he says. “Shaheen II will allow us to improve our spatial resolution by an order of magnitude, moving environmental research at KAUST to a new horizon.”

The new computer also offers added computing power for faculty members such as seismologist Daniel Peter, who will use computational approaches to build sophisticated maps of our planet’s interior to predict earthquake hazards and locate natural resources, among other quests. “We describe it as a Human Genome Project for the Earth,” says Keyes.

A combustion simulation aids in engine designs that increase fuel efficiency while decreasing pollution. Courtesy of Hong Im.

Professor Hong Im’s laboratory is part of KAUST’s Clean Combustion Research Center and studies fundamental and practical aspects of combustion and power generation devices using high-fidelity computational modeling. With Shaheen II, his group is creating more realistic direct numerical simulations (DNS) and large-eddy simulations (LES) to aid in engine designs that increase fuel efficiency while decreasing pollution.

When one student moved his custom earthquake analysis code to Shaheen II, he was able to accomplish in two months what took a year or more of processing on Shaheen I. “Time-to-solution is critical for people in a university research environment,” says Saber Feki, Computational Scientist Lead at the KAUST Supercomputing Laboratory. “We have PhD students working on dissertations who do not want to be at university forever. If a student needs 1,000 cores at a time to run analyses, she has them now.”

Much of the research processed by the supercomputer will directly benefit Saudi Arabia. For example, the Solar and Photovoltaics Engineering Research Center is computationally modeling materials for next-generation solar cells to sustainably power the Kingdom. Their efforts are being complemented by those of Stenchikov’s group, which is generating data to address critical environmental issues, including identification of optimal locations for future solar farms. “We’re focused on in-depth analysis of extreme events like flash floods, dust storms, and the combined effects of industrial activities and dust on air quality and climate over the Arabian Peninsula,” says Stenchikov.

Soup-to-nuts HPC services

Although the 200,000 cores of the massive Shaheen II are truly impressive, the Supercomputing Lab’s most valuable resource for researchers is its knowledgeable staff. Khurram, Feki, and their colleagues support all facility users from initial training to advanced use of HPC. They install and port software, profile and optimize code, analyze and optimize performance, find and fix bugs, parallelize code, and sometimes make tea.

“We have two types of users: those using third-party, open source, or commercial software packages and those using their own code,” Khurram says. “For both, we help researchers by creating the most optimized compilations possible. We provide training, from parallelization to recommending the correct libraries and profiling tools from Intel and Cray.” For critical applications that use significant Shaheen II resources, the lab assigns a dedicated computational scientist to work closely with scientists and optimize their code.

“Using an Intel Architecture system broadens availability to more researchers, since most develop their applications on IA-based workstations, so moving the applications to HPC is much easier,” Khurram says. “The Blue Gene system required much more work to transfer code to HPC. It’s much easier to get people running on Shaheen II.”

In addition to being used by about 200 KAUST researchers, Shaheen II is used by several in-Kingdom organizations, including Saudi Aramco, SABIC, King Saud University, King Abdulaziz University, King Fahd University of Petroleum and Minerals, and King Abdulaziz City for Science and Technology.

KAUST Vice President for Research Jean M. Fréchet shares, “KAUST offers its facilities and outstanding scientific expertise to help strengthen Saudi Arabia’s position as a fast-rising hub for research and innovation. Enabling our researchers and partners with high performance computing resources further enhances our education and research endeavors and supports the KAUST mission to be a destination for those with a passion to make a global impact in science and technology.”