This is an archival copy of the Visualization Group's web page 1998 to 2017. For current information, please vist our group's new web page.

Smashing the Trillion Cell Barrier

Isocontouring of two trillion cells on 32,000 cores of Franklin. Volume rendering of two trillion cells on 32,000 cores of Franklin.

Personnel from the LBL visualization group were part of a team of DOE researchers to run a series of experiments that demonstrated that VisIt, a leading scientific visualization application, is up to the challenge of visualizing massive data sets. Running on some of the world's most powerful supercomputers, VisIt achieved unprecedented levels of performance in highly parallel environments, tackling some of the largest data sets ever produced.

The team ran VisIt using 8,000 to 64,000 processing cores to tackle datasets ranging from 500 billion to 4 trillion cells, or grid points. The project was a collaboration among leading visualization researchers from Lawrence Berkeley National Laboratory (Berkeley Lab), Lawrence Livermore National Laboratory (LLNL) and Oak Ridge National Laboratory (ORNL).

Specifically, the team demonstrated, for the first time, that VisIt's parallelism approach can take advantage of the growing number of cores on today's advanced supercomputers, using them to process unprecedentedly large problems. Scientists confronted with massive datasets rely on data analysis and visualization software such as VisIt to "get the science out of the data," as one researcher said. VisIt, a parallel visualization and analysis tool that won an R&D 100 award in 2005, was developed at LLNL for the National Nuclear Security Administration.

When DOE established the Visualization and Analytics Center for Enabling Technologies (VACET) in 2006, the center joined the VisIt development effort, making further extensions for use on the large, complex datasets for DOE stakeholders. VACET is part of DOE's Scientific Discovery through Advanced Computing (SciDAC) program and includes researchers from three national laboratories, Berkeley Lab, LLNL and ORNL, and two universities, the University of California at Davis and the University of Utah.

The VACET team conducted the recent capability experiments in response to its mission to provide production-quality, parallel-capable visual data analysis software. These tests were a significant milestone for DOE's visualization efforts, providing an important new capability for the larger scientific research communities.

"The results show that visualization research and development efforts have produced technology that is today capable of ingesting and processing tomorrow's datasets," said Berkeley Lab's E. Wes Bethel, who is co-leader of VACET. "These results are the largest-ever problem sizes and the largest degree of concurrency ever attempted within the DOE visualization research community."

Experiment details

To run these tests, the VACET team started with data from an astrophysics simulation, and then increased it to create a sample scientific dataset at the desired dimensions. The team used this approach because the data sizes reflect tomorrow's problem sizes, and because the primary objective of these experiments is to better understand problems and limitations that might be encountered at extreme levels of concurrency and data size.

The test runs created three-dimensional grids ranging from 512 x 512 x 512 "cells" or grid points up to approximately 10,000 x 10,000 x 10,000 (1 trillion grid points) and approximately 15,900 x 15,900 x 15,900 to achieve 4 trillion grid points.

"This level of grid resolution, while uncommon today, is anticipated to be commonplace in the near future," said Sean Ahern, visualization lead at ORNL. "A primary objective for our SciDAC Center is to be well prepared to tackle future scientific data understanding challenges."

The VACET team ran the experiments in April and May on several world-class supercomputers:

Franklin, a 38,128-core Cray XT4 located at the National Energy Research Scientific Computing Center at Berkeley Lab;
JaguarPF, a 149,504-core Cray XT5 at the Oak Ridge Leadership Computing Facility at ORNL;
Ranger, a 62,976-core x86_64 Linux system at the Texas Advanced Computing Center at the University of Texas at Austin;
Purple, a 12,288-core IBM Power5 at LLNL;
Juno, an 18,432-core x86_64 Linux system at LLNL; and
Dawn, an 147,456-core BG/P system at LLNL.

Berkeley Lab team members were Mark Howison and Prabhat, who did the Franklin run, and Hank Childs, who did the Ranger, Purple, and Juno runs.

The experiments ran VisIt in parallel on 8,000 to 64,000 cores, depending on the size of the system. Data was loaded in parallel, with the application performing two common visualization tasks-isosurfacing and volume rendering-and producing an image. From these experiments, the team collected performance data that will help them both to identify potential bottlenecks and to optimize VisIt before the next major version is released for general production use at supercomputing centers later this year.

Issues Discovered During Scaling Study

Although the majority of VisIt's infrastructure scaled well to a large number of cores, the team uncovered several scaling issues:

VisIt's MPI rank 0 was collecting status information from all other processors through point-to-point communications, which is non-scalable. We worked around this issue for our study and plan on implementing a complete solution for a future release of VisIt.
VisIt's volume rendering algorithm was attempting an optimization for sample point communication that required a buffer that had an O(nProcs^2) space requirement. This "optimization," while appropriate for low levels of concurrency, proved to cause problems at high levels of concurrency. The workaround for these tests was to remove this "optimization." The team is evaluating the best course of action for a solution for a future release of VisIt.
The loading of shared libraries took quite a long time at scale (as much as five minutes) because most HPC platforms are only designed to load static binaries efficiently. VisIt's plugin model may need to adapt for this case, likely by switching to precompiled plugins.

Lessons Learned

The primary objective for these experiments was to gain a better understanding of functional and performance limits when running visual data analysis applications at extreme levels of concurrency and problem sizes. The team encountered a couple of minor problems that will be rectified and appear in a future public VisIt release.

From a visual data analysis perspective, these problem sizes and concurrency levels are a "first." The successful completion of these functional and performance tests show progress towards petascale computing by demonstrating that today's technology is capable of ingesting and processing tomorrow's datasets.

The performance data the team collected during the experiments reveals insights into potential bottlenecks and opportunities for performance optimization on different machine architectures at high levels of concurrency and ultrascale datasets. Future work will include a more detailed, end-to-end performance study of several different visualization algorithms to better understand performance limits and opportunities for VisIt, a production-quality visual data analysis software application.

Collaborators

DOE's Scientific Discovery through Advanced Computing Program (SciDAC)
Brad Whitlock, LLNL (Dawn runs)
Dave Pugmire and Sean Ahern, ORNL (Jaguar runs)
Kathy Yelick, Francesca Verdier, and Howard Walter. National Energy Research Scientific Computing Center (NERSC), Berkeley Lab
Paul Navratil, Kelly Gaither, and Karl Schulz, Texas Advanced Computing Center, University of Texas, Austin
James Hack, Doug Kothe, Arthur Bland, Ricky Kendall, Oak Ridge Leadership Computing Facility, ORNL.
David Fox, Debbie Santa Maria, Brian Carnes, Livermore Computing, LLNL.


Isocontouring of two trillion cells on 32,000 cores of Franklin.	Volume rendering of two trillion cells on 32,000 cores of Franklin.