12 April 2000

In attendance were Brian Tierney, Dan Gunter, Steve Lau, Wes Bethel (LBL), Joe Grcar, Helen Chen, Jim (?) Brandt (SNL-CA), Bill Lennon and George Pavel (LLNL).

After a bit of effort, we successfully demonstrated Visapult, using some of John Bell's data loaded on the DPSS at LBL, streaming data to CPLANT at SNL-CA where it was partially volume-rendered, with the viewer running on the SGI Octane in Joe Grcar's office. In addition to simply running the app, we also collected data using Netlogger, and had a brief discussion about the performance of all components, including the network, the app & DPSS.

Following are a couple of sections that describe the comments, thoughts, and feedback in more detail. The first is direct feedback from Joe Grcar, followed by a list of design and architectural considerations, and then a list of current known issues.

Thanks to everyone for making this a success! Special thanks to Joe and Helen for hosting the demo.

Feedback from Joe Grcar

Joe was kind enough to make lots of comments about the application, especially those that have a bearing on usability.

Alternate Axis Rendering
The current app does volume rendering along one primary axis. When you rotate the object so that you're looking down the x-axis, say, rather than the z-axis, the IBR volume rendering method does not take into account the new orientation, so you see a few thin slices, not the volume rendered from the new view. Joe feels this feature is the number one feature needed to make the tool useful, so that volumes can be inspected from arbitrary views.
Want more than one variable.
The present app shows one variable only. It would be useful to have the ability to display two, or more, variables simultaneously in multiple windows.
Instructions for running the application.
Joe requested instructions for running the existing application. He understands it's not as simple as "./a.out", and seems willing to brave starting up two jobs on two separate machines.
Need better graphics.
The SGI in Joe's office is a fast machine, but needs more graphics horsepower. Specifically, support is needed for hardware-accelerated texture mapping, along with support for multibuffered stereo. He will look into obtaining a graphics upgrade for his machine. We did have some discussions about hardware accelerators for Linux boxes, based upon our experiences here at LBL. This may be a near-term, cost-effective option for them (although support for multibuffered stereo is still questionable in the Linux/COTS-graphics world right now).
Want access to data from upcoming runs.
John Bell is in the process of preparing two runs that are of interest to Joe. One is a "Jet" simulation (Rick Propp?), the other is a "turbulent burnout" simulation. Joe would like to view these data sets when they become available.
Data subsetting.
Cropping/subsetting would be useful, to "zoom in" on regions of interest. Often, only a subset of the problem domain is of interest, and apps should provide a way to look at subsets of data.
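
As a rough illustration of the subsetting request, the sketch below crops an axis-aligned box out of a volume stored as a flat array. The x-fastest memory layout and the name crop_volume are assumptions made for this example; they are not taken from the Visapult code.

```python
def crop_volume(data, dims, lo, hi):
    """Return the sub-volume [lo, hi) of a flat, x-fastest array.

    data   : flat sequence of nx*ny*nz samples (layout is an assumption)
    dims   : (nx, ny, nz)
    lo, hi : inclusive lower / exclusive upper corner, each (x, y, z)
    """
    nx, ny, nz = dims
    if len(data) != nx * ny * nz:
        raise ValueError("data length does not match dims")
    for a, b, n in zip(lo, hi, dims):
        if not 0 <= a < b <= n:
            raise ValueError("crop box lies outside the volume")
    out = []
    for z in range(lo[2], hi[2]):
        for y in range(lo[1], hi[1]):
            row = (z * ny + y) * nx              # offset of this x-row
            out.extend(data[row + lo[0]: row + hi[0]])
    new_dims = tuple(b - a for a, b in zip(lo, hi))
    return out, new_dims


# Tiny 4x3x2 volume whose sample value equals its flat index.
volume = list(range(4 * 3 * 2))
sub, sub_dims = crop_volume(volume, (4, 3, 2), lo=(1, 1, 0), hi=(3, 3, 2))
print(sub_dims, sub)
```

Because only the cropped rows are copied, the same idea could in principle be applied before data leaves the storage system, so that less data crosses the network.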

Design/Architectural Issues

Network Throughput.
Along the heavy-bandwidth link between the DPSS and CPLANT, the app is currently not able to fully saturate the network. The reason for this is that the Visapult back end grabs one time step's worth of data at a time. The sample data used today was 640x256x256 floats per time step, or about 160Mbytes. Divide this by 4 processors/communication channels, and each "heavy payload" grab was on the order of 40Mbytes. By redesigning the back end, we can have it continually fetch, or "pre-fetch", data so that data is always moving from the DPSS to the Visapult back end. Overlapping asynchronous (i.e., non-blocking) DPSS reads with rendering will result in a halving of the current duty cycle, which today was about 10 seconds per frame. Keeping the pipe full will improve that to some extent, so we could possibly get this down to maybe 2 or 3 seconds per frame.

Recent tests moving blocks of data between DPSS and CPLANT have shown throughput rates on the order of 350Mbps, or ~43Mbytes/sec, so *in theory*, we should be able to move one 640x256x256-float time step (about 160Mbytes) in a little under four seconds, end-to-end in Visapult. This seems like a good number to aim for.
Block- rather than slab-decomposition
In order to accomplish Joe's #1 suggestion, we should be performing a block-based decomposition, rather than a slab-based decomposition, of the raw simulation data. With a block-based decomposition, we will incur less of a penalty for volume rendering that is not aligned with the z-axis, and we can more fully exploit the IBR technique for doing viewer-side image placement.
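
To make the difference between the two decompositions concrete, here is a small sketch that splits the 640x256x256 grid into eight pieces, once as z-slabs and once as blocks. The piece count of eight and the helper names are assumptions chosen for illustration, not the layout used in the demo.

```python
from itertools import product


def split_axis(n, parts):
    """Split range(n) into `parts` contiguous (start, stop) pieces."""
    base, extra = divmod(n, parts)
    bounds, start = [], 0
    for i in range(parts):
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def decompose(dims, parts_per_axis):
    """Return sub-volume extents as ((x0, x1), (y0, y1), (z0, z1)) tuples."""
    per_axis = [split_axis(n, p) for n, p in zip(dims, parts_per_axis)]
    return list(product(*per_axis))


def shape(extent):
    """Size of one sub-volume along x, y and z."""
    return tuple(stop - start for start, stop in extent)


dims = (640, 256, 256)
slabs = decompose(dims, (1, 1, 8))     # eight slabs stacked along z
blocks = decompose(dims, (2, 2, 2))    # eight blocks, split on every axis

for name, pieces in (("slab", slabs), ("block", blocks)):
    s = shape(pieces[0])
    print(name, s, "longest/shortest side =", max(s) / min(s))
```

Each slab is long and thin (640x256x32 here), which is why it looks like a thin slice when viewed along x or y, while each block (320x128x128) has a far more balanced aspect ratio from any direction.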

Performing the redesign to support block-based decomposition and "pre-fetching" of volume data will be the top priorities for this project in the months to come.
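
The pre-fetching idea described under Network Throughput can be sketched with a reader thread that stays ahead of the renderer through a small bounded queue. This is only a minimal sketch under stated assumptions: fetch_step and render_step are hypothetical stand-ins that simply sleep, and nothing here uses the real DPSS client library or Visapult code.

```python
import queue
import threading
import time


def fetch_step(step):
    """Stand-in for a blocking read of one time step (hypothetical)."""
    time.sleep(0.05)
    return f"data-{step}"


def render_step(data):
    """Stand-in for back-end rendering of one time step (hypothetical)."""
    time.sleep(0.05)
    return f"rendered-{data}"


def prefetcher(steps, out_q):
    """Producer: keep reading ahead while the consumer renders."""
    for step in steps:
        out_q.put(fetch_step(step))    # blocks while the queue is full
    out_q.put(None)                    # sentinel: no more data


def run_pipelined(n_steps, depth=2):
    q = queue.Queue(maxsize=depth)     # bounded read-ahead buffer
    reader = threading.Thread(target=prefetcher, args=(range(n_steps), q))
    reader.start()
    frames = []
    while True:
        data = q.get()
        if data is None:
            break
        frames.append(render_step(data))
    reader.join()
    return frames


def run_serial(n_steps):
    """Current behavior: fetch one step, render it, then fetch the next."""
    return [render_step(fetch_step(s)) for s in range(n_steps)]


start = time.perf_counter()
serial_frames = run_serial(6)
serial_s = time.perf_counter() - start

start = time.perf_counter()
pipelined_frames = run_pipelined(6)
pipelined_s = time.perf_counter() - start

print(f"serial {serial_s:.2f}s, pipelined {pipelined_s:.2f}s")
```

The bounded queue limits how much memory the read-ahead can consume; a depth of two is the classic double-buffering arrangement.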

Known Issues

User Interface
The current application consists of a few pieces of raw technology, but there is no nice user interface. We might consider hiring some student help to build a UI for the app.
Dpss_cat
Our efforts to create a workable "DPSS concatenate" tool have thus far not produced usable results. The existing dpss_cat appears to not work, and the one Wes coded up the other day does something bad to the DPSS. This problem is under investigation.
Netlogger code inside Visapult
The Netlogger interface code inside Visapult is old (circa SC99) and needs to be reworked.
nlv Displays are erratic.
The Netlogger logfile we generated today doesn't always display correctly under nlv.
Which DPSS libs on CPLANT?
One of the false starts we had this morning stemmed from the fact that the Visapult back end was linked against "the wrong" DPSS client libraries on CPLANT. After relinking with "the right" libraries, the program began to function normally.
Host/portnum problem?
We saw an error that appears to be due to a host/port mismatch between CPLANT (Visapult back end) and the Visapult viewer. We saw this problem yesterday while testing to a viewer running at LBL, but attributed the problem to a firewall issue at SNL-CA. Since we saw the problem today inside SNL, and since the problem only appears when using CPLANT, more investigation is needed. Wes agreed to write a simple test program to reproduce the problem, which will be passed on to Helen et al. at SNL.
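
For the host/port issue, a connectivity check along the following lines might be a starting point. It is a hypothetical sketch, not the test program mentioned above: it only reports whether a TCP connection to a given host and port can be opened, and it is exercised here against a throwaway listener on the loopback interface.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return (True, "") if a TCP connection succeeds, else (False, reason)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, ""
    except OSError as exc:             # refused, unreachable, timed out, ...
        return False, str(exc)


# Self-check: open a listener on the loopback interface and connect to it.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
    srv.bind(("127.0.0.1", 0))         # port 0: let the OS pick a free port
    srv.listen(1)
    test_port = srv.getsockname()[1]
    ok_open, _ = can_connect("127.0.0.1", test_port)

# The listener is closed now, so the same port should refuse connections.
ok_closed, reason = can_connect("127.0.0.1", test_port, timeout=1.0)
print("listener up:", ok_open, "| listener down:", ok_closed, reason)
```

Running such a check from both ends, with the actual viewer host and port substituted, would help separate a firewall problem from a mismatch in the addresses the two programs exchange.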

Campaignlet Application Data

The following two images show the results of instrumenting Visapult with Netlogger. The first image shows the results of a DPSS->CPLANT->LBL configuration, while the second image shows the results of a DPSS->CPLANT->SNL-CA configuration.

	NetLogger results running from LBL DPSS to CPlant to a viewer at LBL.
	NetLogger results collected on April 12, running from LBL DPSS to CPlant to a viewer at SNL.

Analysis of NetLogger Data

The sample data consists of data from a grid that is 640x256x256 by 265 time steps. Each time step's worth of data consists of 640x256x256 IEEE big-endian floats, or 160 MBytes of data per time step, and an aggregate of 42 GBytes of data.
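
The sizes quoted above follow directly from the grid dimensions, as this short check shows. It assumes 4-byte single-precision floats and counts a MByte as 2**20 bytes.

```python
NX, NY, NZ = 640, 256, 256
BYTES_PER_FLOAT = 4                    # IEEE single precision (assumed)
TIME_STEPS = 265

bytes_per_step = NX * NY * NZ * BYTES_PER_FLOAT
mbytes_per_step = bytes_per_step / 2**20
total_mbytes = mbytes_per_step * TIME_STEPS

print(f"{mbytes_per_step:.0f} MBytes per time step")      # 160
print(f"{total_mbytes:.0f} MBytes for all time steps")    # 42400
```

The 42,400 MBytes total is the roughly 42 GBytes aggregate figure.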

Each of the images above shows profile traces for both the Visapult back end and viewer. The viewer traces are in green, the back end traces are in red. The horizontal axis is time, and the vertical axis shows checkpoints within the code.

Referring to the bottom image (data taken from the April 12 campaignlet), we can see the total elapsed time for a single frame of data is about 15 seconds. The time required to load the data from the DPSS into the back end over NTON is about 5 seconds, for an aggregate bandwidth of about 256 Mbits/sec.
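
The bandwidth figure can be reproduced from the 160 MBytes per time step and the 5 second load time. The second function turns the calculation around to estimate how long one time step would take at the 350 Mbps block-transfer rate quoted earlier; both function names are made up for this sketch.

```python
def aggregate_mbits_per_sec(mbytes, seconds):
    """Average rate, in Mbits/sec, for moving `mbytes` in `seconds`."""
    return mbytes * 8 / seconds


def seconds_per_step(mbytes, mbits_per_sec):
    """Time needed to move `mbytes` at a sustained `mbits_per_sec`."""
    return mbytes * 8 / mbits_per_sec


print(aggregate_mbits_per_sec(160, 5))             # 256.0
print(f"{seconds_per_step(160, 350):.1f} s")       # 3.7 s
```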

The architectural improvement that immediately jumps out from this image is that if network i/o could be overlapped with rendering, the duty cycle, or throughput rate, of the Visapult back end would effectively double.
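
A simple two-stage timing model shows when overlapping i/o with rendering comes close to doubling throughput. The 5 second load time is taken from the analysis above; the render times used below are illustrative assumptions, not measurements.

```python
def serial_time(n_frames, io_s, render_s):
    """Each frame is loaded, then rendered, with no overlap."""
    return n_frames * (io_s + render_s)


def overlapped_time(n_frames, io_s, render_s):
    """Loading frame k+1 overlaps rendering frame k.

    The first load and the last render cannot be hidden; in between,
    the slower of the two stages sets the pace.
    """
    return io_s + (n_frames - 1) * max(io_s, render_s) + render_s


N = 265                                # time steps in the sample data
for render_s in (5.0, 10.0):           # assumed render times, in seconds
    s = serial_time(N, 5.0, render_s)
    o = overlapped_time(N, 5.0, render_s)
    print(f"render {render_s:4.1f}s: serial {s:.0f}s, "
          f"overlapped {o:.0f}s, speedup {s / o:.2f}x")
```

In this model the speedup approaches 2x only when loading and rendering take about the same time; if one stage dominates, the gain is smaller.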