From: Randall Frank <frank12@llnl.gov>
Date: Fri Aug 29, 2003 3:43:50 PM US/Pacific
To: John Shalf <jshalf@lbl.gov>, diva@lbl.gov
Subject: Re: DiVA Survey (Please return by Sept 10!)
My $0.02: I will preface by noting that this
is IMHO...
=============The Survey=========================
1) Data Structures/Representations/Management==================
The center of every successful modular visualization architecture has been a flexible core set of data structures for representing data that is important to the targeted application domain. Before we can begin working on algorithms, we must come to some agreement on common methods (either data structures or accessors/method calls) for exchanging data between components of our vis framework.
There are two potentially disparate motivations for defining the data representation requirements. In the coarse-grained case, we need to define standards for exchanging data between components in this framework (interoperability). In the fine-grained case, we want to define some canonical data structures that can be used within a component -- one developed specifically for this framework. These two use-cases may drive different sets of requirements and implementation issues.
* Do you feel both of
these use cases are equally important or should we focus exclusively on one or
the other?
I think that interoperability (both in terms of data and, perhaps more critically, operation/interaction) is more critical than fine-grained data sharing. My motivation: there is no way that DiVA will be able to meet all needs initially, and in many cases it may be fine for data to go "opaque" to the framework once inside a "limb" in the framework (e.g. VTK could be a limb). This allows the framework to be easily populated with a lot of solid code bases and shifts the initial focus to the important interactions (perhaps domain-centric). Over time, I see the fine-grained stuff coming up, but perhaps proposed by the "limbs" rather than the framework. I do feel that the coarse level must take distributed processing into account, however...
* Do you feel the
requirements for each of these use-cases are aligned or will they involve two
separate development tracks? For
instance, using "accessors" (method calls that provide abstract access
to essentially opaque data structures) will likely work fine for the
coarse-grained data exchanges between components, but will lead to
inefficiencies if used to implement algorithms within a particular component.
I think you hit the nail on the head. Where necessary, I see sub-portions of the framework working out the necessary fine-grained, efficient, "aware" interactions and data structures as needed. I strongly doubt we would get that part right initially, and I think it would lead to some of the same constraints that are forcing us to re-invent frameworks right now. IMHO: the fine-grained stuff must be flexible and dynamic over time as development and research progress.
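To make the accessor idea concrete, here is a rough C++ sketch of the kind of coarse-grained, opaque interface I have in mind (all names are purely illustrative, not a proposal for the actual API):

    // Hypothetical coarse-grained accessor interface: components exchange data
    // through method calls and never see the concrete layout underneath.
    #include <cstddef>
    #include <string>
    #include <vector>

    class DataField {
    public:
        virtual ~DataField() = default;
        virtual std::string name() const = 0;           // e.g. "pressure"
        virtual int dimensions() const = 0;              // 1, 2 or 3
        virtual void extents(int dims[3]) const = 0;     // logical grid size
        // Copy a sub-block of the field into caller-owned memory.  The backing
        // representation (structured, AMR, a VTK object, ...) stays opaque.
        virtual void read(const int origin[3], const int size[3],
                          double* out) const = 0;
    };

    // A component written purely against the accessor interface.
    double blockMean(const DataField& f, const int origin[3], const int size[3]) {
        std::size_t n = std::size_t(size[0]) * size[1] * size[2];
        std::vector<double> buf(n);
        f.read(origin, size, buf.data());
        double sum = 0.0;
        for (double v : buf) sum += v;
        return n ? sum / double(n) : 0.0;
    }

A "limb" like VTK would implement DataField over its own structures once, and everything outside the limb could stay ignorant of the internals.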
* As you answer the
"implementation and requirements" questions below, please try to
identify where coarse-grained and fine-grained use cases will affect the
implementation requirements.
What are the requirements for the data representations that must be supported by a common infrastructure? We will start by answering Pat's questions about representation requirements and follow up with personal experiences involving particular domain scientists' requirements.
Must: support for structured data
Must - at the coarse level, I think this could form the basis of all other representations.
Must/Want: support for multi-block data?
Must - at the coarse level, I think this is key for scalability, domain decomposition and streaming/multipart data transfer.
Must/Want: support for various unstructured data representations? (which ones?)
Nice - but I would be willing to live with an implementation on top of structured, multi-block (e.g. Exodus). I feel accessors are fine for this at the "framework" level (not at the leaves).
Must/Want: support for
adaptive grid standards? Please be
specific about which adaptive grid methods you are referring to. Restricted block-structured AMR
(aligned grids), general block-structured AMR (rotated grids), hierarchical
unstructured AMR, or non-hierarchical adaptive structured/unstructured meshes.
Similar to my comments on unstructured data reps. In the long run, something like BoxLib with support for both p- and h-adaptivity will be needed (IMHO, VTK might provide this).
Must/Want:
"vertex-centered" data, "cell-centered" data?
other-centered?
Must.
Must: support
time-varying data, sequenced, streamed data?
Must, but way too much to say here to do it justice. I will say that the core must deal with time-varying/sequenced data. Streaming might be able to be placed on top of that, if it is designed properly. I will add that we have a need for progressive data as well.
Must/Want: higher-order
elements?
Must - but again, this can often be
"faked" on top of other reps.
Must/Want: Expression of
material interface boundaries and other special-treatment of boundary
conditions.
Must, but I will break this into two cases. Material interfaces for us are essentially sparse vector sets, so they can be handled with basic mechanisms, and I do not see that as core, other than perhaps support for compression. Boundary conditions (e.g. ghost zoning, AMR boundaries, etc.) are critical.
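As one illustration of why ghost zoning deserves first-class treatment, a structured block descriptor in the coarse interface would need to distinguish owned cells from ghost cells. A minimal C++ sketch (field names are made up for illustration):

    #include <vector>

    // Hypothetical descriptor for one block of a structured, domain-decomposed
    // dataset, with the ghost-cell layers made explicit so components can
    // exchange or strip them consistently.
    struct StructuredBlock {
        int globalOrigin[3];        // index of this block's first owned cell
        int ownedCells[3];          // cells this rank is responsible for
        int ghostWidth[3];          // ghost layers per side (assumed symmetric here)
        std::vector<double> data;   // sized (owned + 2*ghost) in each dimension

        // Total allocated cells along one dimension, including ghosts.
        int allocatedCells(int dim) const {
            return ownedCells[dim] + 2 * ghostWidth[dim];
        }
    };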
* For commonly understood datatypes like structured and unstructured, please focus on any features that are commonly overlooked in typical implementations. For example, data-centering is often overlooked in structured data representations in vis systems, and FEM researchers commonly criticize vis people for co-mingling geometry with topology in unstructured grid representations. Few data structures provide proper treatment of boundary conditions or material interfaces. Please describe your personal experience on these matters.
Make sure you get the lowest common denominator correct! There is no realistic way that the framework can support everything, everywhere, without losing its ability to be nimble (no matter what some OOP folks say). Simplicity of representation with "externally" supplied optional optimization information is one approach to this kind of problem.
* Please describe data
representation requirements for novel data representations such as
bioinformatics and terrestrial sensor datasets. In particular, how should we handle more abstract data that
is typically given the moniker "information visualization".
Obviously, do not forget "records"
and aggregate/derived types. That
having been said, the overheads for these
can be ugly. Consider
parallel arrays as an alternative...
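By "parallel arrays" I mean storing each record field as its own contiguous array rather than as an array of record structs. A quick C++ illustration of the trade-off (the types are made up for the example):

    #include <vector>

    // Array-of-records: convenient, but padding and per-record overhead add up,
    // and a filter that touches one field still drags whole records through
    // the cache.
    struct SensorSample {
        double time;
        float  value;
        int    stationId;
    };
    std::vector<SensorSample> samples;

    // Parallel arrays: one contiguous array per field.  More bookkeeping, but
    // compact, compressible, and friendly to operations that stream a single
    // attribute (the common case in data analysis).
    struct SensorTable {
        std::vector<double> time;
        std::vector<float>  value;
        std::vector<int>    stationId;  // index i describes one sample across all three
    };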
What do you consider the most
elegant/comprehensive implementation for data representations that you believe
could form the basis for a comprehensive visualization framework?
IMHO: layered data structuring combined with
data accessors is
probably the right way to go. Keep the basic representational
elements simple.
* For instance, AVS uses entirely different data structures for structured, unstructured and geometry data. VTK uses class inheritance to express the similarities between related structures. Ensight treats unstructured data and geometry nearly interchangeably. OpenDX uses more vector-bundle-like constructs to provide a more unified view of disparate data structures. FM uses data accessors (essentially keeping the data structures opaque).
* Are there any of the
requirements above that are not covered by the structure you propose?
I think one big issue will be the
distributed representations.
This item is ill handled by many of these
systems.
* This should focus on the elegance/usefulness of the core design-pattern employed by the implementation rather than a point-by-point description of the implementation!
Is it possible to consider a COM Automation Object-like approach, also similar to the CCA breakdown? Basically, define the common stuff and make it interchangeable, then build on top. Allow underlying objects to be "aware" and wink to each other to bypass as needed. In the long run, consider standardizing on working bypass paradigms and bringing them into the code (e.g. OpenGL).
* Is there information
or characteristics of particular file format standards that must percolate up
into the specific implementation of the in-memory data structures?
Not really, but metadata handling and
referencing will be key and need
to be general.
For the purpose of this survey, "data
analysis" is defined broadly as all non-visual data processing done
*after* the simulation code has finished and *before* "visual
analysis".
* Is there a clear
dividing line between "data analysis" and "visual analysis"
requirements?
Not in my opinion.
* Can we (should we) incorporate data analysis functionality into this framework, or is it just focused on visual analysis?
Yes, and we should: as the complexity and size of data grow, we begin to rely more heavily on "data analysis"-based visualization.
* What kinds of data analysis typically need to be done in your field? Please give examples and how these functions are currently implemented.
Obviously basic statistics (e.g. moments, limits, etc.). Regression and model-driven analysis are common; for example, comparison of data/fields via comparison against common distance maps, and prediction of activation "outliers" via general linear models applied on an element-by-element basis, streaming through temporal data windows.
* How do we incorporate
powerful data analysis functionality into the framework?
Hard work :). Include support for meta-data, consider support for sparse data representations, and include the necessary support for "windowing" concepts.
2) Execution Model=======================
It will be necessary for us to agree on common execution semantics for our components. Otherwise, we might have compatible data structures but incompatible execution requirements. Execution semantics are akin to the function of a protocol in the context of network serialization of data structures. The motivating questions are as follows:
* How is the execution model affected by the kinds of algorithms/system-behaviors we want to implement?
* How then will a given execution model affect data structure implementations?
* How will the execution model be translated into execution semantics at the component level? For example, will we need to implement special control-ports on our components to implement particular execution models, or will the semantics be implicit in the way we structure the method calls between components?
What kinds of execution models should be supported by the distributed visualization architecture?
* View dependent
algorithms? (These were typically quite difficult to implement for dataflow
visualization environments like AVS5).
I propose limited enforcement of fixed execution semantics. View/data/focus-dependent environments are common and need to be supported; however, they are still tied very closely to data representations, and hence will likely need to be customized to application domains/functions.
* Out-of-core algorithms
This has to be a feature, given the focus on
large data.
* Progressive update and
hierarchical/multiresolution algorithms?
Obviously, I have a bias here, particularly
in the remote visualization
cases.
Remote implies fluctuations in effective data latency that make
progressive systems key.
* Procedural execution from a single thread of control (i.e. using a commandline language like IDL to interactively control a dynamic or large parallel back-end)
Yep, I think this kind of control is key.
* Dataflow execution models? What is the firing method that should be employed for a dataflow pipeline? Do you need a central executive like AVS/OpenDX, a completely distributed firing mechanism like that of VTK, or some sort of abstraction that allows the modules to be used with either executive paradigm?
I think this should be an option as it can
ease some connection
mechanisms, but it should not be the sole
mechanism. Personally,
I find a properly designed central executive
making "global"
decisions coupled with demand/pull driven
local "pipelets" that
allow high levels of abstraction more useful
(see the VisIt model).
* Support for novel data
layouts like space-filling curves?
With the right accessors nothing special
needs to be added for these.
* Are there special
considerations for collaborative applications?
* What else?
The kitchen sink? :)
How will the execution model affect our
implementation of data structures?
* how do you decompose a
data structure such that it is amenable to streaming in small chunks?
This is a major issue and relates to things like out-of-core processing, etc. I definitely feel that "chunking"-like mechanisms need to be in the core interfaces.
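A minimal sketch of what a chunking-aware core interface could look like in C++ (purely illustrative; none of these names are a proposal):

    #include <cstddef>
    #include <vector>

    // One chunk: a contiguous piece of a larger field, small enough to stream.
    struct Chunk {
        int origin[3];              // position of the chunk in the global index space
        int size[3];                // extent of the chunk
        std::vector<double> data;   // the chunk's samples
    };

    // Hypothetical producer interface: downstream components pull chunks one at
    // a time, so the full dataset never has to be memory-resident and the chunk
    // count need not match the CPU count.
    class ChunkedSource {
    public:
        virtual ~ChunkedSource() = default;
        virtual std::size_t chunkCount() const = 0;
        virtual Chunk fetch(std::size_t index) = 0;   // may read from disk or the network
    };

    // Example consumer: process a whole dataset one chunk at a time.
    double globalMax(ChunkedSource& src) {
        double m = -1.0e300;
        for (std::size_t i = 0; i < src.chunkCount(); ++i) {
            Chunk c = src.fetch(i);
            for (double v : c.data) if (v > m) m = v;
        }
        return m;
    }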
* how do you represent
temporal dependencies in that model?
I need to give this more thought, there are
a lot of options.
* how do you minimize recomputation in order to regenerate data for view-dependent algorithms?
Framework-invisible caching. Not a major framework issue.
What are the execution semantics necessary
to implement these execution models?
* how does a component
know when to compute new data? (what is the firing rule)
Explicit function calls with potential async
operation. A higher-level
wrapper can make this look like
"dataflow".
* does coordination of
the component execution require a central executive or can it be implemented
using only rules that are local to a particular component.
I think the central executive can be an
optional component (again, see
VisIt).
* how elegantly can
execution models be supported by the proposed execution semantics? Are there some things, like loops or
back-propagation of information that are difficult to implement using a
particular execution semantics?
There will always be warts...
How will security considerations affect
the execution model?
Security issues tend to impact two areas: 1)
effective bandwidth/latency
and 2) dynamic connection problems. 1) can be unavoidable, but will not
show up in most environments if we design
properly. 2) is a real problem
with few silver bullets.
3) Parallelism and load-balancing=================
Thus far, managing parallelism in visualization systems has been tedious and difficult at best. Part of this is a lack of powerful abstractions for managing data-parallelism, load-balancing and component control.
Please describe the kinds of parallel
execution models that must be supported by a visualization component
architecture.
* data-parallel/dataflow
pipelines?
* master/slave
work-queues?
I tend to use small dataflow pipelines
locally and higher-level
async streaming work-queue models globally.
* streaming update for
management of pipeline parallelism?
Yes, we use this, but it often requires a
global parallel filesystem to
be most effective.
* chunking mechanisms
where the number of chunks may be different from the number of CPU's employed
to process those chunks?
We use space-filling curves to reduce the overall expense of this (common) operation (consider the compute/viz impedance mismatch problem as well). As a side effect, the codes gain cache coherency as well.
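For anyone less familiar with the trick: ordering chunks along a space-filling curve keeps chunks that are close in index space close in the ordering, which is where the cache coherency comes from. A minimal Morton (Z-order) encoding in C++, just to illustrate the idea:

    #include <cstdint>

    // Spread the low 10 bits of v so there are two zero bits between each bit.
    static std::uint32_t spreadBits(std::uint32_t v) {
        v &= 0x3FF;                       // keep 10 bits
        v = (v | (v << 16)) & 0x030000FF;
        v = (v | (v << 8))  & 0x0300F00F;
        v = (v | (v << 4))  & 0x030C30C3;
        v = (v | (v << 2))  & 0x09249249;
        return v;
    }

    // Interleave x, y and z into a single Morton code; chunks sorted by this key
    // are laid out along a Z-order space-filling curve, so spatial neighbors tend
    // to be neighbors in memory and on disk.
    std::uint32_t mortonCode(std::uint32_t x, std::uint32_t y, std::uint32_t z) {
        return spreadBits(x) | (spreadBits(y) << 1) | (spreadBits(z) << 2);
    }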
* how should one manage
parallelism for interactive scripting languages that have a single thread of
control? (eg. I'm using a commandline
language like IDL that interactively drives an arbitrarily large set of
parallel resources. How can I make
the parallel back-end available to a single-threaded interactive thread of
control?)
Consider them as "scripting
languages", and have most operations
run through an executive (note the executive
would not be aware
of all component operations/interactions, it
is a higher-level
executive). Leave RPC style hooks for specific references.
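The shape I have in mind, as a toy C++ sketch (all names hypothetical): the single-threaded scripting layer talks only to an executive, which hides how many parallel processes sit behind it.

    #include <iostream>
    #include <string>

    // Stand-in for the parallel back-end; in reality this would broadcast the
    // request to worker processes (MPI, sockets, ...) and gather the result.
    class ParallelBackend {
    public:
        std::string run(const std::string& op) {
            return "result of " + op;     // placeholder for a parallel gather
        }
    };

    // The executive that the scripting language binds to.  An RPC-style hook
    // could expose a specific component directly when that is really needed.
    class Executive {
    public:
        explicit Executive(ParallelBackend& b) : backend(b) {}
        std::string submit(const std::string& op) { return backend.run(op); }
    private:
        ParallelBackend& backend;
    };

    int main() {
        ParallelBackend backend;
        Executive exec(backend);
        // What a single line of IDL/Python script would translate to:
        std::cout << exec.submit("isosurface(density, 2.5)") << "\n";
    }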
Please describe your vision of what kinds
of software support / programming design patterns are needed to better support
parallelism and load balancing.
* What programming model should be employed to express parallelism? (UPC, MPI, SMP/OpenMP, custom sockets?)
The programming model must transcend
specific parallel APIs.
* Can you give some
examples of frameworks or design patterns that you consider very promising for
support of parallelism and load balancing.
(ie. PNNL Global Arrays or Sandia's
Zoltan)
http://www.cs.sandia.gov/Zoltan/
http://www.emsl.pnl.gov/docs/global/ga.html
No, I cannot (I am not up to speed).
* Should we use novel
software abstractions for expressing parallelism or should the implementation
of parallelism simply be an opaque property of the component? (ie. should there
be an abstract messaging layer or not)
I would vote no, as it will allow known paradigms to work but will interfere with research and new-direction integration. I think some kind of basic message abstraction (outside of the parallel data system) is needed.
* How does the NxM work
fit in to all of this? Is it
sufficiently differentiated from Zoltan's capabilities?
Unable to comment...
===============End of Mandatory Section (the rest is voluntary)=============
4) Graphics and Rendering=================
What do you use for converting geometry and data into images (the rendering engine)? Please comment on any/all of the following.
* Should we build
modules around declarative/streaming methods for rendering geometry like
OpenGL, Chromium and DirectX or should we move to higher-level representations
for graphics offered by scene graphs?
IMHO, the key is defining the boundary and interoperability constraints. If these can be documented, then the question becomes moot; you can use whatever works best for the job.
What are the pitfalls of building our
component architecture around scene graphs?
Data cloning, data locking, and good support for streaming, view-dependent, progressive systems.
* What about Postscript,
PDF and other scale-free output methods for publication quality graphics? Are pixmaps sufficient?
Gotta make nice graphs. Pixmaps will not suffice.
In a distributed environment, we need to create a rendering subsystem that can flexibly switch between drawing to a client application by sending images, sending geometry, or sending geometry fragments (image-based rendering). How do we do that?
See the Chromium approach. This is actually more easily done than one might think. Define an image "fragment" and augment the rendering pipeline to handle it (ref: PICA and Chromium).
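To be concrete about what an image "fragment" might carry, here is a bare-bones C++ sketch (the fields are illustrative, not what PICA or Chromium actually define):

    #include <cstdint>
    #include <vector>

    // A rectangular piece of a rendered frame, tagged with enough information
    // for a downstream compositor to place and depth-merge it.  A renderer can
    // emit whole frames, tiles, or per-object fragments through the same path.
    struct ImageFragment {
        int frameId;                     // which frame this fragment belongs to
        int x, y;                        // destination offset in the final image
        int width, height;               // fragment extent in pixels
        std::vector<std::uint8_t> rgba;  // width*height*4 color bytes
        std::vector<float> depth;        // optional per-pixel depth for compositing
    };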
* Please describe some
rendering models that you would like to see supported (ie. view-dependent
update, progressive update) and how they would adjust dynamically do changing
objective functions (optimize for fastest framerate, or fastest update on
geometry change, or varying workloads and resource constraints).
See the TeraScale browser system.
* Are there any good
examples of such a system?
None that are ideal :), but they are not
difficult to build.
What is the role of non-polygonal methods
for rendering (ie. shaders)?
* Are you using any of
the latest gaming features of commodity cards in your visualization systems
today?
* Do you see this
changing in the future? (how?)
This is a big problem area. Shaders are difficult to combine/pipeline. We are using this stuff now and I do not see it getting much easier (HLSL does not fix it). At some point, I believe that non-polygon methods will become more common than polygon methods (in about 3-4 years?). Polygons are a major bottleneck on current gfx cards as they limit parallelism. I'm not sure what the fix will be, but it will still be called OpenGL :).
5) Presentation=========================
It will be necessary to separate the
visualization back-end from the presentation interface. For instance, you may want to have the
same back-end driven by entirely different control-panels/GUIs and displayed in
different display devices (a CAVE vs. a desktop machine). Such separation is also useful
when you want to provide different implementations of the user-interface
depending on the targeted user community.
For instance, visualization experts might desire a dataflow-like interface for composing visualization workflows, whereas a scientist might desire a domain-specific, dashboard-like interface that implements a specific workflow. Both users should be able to share the same back-end components and implementation even though the user interface differs considerably.
How do different presentation devices affect
the component model?
* Do different display
devices require completely different user interface paradigms? If so, then we must define a clear
separation between the GUI description and the components performing the
back-end computations. If not,
then is there a common language to describe user interfaces that can be used
across platforms?
I think they do (e.g. immersion).
* Do different display
modalities require completely different component/algorithm implementations for
the back-end compute engine?
(what do we do about that??)
They can (e.g. holography), but I do not see
a big problem there.
Push the representation through an
abstraction (not a layer).
What presentation modalities do you feel are important, and which do you consider the most important?
* Desktop graphics
(native applications on Windows, on Macs)
#1 (by a fair margin)
* Graphics access via
Virtual Machines like Java?
#5
* CAVEs, Immersadesks,
and other VR devices
#4
* Ultra-high-res/Tiled
display devices?
#3 - note that tiling applies to desktop
systems as well, not
necessarily high-pixel count displays.
* Web-based
applications?
#2
What abstractions do you think should be
employed to separate the presentation interface from the back-end compute
engine?
* Should we be using CCA
to define the communication between GUI and compute engine or should we be
using software infrastructure that was designed specifically for that space?
(ie. WSDL, OGSA, or CORBA?)
No strong opinion.
* How do such control
interfaces work with parallel applications?
Should the parallel application have a
single process that manages the control interface and broadcasts to all nodes
or should the control interface treat all application processes within a given
component as peers?
Consider DMX: by default, single with broadcast, but it supports backend bypass...
6) Basic Deployment/Development Environment Issues============
One of the goals of the distributed
visualization architecture is seamless operation on the Grid --
distributed/heterogeneous collections of machines. However, it is quite difficult to realize such a vision
without some consideration of deployment/portability issues. This question also touches on issues
related to the development environment and what kinds of development methods
should be supported.
What languages do you use for core vis algorithms and frameworks?
* for the numerically
intensive parts of vis algorithms
C/C++ (a tiny amount of Fortran)
* for the glue that
connects your vis algorithms together into an application?
C/C++
* How aggressively do
you use language-specific features like C++ templates?
Not very, but they are used.
* is Fortran important
to you? Is it important that a
framework support it seamlessly?
Pretty important, but at least
"standardly" enhanced F77 should be simple :).
* Do you see other
languages becoming important for visualization (ie. Python, UPC, or even
BASIC?)
Python is big for us.
What platforms are used for data
analysis/visualization?
* What do you and your
target users depend on to display results? (ie. Windows, Linux, SGI, Sun etc..)
Linux, SGI, Sun, Windows, MacOS in that
order
* What kinds of presentation devices are employed (desktops, portables, handhelds, CAVEs, Access Grids, WebPages/Collaboratories) and what is their relative importance to active users?
Remote desktops and laptops. Very important.
* What is the relative importance of these various presentation methods from a research standpoint?
PowerPoint :)?
* Do you see other
up-and-coming visualization platforms in the future?
Tablets & set-top boxes.
Tell us how you deal with the issue of
versioning and library dependencies for software deployment.
* For source code distributions, do you bundle builds of all related libraries with each software release (ie. bundle HDF5 and FLTK source with each release)?
For many libs, yes.
* What methods are employed to support platform-independent builds (cmake, imake, autoconf)? What are the benefits and problems with this approach?
gmake based makefiles.
* For binaries, have you had issues with different versions of libraries (ie. GLIBC problems on Linux and different JVM implementations/versions for Java)? Can you tell us about any sophisticated packaging methods that address some of these problems (RPM need not apply).
No real problems other than GLIBC problems. We do tend to ship static for several libs. Motif used to be a problem on Linux (LessTif vs OpenMotif).
* How do you handle
multiplatform builds?
cron jobs on multiple platforms, directly
from CVS repos. Entire
environment can be built from CVS repo info
(or cached).
How do you (or would you) provide
abstractions that hide the locality of various components of your
visualization/data analysis application?
* Does anyone have ample
experience with CORBA, OGSA, DCOM, .NET, RPC? Please comment on advantages/problems of these technologies.
* Do web/grid services
come into play here?
Not usually an issue for us.
7) Collaboration==========================
If you are interested in "collaborative applications" please define the term "collaborative". Perhaps provide examples of collaborative application paradigms.
Meeting Maker? :) :) (I'm getting tired).
Is collaboration a feature that exists at
an application level or are there key requirements for collaborative
applications that necessitate component-level support?
* Should collaborative infrastructure be incorporated as a core feature of every component?
* Can any conceivable
collaborative requirement be satisfied using a separate set of modules that
specifically manage distribution of events and data in collaborative
applications?
* How is the
collaborative application presented?
Does the application only need to be collaborative sometimes?
* Where does performance
come in to play? Does the
visualization system or underlying libraries need to be performance-aware? (i.e. I'm doing a given task and I need
a framerate of X for it to be useful using my current compute resources),
network aware (i.e. the system is starving for data and must respond by adding
an alternate stream or redeploying the pipeline). Are these considerations implemented at the component level,
framework level, or are they entirely out-of-scope for our consideration?
And I'm at the end... Whew.
rjf.
Randall Frank                          | Email: rjfrank@llnl.gov
Lawrence Livermore National Laboratory | Office: B451 R2039
7000 East Avenue, Mailstop: L-561      | Voice: (925) 423-9399
Livermore, CA 94550                    | Fax: (925) 423-8704