From: Randall Frank <frank12@llnl.gov>
Date: Fri Aug 29, 2003 3:43:50 PM US/Pacific
To: John Shalf <jshalf@lbl.gov>, diva@lbl.gov
Subject: Re: DiVA Survey (Please return by Sept 10!)
My $0.02: I will preface by noting that this
is IMHO...
=============The Survey=========================
1) Data Structures/Representations/Management==================
The center of every successful modular visualization architecture has been a flexible core set of data structures for representing data that is important to the targeted application domain. Before we can begin working on algorithms, we must come to some agreement on common methods (either data structures or accessors/method calls) for exchanging data between components of our vis framework.
There are two potentially disparate motivations for defining the data representation requirements. In the coarse-grained case, we need to define standards for exchanging data between components in this framework (interoperability). In the fine-grained case, we want to define some canonical data structures that can be used within a component -- one developed specifically for this framework. These two use-cases may drive different sets of requirements and implementation issues.
* Do you feel both of
these use cases are equally important or should we focus exclusively on one or
the other?
I think that interoperability (both in terms of data and, perhaps more critically, operation/interaction) is more critical than fine-grained data sharing. My motivation: there is no way that DiVA will be able to meet all needs initially, and in many cases it may be fine for data to go "opaque" to the framework once inside a "limb" in the framework (e.g. VTK could be a limb). This allows the framework to be easily populated with a lot of solid code bases and shifts the initial focus to the important interactions (perhaps domain-centric). Over time, I see the fine-grained stuff coming up, but perhaps proposed by the "limbs" rather than the framework. I do feel that the coarse level must take distributed processing into account, however...
* Do you feel the
requirements for each of these use-cases are aligned or will they involve two
separate development tracks? For
instance, using "accessors" (method calls that provide abstract access
to essentially opaque data structures) will likely work fine for the
coarse-grained data exchanges between components, but will lead to
inefficiencies if used to implement algorithms within a particular component.
I think you hit the nail on the head. Where necessary, I see sub-portions of the framework working out the necessary fine-grained, efficient, "aware" interactions and data structures as needed. I strongly doubt we would get that part right initially, and I think it would lead to some of the same constraints that are forcing us to re-invent frameworks right now. IMHO: the fine-grained stuff must be flexible and dynamic over time as development and research progress.
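To make the accessor idea concrete, here is a rough C++ sketch of the kind of coarse-grained, opaque interface I have in mind (all names are purely illustrative, not a proposal for the actual API):

    // Hypothetical coarse-grained accessor interface: components exchange data
    // through method calls and never see the concrete layout underneath.
    #include <cstddef>
    #include <string>
    #include <vector>

    class DataField {
    public:
        virtual ~DataField() = default;
        virtual std::string name() const = 0;           // e.g. "pressure"
        virtual int dimensions() const = 0;              // 1, 2 or 3
        virtual void extents(int dims[3]) const = 0;     // logical grid size
        // Copy a sub-block of the field into caller-owned memory.  The backing
        // representation (structured, AMR, a VTK object, ...) stays opaque.
        virtual void read(const int origin[3], const int size[3],
                          double* out) const = 0;
    };

    // A component written purely against the accessor interface.
    double blockMean(const DataField& f, const int origin[3], const int size[3]) {
        std::size_t n = std::size_t(size[0]) * size[1] * size[2];
        std::vector<double> buf(n);
        f.read(origin, size, buf.data());
        double sum = 0.0;
        for (double v : buf) sum += v;
        return n ? sum / double(n) : 0.0;
    }

A "limb" like VTK would implement DataField over its own structures once, and everything outside the limb could stay ignorant of the internals.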
* As you answer the
"implementation and requirements" questions below, please try to
identify where coarse-grained and fine-grained use cases will affect the
implementation requirements.
What are the requirements for the data representations that must be supported by a common infrastructure? We will start by answering Pat's questions about representation requirements and follow up with personal experiences involving particular domain scientists' requirements.
Must: support for structured data
Must - at the coarse level, I think this could form the basis of all other representations.
Must/Want: support for multi-block data?
Must - at the coarse level, I think this is key for scalability, domain decomposition and streaming/multipart data transfer.
Must/Want: support for various unstructured data representations? (which ones?)
Nice - but I would be willing to live with an implementation on top of structured, multi-block (e.g. Exodus). I feel accessors are fine for this at the "framework" level (not at the leaves).
Must/Want: support for
adaptive grid standards? Please be
specific about which adaptive grid methods you are referring to. Restricted block-structured AMR
(aligned grids), general block-structured AMR (rotated grids), hierarchical
unstructured AMR, or non-hierarchical adaptive structured/unstructured meshes.
Similar to my comments on unstructured data reps. In the long run, something like BoxLib with support for both p- and h-adaptivity will be needed (IMHO, VTK might provide this).
Must/Want:
"vertex-centered" data, "cell-centered" data?
other-centered?
Must.
Must: support
time-varying data, sequenced, streamed data?
Must, but way too much to say here to do it justice. I will say that the core must deal with time-varying/sequenced data. Streaming might be able to be placed on top of that, if it is designed properly. I will add that we have a need for progressive data as well.
Must/Want: higher-order
elements?
Must - but again, this can often be
"faked" on top of other reps.
Must/Want: Expression of
material interface boundaries and other special-treatment of boundary
conditions.
Must, but I will break this into two cases. Material interfaces for us are essentially sparse vector sets, so they can be handled with basic mechanisms, and I do not see that as core, other than perhaps support for compression. Boundary conditions (e.g. ghost zoning, AMR boundaries, etc.) are critical.
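As one illustration of why ghost zoning deserves first-class treatment, a structured block descriptor in the coarse interface would need to distinguish owned cells from ghost cells. A minimal C++ sketch (field names are made up for illustration):

    #include <vector>

    // Hypothetical descriptor for one block of a structured, domain-decomposed
    // dataset, with the ghost-cell layers made explicit so components can
    // exchange or strip them consistently.
    struct StructuredBlock {
        int globalOrigin[3];        // index of this block's first owned cell
        int ownedCells[3];          // cells this rank is responsible for
        int ghostWidth[3];          // ghost layers per side (assumed symmetric here)
        std::vector<double> data;   // sized (owned + 2*ghost) in each dimension

        // Total allocated cells along one dimension, including ghosts.
        int allocatedCells(int dim) const {
            return ownedCells[dim] + 2 * ghostWidth[dim];
        }
    };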
* For commonly understood datatypes like structured and unstructured, please focus on any features that are commonly overlooked in typical implementations. For example, data-centering is often overlooked in structured data representations in vis systems, and FEM researchers commonly criticize vis people for co-mingling geometry with topology in unstructured grid representations. Few data structures provide proper treatment of boundary conditions or material interfaces. Please describe your personal experience on these matters.
Make sure you get the lowest common denominator correct! There is no realistic way that the framework can support everything, everywhere, without losing its ability to be nimble (no matter what some OOP folks say). Simplicity of representation with "externally" supplied optional optimization information is one approach to this kind of problem.
* Please describe data
representation requirements for novel data representations such as
bioinformatics and terrestrial sensor datasets. In particular, how should we handle more abstract data that
is typically given the moniker "information visualization".
Obviously, do not forget "records"
and aggregate/derived types. That
having been said, the overheads for these
can be ugly. Consider
parallel arrays as an alternative...
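By "parallel arrays" I mean storing each record field as its own contiguous array rather than as an array of record structs. A quick C++ illustration of the trade-off (the types are made up for the example):

    #include <vector>

    // Array-of-records: convenient, but padding and per-record overhead add up,
    // and a filter that touches one field still drags whole records through
    // the cache.
    struct SensorSample {
        double time;
        float  value;
        int    stationId;
    };
    std::vector<SensorSample> samples;

    // Parallel arrays: one contiguous array per field.  More bookkeeping, but
    // compact, compressible, and friendly to operations that stream a single
    // attribute (the common case in data analysis).
    struct SensorTable {
        std::vector<double> time;
        std::vector<float>  value;
        std::vector<int>    stationId;  // index i describes one sample across all three
    };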
What do you consider the most
elegant/comprehensive implementation for data representations that you believe
could form the basis for a comprehensive visualization framework?
IMHO: layered data structuring combined with
data accessors is
probably the right way to go. Keep the basic representational
elements simple.
* For instance, AVS uses entirely different data structures for structured, unstructured and geometry data. VTK uses class inheritance to express the similarities between related structures. Ensight treats unstructured data and geometry nearly interchangeably. OpenDX uses more vector-bundle-like constructs to provide a more unified view of disparate data structures. FM uses data accessors (essentially keeping the data structures opaque).
* Are there any of the
requirements above that are not covered by the structure you propose?
I think one big issue will be the
distributed representations.
This item is ill handled by many of these
systems.
* This should focus on the elegance/usefulness of the core design-pattern employed by the implementation rather than a point-by-point description of the implementation!
Is it possible to consider a COM Automation Object-like approach, also similar to the CCA breakdown? Basically, define the common stuff and make it interchangeable, then build on top. Allow underlying objects to be "aware" and wink to each other to bypass as needed. In the long run, consider standardizing on working bypass paradigms and bringing them into the code (e.g. OpenGL).
* Is there information
or characteristics of particular file format standards that must percolate up
into the specific implementation of the in-memory data structures?
Not really, but metadata handling and
referencing will be key and need
to be general.
For the purpose of this survey, "data
analysis" is defined broadly as all non-visual data processing done
*after* the simulation code has finished and *before* "visual
analysis".
* Is there a clear
dividing line between "data analysis" and "visual analysis"
requirements?
Not in my opinion.
* Can we (should we) incorporate data analysis functionality into this framework, or is it just focused on visual analysis?
Yes, and we should: as the complexity and size of data grow, we begin to rely more heavily on "data analysis"-based visualization.
* What kinds of data analysis typically need to be done in your field? Please give examples and how these functions are currently implemented.
Obviously basic statistics (e.g. moments, limits, etc.). Regression and model-driven analysis are common; for example, comparison of data/fields via comparison against common distance maps, and prediction of activation "outliers" via general linear models applied on an element-by-element basis, streaming through temporal data windows.
* How do we incorporate
powerful data analysis functionality into the framework?
Hard work :). Include support for meta-data, consider support for sparse data representations, and include the necessary support for "windowing" concepts.
2) Execution Model=======================
It will be necessary for us to agree on common execution semantics for our components. Otherwise, we might have compatible data structures but incompatible execution requirements. Execution semantics are akin to the function of a protocol in the context of network serialization of data structures. The motivating questions are as follows:
* How is the execution model affected by the kinds of algorithms/system-behaviors we want to implement?
* How then will a given execution model affect data structure implementations?
* How will the execution model be translated into execution semantics at the component level? For example, will we need to implement special control-ports on our components to implement particular execution models, or will the semantics be implicit in the way we structure the method calls between components?
What kinds of execution models should be supported by the distributed visualization architecture?
* View dependent
algorithms? (These were typically quite difficult to implement for dataflow
visualization environments like AVS5).
I propose limited enforcement of fixed execution semantics. View/data/focus-dependent environments are common and need to be supported; however, they are still tied very closely to data representations, and hence will likely need to be customized to application domains/functions.
* Out-of-core algorithms
This has to be a feature, given the focus on
large data.
* Progressive update and
hierarchical/multiresolution algorithms?
Obviously, I have a bias here, particularly
in the remote visualization
cases.
Remote implies fluctuations in effective data latency that make
progressive systems key.
* Procedural execution from a single thread of control (i.e. using a commandline language like IDL to interactively control a dynamic or large parallel back-end)
Yep, I think this kind of control is key.
* Dataflow execution models? What is the firing method that should be employed for a dataflow pipeline? Do you need a central executive like AVS/OpenDX, a completely distributed firing mechanism like that of VTK, or some sort of abstraction that allows the modules to be used with either executive paradigm?
I think this should be an option as it can
ease some connection
mechanisms, but it should not be the sole
mechanism. Personally,
I find a properly designed central executive
making "global"
decisions coupled with demand/pull driven
local "pipelets" that
allow high levels of abstraction more useful
(see the VisIt model).
* Support for novel data
layouts like space-filling curves?
With the right accessors nothing special
needs to be added for these.
* Are there special
considerations for collaborative applications?
* What else?
The kitchen sink? :)
How will the execution model affect our
implementation of data structures?
* how do you decompose a
data structure such that it is amenable to streaming in small chunks?
This is a major issue and relates to things like out-of-core processing, etc. I definitely feel that "chunking"-like mechanisms need to be in the core interfaces.
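A minimal sketch of what a chunking-aware core interface could look like in C++ (purely illustrative; none of these names are a proposal):

    #include <cstddef>
    #include <vector>

    // One chunk: a contiguous piece of a larger field, small enough to stream.
    struct Chunk {
        int origin[3];              // position of the chunk in the global index space
        int size[3];                // extent of the chunk
        std::vector<double> data;   // the chunk's samples
    };

    // Hypothetical producer interface: downstream components pull chunks one at
    // a time, so the full dataset never has to be memory-resident and the chunk
    // count need not match the CPU count.
    class ChunkedSource {
    public:
        virtual ~ChunkedSource() = default;
        virtual std::size_t chunkCount() const = 0;
        virtual Chunk fetch(std::size_t index) = 0;   // may read from disk or the network
    };

    // Example consumer: process a whole dataset one chunk at a time.
    double globalMax(ChunkedSource& src) {
        double m = -1.0e300;
        for (std::size_t i = 0; i < src.chunkCount(); ++i) {
            Chunk c = src.fetch(i);
            for (double v : c.data) if (v > m) m = v;
        }
        return m;
    }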
* how do you represent
temporal dependencies in that model?
I need to give this more thought, there are
a lot of options.
* how do you minimize recomputation in order to regenerate data for view-dependent algorithms?
Framework-invisible caching. Not a major framework issue.
What are the execution semantics necessary
to implement these execution models?
* how does a component
know when to compute new data? (what is the firing rule)
Explicit function calls with potential async
operation. A higher-level
wrapper can make this look like
"dataflow".
* does coordination of
the component execution require a central executive or can it be implemented
using only rules that are local to a particular component.
I think the central executive can be an
optional component (again, see
VisIt).
* how elegantly can
execution models be supported by the proposed execution semantics? Are there some things, like loops or
back-propagation of information that are difficult to implement using a
particular execution semantics?
There will always be warts...
How will security considerations affect
the execution model?
Security issues tend to impact two areas: 1)
effective bandwidth/latency
and 2) dynamic connection problems. 1) can be unavoidable, but will not
show up in most environments if we design
properly. 2) is a real problem
with few silver bullets.
3) Parallelism and load-balancing=================
Thus far, managing parallelism in visualization systems has been tedious and difficult at best. Part of this is a lack of powerful abstractions for managing data-parallelism, load-balancing and component control.
Please describe the kinds of parallel
execution models that must be supported by a visualization component
architecture.
* data-parallel/dataflow
pipelines?
* master/slave
work-queues?
I tend to use small dataflow pipelines
locally and higher-level
async streaming work-queue models globally.
* streaming update for
management of pipeline parallelism?
Yes, we use this, but it often requires a
global parallel filesystem to
be most effective.
* chunking mechanisms
where the number of chunks may be different from the number of CPU's employed
to process those chunks?
We use space-filling curves to reduce the overall expense of this (common) operation (consider the compute/viz impedance mismatch problem as well). As a side effect, the codes gain cache coherency as well.
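For anyone less familiar with the trick: ordering chunks along a space-filling curve keeps chunks that are close in index space close in the ordering, which is where the cache coherency comes from. A minimal Morton (Z-order) encoding in C++, just to illustrate the idea:

    #include <cstdint>

    // Spread the low 10 bits of v so there are two zero bits between each bit.
    static std::uint32_t spreadBits(std::uint32_t v) {
        v &= 0x3FF;                       // keep 10 bits
        v = (v | (v << 16)) & 0x030000FF;
        v = (v | (v << 8))  & 0x0300F00F;
        v = (v | (v << 4))  & 0x030C30C3;
        v = (v | (v << 2))  & 0x09249249;
        return v;
    }

    // Interleave x, y and z into a single Morton code; chunks sorted by this key
    // are laid out along a Z-order space-filling curve, so spatial neighbors tend
    // to be neighbors in memory and on disk.
    std::uint32_t mortonCode(std::uint32_t x, std::uint32_t y, std::uint32_t z) {
        return spreadBits(x) | (spreadBits(y) << 1) | (spreadBits(z) << 2);
    }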
* how should one manage
parallelism for interactive scripting languages that have a single thread of
control? (eg. I'm using a commandline
language like IDL that interactively drives an arbitrarily large set of
parallel resources. How can I make
the parallel back-end available to a single-threaded interactive thread of
control?)
Consider them as "scripting
languages", and have most operations
run through an executive (note the executive
would not be aware
of all component operations/interactions, it
is a higher-level
executive). Leave RPC style hooks for specific references.
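The shape I have in mind, as a toy C++ sketch (all names hypothetical): the single-threaded scripting layer talks only to an executive, which hides how many parallel processes sit behind it.

    #include <iostream>
    #include <string>

    // Stand-in for the parallel back-end; in reality this would broadcast the
    // request to worker processes (MPI, sockets, ...) and gather the result.
    class ParallelBackend {
    public:
        std::string run(const std::string& op) {
            return "result of " + op;     // placeholder for a parallel gather
        }
    };

    // The executive that the scripting language binds to.  An RPC-style hook
    // could expose a specific component directly when that is really needed.
    class Executive {
    public:
        explicit Executive(ParallelBackend& b) : backend(b) {}
        std::string submit(const std::string& op) { return backend.run(op); }
    private:
        ParallelBackend& backend;
    };

    int main() {
        ParallelBackend backend;
        Executive exec(backend);
        // What a single line of IDL/Python script would translate to:
        std::cout << exec.submit("isosurface(density, 2.5)") << "\n";
    }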
Please describe your vision of what kinds
of software support / programming design patterns are needed to better support
parallelism and load balancing.
* What programming model should be employed to express parallelism? (UPC, MPI, SMP/OpenMP, custom sockets?)
The programming model must transcend
specific parallel APIs.
* Can you give some
examples of frameworks or design patterns that you consider very promising for
support of parallelism and load balancing.
(ie. PNNL Global Arrays or Sandia's
Zoltan)
http://www.cs.sandia.gov/Zoltan/
http://www.emsl.pnl.gov/docs/global/ga.html
No, I cannot (I am not up to speed).
* Should we use novel
software abstractions for expressing parallelism or should the implementation
of parallelism simply be an opaque property of the component? (ie. should there
be an abstract messaging layer or not)
I would vote no, as it will allow known paradigms to work but will interfere with research and new-direction integration. I think some kind of basic message abstraction (outside of the parallel data system) is needed.
* How does the NxM work
fit in to all of this? Is it
sufficiently differentiated from Zoltan's capabilities?
Unable to comment...
===============End of Mandatory Section (the rest is voluntary)=============
4) Graphics and Rendering=================
What do you use for converting geometry and data into images (the rendering engine)? Please comment on any/all of the following.
* Should we build
modules around declarative/streaming methods for rendering geometry like
OpenGL, Chromium and DirectX or should we move to higher-level representations
for graphics offered by scene graphs?
IMHO, the key is defining the boundary and interoperability constraints. If these can be documented, then the question becomes moot; you can use whatever works best for the job.
What are the pitfalls of building our
component architecture around scene graphs?
Data cloning, data locking, and good support for streaming, view-dependent, progressive systems.
* What about Postscript,
PDF and other scale-free output methods for publication quality graphics? Are pixmaps sufficient?
Gotta make nice graphs. Pixmaps will not suffice.
In a distributed environment, we need to create a rendering subsystem that can flexibly switch between drawing to a client application by sending images, sending geometry, or sending geometry fragments (image-based rendering). How do we do that?
See the Chromium approach. This is actually more easily done than one might think. Define an image "fragment" and augment the rendering pipeline to handle it (ref: PICA and Chromium).
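To be concrete about what an image "fragment" might carry, here is a bare-bones C++ sketch (the fields are illustrative, not what PICA or Chromium actually define):

    #include <cstdint>
    #include <vector>

    // A rectangular piece of a rendered frame, tagged with enough information
    // for a downstream compositor to place and depth-merge it.  A renderer can
    // emit whole frames, tiles, or per-object fragments through the same path.
    struct ImageFragment {
        int frameId;                     // which frame this fragment belongs to
        int x, y;                        // destination offset in the final image
        int width, height;               // fragment extent in pixels
        std::vector<std::uint8_t> rgba;  // width*height*4 color bytes
        std::vector<float> depth;        // optional per-pixel depth for compositing
    };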
* Please describe some
rendering models that you would like to see supported (ie. view-dependent
update, progressive update) and how they would adjust dynamically do changing
objective functions (optimize for fastest framerate, or fastest update on
geometry change, or varying workloads and resource constraints).
See the TeraScale browser system.
* Are there any good
examples of such a system?
None that are ideal :), but they are not
difficult to build.
What is the role of non-polygonal methods
for rendering (ie. shaders)?
* Are you using any of
the latest gaming features of commodity cards in your visualization systems
today?
* Do you see this
changing in the future? (how?)
This is a big problem area. Shaders are difficult to combine/pipeline. We are using this stuff now and I do not see it getting much easier (HLSL does not fix it). At some point, I believe that non-polygon methods will become more common than polygon methods (in about 3-4 years?). Polygons are a major bottleneck on current gfx cards as they limit parallelism. I'm not sure what the fix will be, but it will still be called OpenGL :).
5) Presentation=========================
It will be necessary to separate the
visualization back-end from the presentation interface. For instance, you may want to have the
same back-end driven by entirely different control-panels/GUIs and displayed in
different display devices (a CAVE vs. a desktop machine). Such separation is also useful
when you want to provide different implementations of the user-interface
depending on the targeted user community.
For instance, visualization experts might desire a dataflow-like interface for composing visualization workflows, whereas a scientist might desire a domain-specific, dashboard-like interface that implements a specific workflow. Both users should be able to share the same back-end components and implementation even though the user interface differs considerably.
How do different presentation devices affect
the component model?
* Do different display
devices require completely different user interface paradigms? If so, then we must define a clear
separation between the GUI description and the components performing the
back-end computations. If not,
then is there a common language to describe user interfaces that can be used
across platforms?
I think they do (e.g. immersion).
* Do different display
modalities require completely different component/algorithm implementations for
the back-end compute engine?
(what do we do about that??)
They can (e.g. holography), but I do not see
a big problem there.
Push the representation through an
abstraction (not a layer).
What presentation modalities do you feel are important, and which do you consider the most important?
* Desktop graphics
(native applications on Windows, on Macs)
#1 (by a fair margin)
* Graphics access via
Virtual Machines like Java?
#5
* CAVEs, Immersadesks,
and other VR devices
#4
* Ultra-high-res/Tiled
display devices?
#3 - note that tiling applies to desktop
systems as well, not
necessarily high-pixel count displays.
* Web-based
applications?
#2
What abstractions do you think should be
employed to separate the presentation interface from the back-end compute
engine?
* Should we be using CCA
to define the communication between GUI and compute engine or should we be
using software infrastructure that was designed specifically for that space?
(ie. WSDL, OGSA, or CORBA?)
No strong opinion.
* How do such control
interfaces work with parallel applications?
Should the parallel application have a
single process that manages the control interface and broadcasts to all nodes
or should the control interface treat all application processes within a given
component as peers?
Consider DMX: by default, single with broadcast, but it supports backend bypass...
6) Basic Deployment/Development Environment Issues============
One of the goals of the distributed
visualization architecture is seamless operation on the Grid --
distributed/heterogeneous collections of machines. However, it is quite difficult to realize such a vision
without some consideration of deployment/portability issues. This question also touches on issues
related to the development environment and what kinds of development methods
should be supported.
What languages do you use for core vis algorithms and frameworks?
* for the numerically
intensive parts of vis algorithms
C/C++ (a tiny amount of Fortran)
* for the glue that
connects your vis algorithms together into an application?
C/C++
* How aggressively do
you use language-specific features like C++ templates?
Not very, but they are used.
* is Fortran important
to you? Is it important that a
framework support it seamlessly?
Pretty important, but at least
"standardly" enhanced F77 should be simple :).
* Do you see other
languages becoming important for visualization (ie. Python, UPC, or even
BASIC?)
Python is big for us.
What platforms are used for data
analysis/visualization?
* What do you and your
target users depend on to display results? (ie. Windows, Linux, SGI, Sun etc..)
Linux, SGI, Sun, Windows, MacOS in that
order
* What kinds of presentation devices are employed (desktops, portables, handhelds, CAVEs, Access Grids, WebPages/Collaboratories) and what is their relative importance to active users?
Remote desktops and laptops. Very important.
* What is the relative importance of these various presentation methods from a research standpoint?
PowerPoint :)?
* Do you see other
up-and-coming visualization platforms in the future?
Tablets & set-top boxes.
Tell us how you deal with the issue of
versioning and library dependencies for software deployment.
* For source code distributions, do you bundle builds of all related libraries with each software release (ie. bundle HDF5 and FLTK source with each release)?
For many libs, yes.
* What methods are employed to support platform-independent builds (cmake, imake, autoconf)? What are the benefits and problems with this approach?
gmake based makefiles.
* For binaries, have you had issues with different versions of libraries (ie. GLIBC problems on Linux and different JVM implementations/versions for Java)? Can you tell us about any sophisticated packaging methods that address some of these problems (RPM need not apply).
No real problems other than GLIBC problems. We do tend to ship static for several libs. Motif used to be a problem on Linux (LessTif vs OpenMotif).
* How do you handle
multiplatform builds?
cron jobs on multiple platforms, directly
from CVS repos. Entire
environment can be built from CVS repo info
(or cached).
How do you (or would you) provide
abstractions that hide the locality of various components of your
visualization/data analysis application?
* Does anyone have ample
experience with CORBA, OGSA, DCOM, .NET, RPC? Please comment on advantages/problems of these technologies.
* Do web/grid services
come into play here?
Not usually an issue for us.
7) Collaboration==========================
If you are interested in "collaborative applications" please define the term "collaborative". Perhaps provide examples of collaborative application paradigms.
Meeting Maker? :) :) (I'm getting tired).
Is collaboration a feature that exists at
an application level or are there key requirements for collaborative
applications that necessitate component-level support?
* Should collaborative infrastructure be incorporated as a core feature of every component?
* Can any conceivable
collaborative requirement be satisfied using a separate set of modules that
specifically manage distribution of events and data in collaborative
applications?
* How is the
collaborative application presented?
Does the application only need to be collaborative sometimes?
* Where does performance
come in to play? Does the
visualization system or underlying libraries need to be performance-aware? (i.e. I'm doing a given task and I need
a framerate of X for it to be useful using my current compute resources),
network aware (i.e. the system is starving for data and must respond by adding
an alternate stream or redeploying the pipeline). Are these considerations implemented at the component level,
framework level, or are they entirely out-of-scope for our consideration?
And I'm at the end... Whew.
rjf.
Randall Frank                          | Email: rjfrank@llnl.gov
Lawrence Livermore National Laboratory | Office: B451 R2039
7000 East Avenue, Mailstop: L-561      | Voice: (925) 423-9399
Livermore, CA 94550                    | Fax: (925) 423-8704