Preface Comment from Ilmi Yoon:

Just one curiosity: a component is of much larger granularity compared

to an object in terms of reusability or usage.  A component is a kind of

package of objects that has interfaces to communicate with other components.

So components are much more portable and easily reusable without knowing

the programming environment of the other components - they can be written in different

programming languages, etc., as long as they know each other's

interfaces... I just feel some of the discussions are related to object-oriented, not

component-oriented. Maybe it is from my ignorance and/or lack of certain

background from the last meeting.

 

 

=============The Survey=========================

 

1) Data Structures/Representations/Management==================

The center of every successful modular visualization architecture has been a flexible core set of data structures for representing data that is important to the targeted application domain.  Before we can begin working on algorithms, we must come to some agreement on common methods (either data structures or accessors/method  calls) for exchanging data between components of our vis framework.

 

There are two potentially disparate motivations for defining the data representation requirements.  In the coarse-grained case, we need to define standards for exchanging data between components in this framework (interoperability).  In the fine-grained case, we want to define some canonical data structures that can be used within a component -- one developed specifically for this framework.  These two use-cases may drive different sets of requirements and implementation issues.

        * Do you feel both of these use cases are equally important or should we focus exclusively on one or the other?

 

Randy: I think that interoperability (both in terms of data and perhaps more

critically operation/interaction) is more critical than fine-grained

data sharing.  My motivation: there is no way that DiVA will be able to

meet all needs initially and in many cases, it may be fine for data to

go "opaque" to the framework once inside a "limb" in the framework (e.g.

VTK could be a limb).  This allows the framework to be easily populated

with a lot of solid code bases and shifts the initial focus to the important

interactions (perhaps domain centric).  Over time, I see the fine-grain

stuff coming up, but perhaps proposed by the "limbs" rather than the

framework.  I do feel that the coarse level must take into account

distributed processing however...

 

I want to facilitate interfaces

between packages, opting for (possibly specific) data models that map

to the application at hand.  I could use some generic mechanisms

provided by DiVA to reduce the amount of code I need or bootstrap

more rapid prototyping, but it is not key that the data model be

burned fully into the Framework.  I certainly feel that the Framework

should be able to support more than one data model (since we have

repeatedly illustrated that all "realizable" models have design

boundaries that we will eventually hit).

 

Pat: I think both cases are important, but agreeing upon the fine-grained access

will be harder.

 

John C: Too soon to tell. Focus on both until the issues become more clear.

 

Jim: I think for now we need to exclusively focus on exchanging data between

components, rather than any fine-grained generalized data objects...

 

The first order entry into any component development is to "wrap up

what ya got".  The "rip things apart" phase comes after you can glue

all the coarse-grained pieces together reliably...

 

Ilmi: I think for the coarse grain we need to decide on something like SOAP that wraps

the internal data in an XML format. But I don't think we need to decide the

fine grain, since each component can choose its own way/format and

then publish that format, so the party who wants to use the component needs

to follow the interface. But if we would like to decide on an initial set of formats

that must/may be supported by DiVA components, then we can list the most popular

formats and choose some/all of them.

 

JohnS: While I am very interested in design patterns, data structures, and services that could make the design of the interior of parallel/distributed components easier, it is clear that the interfaces between components are the central focus of this project.  So the definition of inter-component data exchanges is preeminent.

 

Wes: Both are important. The strongest case, IMO, for the intra-component DS/DM

is that I have a stable set of data modeling/mgt tools that I can use for

families of components. Having a solid DS/DM base will free me to focus

on vis and rendering algorithms, which is how I want to spend my time.

 

The strongest case for the inter-component DS/DM is the "strong typing"

property that makes AVS and apps of its ilk work so well.

 

The "elephant in the living room" is that there is no silver bullet.

I favor an approach that is, by design, incremental. What I mean is that

we can deal with structured grids, unstructured grids, geom and other

renderable data, etc. in a more or less piecemeal fashion with an eye

towards component level interoperability in the long term. In the beginning,

there won't be 100% interoperability as if, for example, all data models

and types were stuffed into a vector bundles interface. OTOH, a more

conciliatory approach will permit forward progress among multiple

independent groups who are all eyeing "interoperability". This is the

real goal, not a "single true data model."

 

*       Do you feel the requirements for each of these use-cases are aligned or will they involve two separate development tracks?  For instance, using "accessors" (method calls that provide abstract access to essentially opaque data structures) will likely work fine for the coarse-grained data exchanges between components, but will lead to inefficiencies if used to implement algorithms within a particular component.

*        As you answer the "implementation and requirements" questions below, please try to identify where coarse-grained and fine-grained use cases will affect the implementation requirements.

 

Randy: I think you hit the nail on the head.  Where necessary, I see sub-portions

of the framework working out the necessary fine-grained, efficient,

"aware" interactions and datastuctures as needed.  I strongly doubt we

would get that part right initially and think it would lead to some of

the same constraints that are forcing us to re-invent frameworks right

now.  IMHO: the fine-grain stuff must be flexible and dynamic over

time as development and research progress.

 

Pat: I think the focus should be on interfaces rather than data structures.  I

would advocate this approach not just because it's the standard

"object-oriented" way, but because it's the one we followed with FEL,

and now FM, and it has been a big win for us.  It's a significant benefit

not having to maintain different versions of the same visualization

technique, each dedicated to a different method for producing the

data (i.e., different data structures).  So, for example, we use the same

visualization code in both in-core and out-of-core cases.  Assuming up

front that  an interface-based approach would be too slow is, in my

humble opinion, classic premature optimization.

 

Jim: Two separate development tracks.  Definitely.  There are different driving

design forces and they can be developed (somewhat) independently (I hope).

 

Lori: The TSTT center is not interested in defining a data representation

per se - that is dictating what the data structure will look like.  Rather,

we are interested in defining how data can be accessed in a uniform

way from a wide variety of different data structures (for both structured

and unstructured meshes).  This came about because we recognize

that

  1.  there are a lot of different meshing/data frameworks out there,

       that have many man years of effort behind their development,

       that are not going to change their data structures very easily

       (if at all).  Moreover, these infrastructures have made their

       choices for a reason - if there was a one-size-fits-all answer,

       someone probably would have found it by now :-)

  2.  Because of the difference in data structures - it has been very

       difficult for application scientists (and tool builders) to experiment

       with and/or support different data infrastructures which has

       severely limited their ability to play with different meshing strategies,

       discretization schemes, etc.

 

We are trying to address this latter point: by developing common

interfaces for a variety of infrastructures, applications can easily

experiment with different techniques, and supporting tool developers

(such as mesh quality improvement and front tracking codes) can

write their tools to a single API and automatically support multiple

infrastructures.

 

We are also experimenting with the language interoperability tools

provided by the Babel team at LLNL and have ongoing work to

evaluate its performance (and the performance of our interface in

general) for fine- and coarse-grained access to mesh (data) entities -

something that I suspect will be of interest to this group as well.

 

JohnC: I think it's premature to say. We need to have agreement on the

questions below first.

 

Ilmi: There will be some overhead and inefficiency in using accessors for data

exchange, but I like the approach of accessors and believe the CCA achieves

reusability at the expense of performance, as OOP does anyway. We just try to

keep that expense as small as possible.

 

JohnS: Given the focus on inter-component data exchange, I think accessors provide the most straightforward paradigm for data exchange.  The arguments to the data access methods can involve elemental data types rather than composite data structures (eg. we use scalars and arrays of basic machine data types rather than hierarchical structures).  Therefore we should look closely at FM's API organization as well as the accessors employed by SCIRun V1 (before they employed dynamic compilation).

 

The accessor method works well for abstracting component location, but requires potentially redundant copying of data for components in the same memory space.  It may be necessary to use reference counting in order to reduce the need to recopy data arrays between co-located components, but I'd really like to avoid making ref counting a mandatory requirement if we can avoid it.  (does anyone know how to avoid redundant data copying between opaque components without employing reference counting?)
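
As a sketch of the kind of accessor-based exchange described above -- elemental types only across the boundary, with a reference-counted handle so co-located components can share a buffer without recopying -- something like the following might work (all names here, such as DataAccessor and getField, are hypothetical, not an existing API):

    #include <cstddef>
    #include <memory>
    #include <string>
    #include <vector>

    // Hypothetical coarse-grained accessor: only elemental types and flat
    // arrays cross the component boundary; internal structures stay opaque.
    class DataAccessor {
    public:
        virtual ~DataAccessor() = default;

        // Scalar queries use basic machine types only.
        virtual std::size_t numPoints() const = 0;
        virtual int         numComponents(const std::string& field) const = 0;

        // A reference-counted handle to a contiguous buffer.  Co-located
        // components share the buffer; remote ones would serialize it.
        virtual std::shared_ptr<const std::vector<float>>
        getField(const std::string& field) const = 0;
    };

    // Example consumer: it never sees the producer's internal data structures.
    inline float firstValue(const DataAccessor& src, const std::string& field)
    {
        auto buf = src.getField(field);   // no copy when in the same address space
        return (buf && !buf->empty()) ? (*buf)[0] : 0.0f;
    }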

 

Wes: They are aligned to a large degree - data structures/models are produced and

consumed by component code, but may also be manipulated (serialized,

marshalled, etc) by the framework.

 

What are the requirements for the data representations that must be supported by a common infrastructure?  We will start by answering Pat's questions about representation requirements and follow up with personal experiences involving particular domain scientists' requirements.

        Must: support for structured data

 

Randy: Must-at the coarse level, I think this could form the basis of all

other representations.

 

Pat: Structured data support is a must.

 

JohnC: Must

 

Jim: Must

 

JohnS: Must

 

Wes: Agree.

 

        Must/Want: support for multi-block data?

 

Randy: Must-at the coarse level, I think this is key for scalability,

domain decomposition and streaming/multipart data transfer.

 


 

JohnC: Must

 

Jim: Must

 

JohnS: Must

 

Wes: Must. We must set targets that meet our needs, and not sacrifice

requirements for speed of implementation.

 

        Must/Want: support for various unstructured data representations? (which ones?)

 

Randy: Nice-but I would be willing to live with an implementation on top

of structured, multi-block (e.g. Exodus).  I feel accessors are

fine for this at the "framework" level (not at the leaves).

 

Pat: We have unstructured data, mostly based on tetrahedral or prismatic meshes.

We need support for at least those types.  I do not think we could simply

graft unstructured data support on top of our structured data structures.

 

JohnC: Not sure.  Not a priority.

 

Jim: Want (low priority)

 

JohnS: Cell-based unstructured representations first.  Need support for arbitrary connectivity eventually, but not mandatory.  I liked Iris Explorer's hierarchical model, as it seems more general than the model offered by other vis systems.

 

Wes: Must. Unstructured data reps are widely used and they should not be

excluded from the base set of DS/DM technologies.

 

        Must/Want: support for adaptive grid standards?  Please be specific about which adaptive grid methods you are referring to.  Restricted block-structured AMR (aligned grids), general block-structured AMR (rotated grids), hierarchical unstructured AMR, or non-hierarchical adaptive structured/unstructured meshes.

 

Randy: Similar to my comments on unstructured data reps.  In the long

run, something like boxlib with support for both P and H adaptivity

will be needed (IMHO, VTK might provide this).

 

Pat: Adaptive grid support is a "want" for us currently, probably eventually

a "must".  The local favorite is CART3D, which consists of hierarchical

regular grids.  The messy part is that CART3D also supports having

more-or-less arbitrary shapes in the domain, e.g., an aircraft fuselage.

Handling the shape description and all the "cut cell" intersections

I expect will be a pain.

 

JohnC: Adaptive grid usage is in its infancy at NCAR. But I suspect it is the

way of the future. Too soon to be specific about which adaptive grid

methods are preferred.

 

Jim: Want (low priority).  The AMR folks have been trying to get together and define

a standard API, and have been as yet unsuccessful.  Who are we to attempt

this where they have failed...?

 

JohnS: If we can define the data models rigorously for the individual grid types (ie. structured and unstructured data), then adaptive grid standards really revolve around an infrastructure for indexing data items.  We normally think of indexing datasets by time and by data species.  However, we need to have more general indexing methods that can be used to support concepts of spatial and temporal relationships.  Support for pervasive indexing structures is also important for supporting other visualization features like K-d trees, octrees, and other such methods that are used to accelerate graphics algorithms.  We really should consider how to pass such representations down the data analysis pipeline in a uniform manner because they are used so commonly.
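
One hedged way to read this indexing requirement is to treat acceleration structures as first-class objects that travel with the data and can be queried (or ignored) by downstream components. The sketch below is illustrative only; SpatialIndex, Region, and query are hypothetical names, and a k-d tree, an octree, or an AMR level map would each be one possible implementation:

    #include <array>
    #include <cstdint>
    #include <vector>

    // Axis-aligned region used for spatial and temporal queries.
    struct Region {
        std::array<double, 3> lo{}, hi{};        // spatial bounds
        double tBegin = 0.0, tEnd = 0.0;         // temporal bounds
    };

    // Hypothetical index interface: a k-d tree, an octree, or an AMR level
    // map could each implement this, and the pipeline just passes it along.
    class SpatialIndex {
    public:
        virtual ~SpatialIndex() = default;

        // Return the ids of blocks/cells whose extent overlaps the region.
        virtual std::vector<std::uint64_t> query(const Region& r) const = 0;
    };

    // A dataset handle could carry zero or more named indices; components
    // that understand them accelerate their work, all others ignore them.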

 

Wes: Want, badly. We could start with Berger-Colella AMR since it is widely

used. I'm not crazy about Boxlib, though, and hope we can do something

that is easier to use.

 

        Must/Want: "vertex-centered" data, "cell-centered" data? other-centered?

 

Randy: Must.

 

Pat: Most of the data we see is still vertex-centered.  FM supports other

associations, but we haven't used them much so far.

 

Jim: Want (low priority)

All of these should be "Wants", to the extent that they require more

sophisticated handling, or are less well-known in terms of generalizing

the interfaces.

 

For example, the AMR folks have been trying to get together and define

a standard API, and have been as yet unsuccessful.  Who are we to attempt

this where they have failed...?

 

So to clarify, if we *really* understand (or think we do) a particular

data representation/organization, or even a specific subset of a general

representation type, then by all means let's whittle an API into our stuff.

Otherwise, leave it alone for someone else to do, or do as strictly needed.

 

JohnS: The accessors must understand (or not preclude) all centerings.  This is particularly important for structured grids, where vis systems are typically lax in storing/representing this information.
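
A small illustration of making centering explicit in the accessor layer rather than leaving it implicit (hypothetical names, just a sketch):

    #include <string>

    // Hypothetical: every field reports where its samples live, so a
    // structured-grid reader cannot silently drop the centering.
    enum class Centering { Vertex, Cell, Face, Edge };

    class FieldInfo {
    public:
        virtual ~FieldInfo() = default;
        virtual Centering centering(const std::string& field) const = 0;
    };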

 

Wes: Don't care - will let someone else answer this.

 

Note: It sounds like at least time-varying data handling is well understood by the people who want it.

 

        Must: support time-varying data, sequenced, streamed data?

 

Randy: Must, but way too much to say here to do it justice.  I will say

that the core must deal with time-varying/sequenced data.  Streaming

might be able to be placed on top of that, if it is designed

properly.  I will add that we have a need for progressive data as

well.

 

Pat: Support for time-varying data is a must.

 

JohnC: Must. Time varying data is what makes so many of our problems currently

intractable. Too many of the available tools (e.g. VTK) assume static

data and completely fall apart when the data is otherwise.

 

Definitely there is

not support for any routines that require temporal integration (e.g.

unsteady flow viz). In general, there is no notion of a timestep in

VTK. Datasets are 3D. Period.

 

Additionally, there is a performance issue: VTK is not optimized in any

way at moving data through the pipeline at high rates. Its underlying

architecture seems to assume that as long as the pipeline eventually

generates some geometry, it's OK if it takes a loooong time because

you're going to interact with that geometry (navigating through camera

space, color space, etc.) and the "pre-processing" doesn't have to run

at interactive rates. So the data readers are pathetically slow (and

there is little hope for optimization here with that data model that is

used). There is no way to exploit temporal coherence in any of the data

operators. No simple way to cache results if you want to play out of

memory.

 

At a high level you need a design that gives consideration to temporal needs

throughout the architecture.  I think the data structures do need to be

time varying data aware, not just capable of dealing with 4D data

(although I can't think of a specific example of why now). One issue is

that the temporal dimension often has different spacing/regularity than

the spatial dimension.  Obviously you're talking different units from

the spatial dimensions as well. There are also system-level issues as

well (e.g. unsteady flow viz needs, exploiting temporal coherence,

caching, support for exploring the temporal dimension from user

interactors).

 

I know I've only just started to scratch the surface here. We could

probably devote an entire workshop to time-varying data needs and

several more to figuring out how to actually support them.

 

Jim: MUST

 

JohnS: Yes to all.  However, the concept of streamed data must be defined in more detail.  This is where the execution paradigm is going to affect the data structures.

 

Wes: Not ready for prime time. I've read two or three research proposals in

the past year that focus on methods for time-varying data representations

and manipulation. IMO, this topic is not ready for prime time yet. We

can say that it would be nice to have, but will probably not be fully

prepared to start whacking out code.

 

Note: Should do quick gap-analysis on what existing tools fulfill this requirement.

 

        Must/Want: higher-order elements?

 

Randy: Must - but again, this can often be "faked" on top of other reps.

 

Pat: Occasionally people ask about it, but we haven't found it to be a "must".

 

JohnC: low priority

 

Jim: Wants, see above...

 

JohnS: Not yet.

 

Wes: Not sure what this means, exactly, so I'll improvise. Beyond scientific

data representations, there is a family of "vis data structures" that need

to be on the table. These include renderable stuff - images, deep images,

explicit and implicit geometry, scene graph ordering semantics, scene

specification semantics, etc. In addition, there is the issue of

"performance data" and how it will be represented.

 

Note: I find the response to this quite funny because I ran a two-day workshop about 3 years ago here at LBNL on finite element analysis requirements.  We got bashed for two days straight by the FEM code jocks because we didn't seem to care about higher-order elements.  So it would be interesting to know if we don't see much of this because it's not needed, or if the domain scientists simply lost all confidence in us to deal with this issue properly.

 

        Must/Want: Expression of material interface boundaries and other special-treatment of boundary conditions.

 

Randy: Must, but I will break this into two cases.  Material interfaces for

us are essentially sparse vector sets, so they can be handled with

basic mechanisms so I do not see that as core, other than perhaps

support for compression.  Boundary conditions (e.g. ghost zoning,

AMR boundaries, etc) are critical.

 

Pat: We don't see this so much.  "Want", but not must.

 

JohnC: no priority

 

Jim: Want, see above...

 

JohnS: Yes, we must treat ghost zones specially or parallel vis algorithms will create significant artifacts.  I'm not sure what is required for combined air-ocean models.

 

Wes: I'll let someone else answer this one.

 

Note: At the DOE vis workshop, it was pointed out that simple things like isosurfaces give inconsistent (or radically different) results on material interface boundaries depending on assumptions about the boundary treatment. You'd think that this would come up with analysis of combined air-ocean models, but apparently not among the vis people.  From a data analysis standpoint, domain scientists say this is incredibly important, but they can't deal with it because none of the vis or data analysis people listen to them.

 

        * For commonly understood datatypes like structured and unstructured, please focus on any features that are commonly overlooked in typical implementations.  For example, data-centering is often overlooked in structured data representations in vis systems, and FEM researchers commonly criticize vis people for co-mingling geometry with topology in unstructured grid representations.  Few data structures provide proper treatment of boundary conditions or material interfaces.  Please describe your personal experience on these matters.

 

Randy: Make sure you get the lowest common denominator correct!  There is

no realistic way that the framework can support everything, everywhere

without losing its ability to be nimble (no matter what some OOP folks

say).  Simplicity of representation with "externally" supplied optional

optimization information is one approach to this kind of problem.

 

Pat: One thing left out of the items above is support for some sort of "blanking"

mechanism, i.e., a means to indicate that the data at some nodes are not

valid.  That's a must for us.  For instance, with Earth science data we see

the use of some special value to indicate "no data" locations.

 

JohnC: Support for missing data is essential for observed fields.

To do it right you need some way to flag

data cells/vertices within the data model as not containing valid data.

Then you need to add support to your data "operators" as well. For example,

if your operator is some kind of reconstruction filter it needs to know

to use a different kernel when missing data are involved.

 

Obviously, this could pose a significant amount of overhead on the entire

system, and the effort may not be justified if the DOE doesn't have

great need for dealing with instrument acquired data. I only added the

point as a discussion topic as it is fairly important to us. At the

very least, I would hope to have the flexibility to hack support

for missing data if it was not integral to the core framework.
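
A minimal sketch, with hypothetical names, of how a blanking/validity mask could be surfaced through the accessor so that operators can switch behavior when missing data are involved:

    #include <cstddef>

    // Hypothetical field accessor that exposes a validity mask alongside the
    // samples; formats without missing data would return an all-true mask.
    class MaskedField {
    public:
        virtual ~MaskedField() = default;
        virtual std::size_t size() const = 0;
        virtual float       value(std::size_t i) const = 0;
        virtual bool        valid(std::size_t i) const = 0;
    };

    // Example operator: an average that skips flagged samples instead of
    // letting "no data" sentinel values pollute the result.
    inline double maskedMean(const MaskedField& f)
    {
        double sum = 0.0;
        std::size_t n = 0;
        for (std::size_t i = 0; i < f.size(); ++i)
            if (f.valid(i)) { sum += f.value(i); ++n; }
        return n ? sum / n : 0.0;
    }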

 

Jim: I don't think we should "pee in this pool" either yet.  Are any of us

experts in this kind of viz?  Let's stick with what we collectively know

best and make that work before we try to tackle a related-but-fundamentally-

different-domain.

 

JohnS: There is little support for non-Cartesian coordinate systems in typical data structures.  We will need to have a discussion of how to support coordinate projections/conversions in a comprehensive manner.  This will be very important for applications relating to the National Virtual Observatory.

 

Wes: No comment

 

        * Please describe data representation requirements for novel data representations such as bioinformatics and terrestrial sensor datasets.  In particular, how should we handle more abstract data that is typically given the moniker "information visualization".

 

Randy: Obviously, do not forget "records" and aggregate/derived types.  That

having been said, the overheads for these can be ugly.  Consider

parallel arrays as an alternative...

 

Pat: "Field Model" draws the line only trying to represent fields and the meshes

that the fields are based on.  I not really familiar enough with other types

of data to know what interfaces/data-structures would be best.  We haven't

see a lot of demand for those types of data as of yet.  A low-priority "want".

 

JohnC: Beats me.

 

JohnS: I simply don't know enough about this field to comment.

 

Wes: Maybe I don't understand the problem...the same tough issues that plague

more familiar data models appear to be present in bioinformatics and

"info viz" data mgt. There are heirarchical data, unstructured data,

multivariate and multidimensional data, etc.

 

Note:  Must separate mesh from field data interfaces.

              The mesh may not be updated as often as the field.

              Perhaps time-range of validity is important information.

 

What do you consider the most elegant/comprehensive implementation for data representations that you believe could form the basis for a comprehensive visualization framework?

*       For instance, AVS uses entirely different data structures for structured, unstructured and geometry data.  VTK uses class inheritance to express the similarities between related structures.  Ensight treats unstructured data and geometry nearly interchangeably.  OpenDX uses more vector-bundle-like constructs to provide a more unified view of disparate data structures.  FM uses data-accessors (essentially keeping the data structures opaque).

 

 

Randy: IMHO: layered data structuring combined with data accessors is

probably the right way to go.  Keep the basic representational

elements simple.

 

Pat: Well, as you'd expect, as the primary author of Field Model (FM) I think it's

the most elegant/comprehensive of the lot.  It handles structured and

unstructured data.  It handles non-vertex-centered data.  I think it

should be able to handle adaptive data, though it hasn't actually been

put to the test yet.  And of course every adaptive mesh scheme is a little

different.  I think it could handle boundary condition needs, though that's

not something we see much of.

 

JohnC: I don't think this is what you're after, but I've come to believe that

multiresolution data representations with efficient domain subsetting

capabilities are the most pragmatic and elegant

way to deal with large data sets. In addition to enabling interaction

with the largest data sets, they offer tremendous scalability from desktop

to "visual supercomputer". I would encourage a data model that includes

and facilitates their integral support.
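
A hedged sketch of the kind of interface this suggests -- access parameterized by refinement level and region of interest, so the same call scales from a desktop preview to a full-resolution read. MultiresReader, Box, and readRegion are hypothetical names, not an existing API:

    #include <array>
    #include <vector>

    // Index-space region of interest.
    struct Box {
        std::array<long, 3> lo{}, hi{};
    };

    // Hypothetical multiresolution reader: level 0 is the coarsest
    // approximation and higher levels add detail; only the requested
    // sub-domain is ever touched on disk.
    class MultiresReader {
    public:
        virtual ~MultiresReader() = default;
        virtual int  numLevels() const = 0;
        virtual Box  domain(int level) const = 0;
        virtual std::vector<float> readRegion(int level, const Box& roi) const = 0;
    };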

 

Ilmi: Combination of (externally) FM data-accessors and (internally) VTK class

inheritance.

 

JohnS: Since I'm already on record as saying that opaque data accessors are essential for this project, it is clear that FM offers the most compelling implementation that satisfies this requirement.

 

        * Are there any of the requirements above that are not covered by the structure you propose?

 

Randy: I think one big issue will be the distributed representations.

              This item is ill handled by many of these systems.

 

Pat: Out-of-core?  Derived fields? Analytic meshes (e.g., regular meshes)?

              Differential operators?  Interpolation methods?

 

JohnC: Not sure.

 

JohnS: We need to be able to express a wider variety of data layout conversions and have some design pattern that reduces the need to recopy data arrays for local components.  The FM model also needs to have additional API support for hierarchical indices to accelerate access to subsections of arrays or domains.

 

Wes: Not sure how to answer. The one thing that came to mind is a general observation

that the above data models are designed for scientific data. The AVS geom

data structure was opaque to the developer, and if you looked at the header

files, was really, really ugly. Since I have a keen interest in renderers,

I am very concerned about having adequate flexibility and performance from

a DS/DM for moving/representing renderable data, as opposed to large

structured or unstructured meshes. It is possible to generalize a

DS for storing renderable data (e.g., a scene graph), but this separate class

of citizen reflects the partitioning of data types in AVS. Perhaps this

isn't something to be concerned about at this point.

 

Note: Area of unique features?

              -blanking arrays

              -data handling for distributed data

              -better handling of time-varying data

              -hints for caching so that temporal locality can be exploited

              -indexing (not in TSTT sense... needs more discussion.  Need support for kD trees and rapid lookup.  Indexing might help with our AMR issues)

 

        * This should focus on the elegance/usefulness of the core design-pattern employed by the implementation rather than a point-by-point description of the implementation!

 

Randy: Is it possible to consider a COM Automation Object-like approach,

also similar to the CCA breakdown?  Basically, define the common

stuff and make it interchangeable, then build on top.  Allow underlying

objects to be "aware" and wink to each other to bypass as needed.

In the long run, consider standardizing on working bypass paradigms

and bring them into the code (e.g. OpenGL).

 

Note: We need the "bypass".  The question is how do we supply the bypass mechanism for unanticipated data?
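
One possible shape for such a bypass, loosely in the spirit of COM's QueryInterface or CCA ports: components talk through a generic accessor by default, but a consumer may ask whether the producer also implements a faster, mutually understood interface and use it when present. The names below are hypothetical:

    // Generic, framework-visible interface: always available.
    class GenericAccessor {
    public:
        virtual ~GenericAccessor() = default;
        virtual long  numValues() const = 0;
        virtual float value(long i) const = 0;
    };

    // Optional fast path that two "aware" components may agree on privately,
    // e.g. direct access to a buffer in shared memory.
    class SharedBufferAccess {
    public:
        virtual ~SharedBufferAccess() = default;
        virtual const float* buffer() const = 0;
        virtual long         length() const = 0;
    };

    inline double sum(const GenericAccessor& a)
    {
        // Bypass: if the producer also implements the fast interface, use it.
        if (const auto* fast = dynamic_cast<const SharedBufferAccess*>(&a)) {
            double s = 0.0;
            const float* p = fast->buffer();
            for (long i = 0; i < fast->length(); ++i) s += p[i];
            return s;
        }
        // Otherwise fall back to the generic, element-by-element path.
        double s = 0.0;
        for (long i = 0; i < a.numValues(); ++i) s += a.value(i);
        return s;
    }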

 

Pat: I think if we could reasonably cover the (preliminary) requirements above,

that would be a good first step.  I agree with Randy that whatever we

come up with will have to be able to "adapt" over time as our understanding

moves forward.

 

        * Is there information or characteristics of particular file format standards that must percolate up into the specific implementation of the in-memory data structures?

 

Randy: Not really, but metadata handling and referencing will be key and need

to be general.

 

Pat: In FM we tried hard to keep file-format-specific stuff out of the core model.

Instead, there are additional modules built on top of FM that handle

the file-format-specific stuff, like I/O and derived fields specific to

a particular format.  Currently we have PLOT3D, FITS, and HDFEOS4

modules that are pretty well filled out, and other modules that are

mostly skeletons at this point.

 

We should also be careful not to assume that analyzing the data starts

with "read the data from a file into memory, ...".  Don't forget out-of-core,

analysis concurrent with simulation, among others.

 

One area where the file-format-specific issues creep in is with metadata.

Most file formats have some sort of metadata storage support, some much

more elaborate than others.  Applications need to get at this metadata,

possibly through the data model, possibly some other way.  I don't have

the answer here, but it's something to keep in mind.

 

Jim: I dunno, but what does HDF5 or NetCDF include?  We should definitely be

able to handle various meta-data...

 

Otherwise, our viz framework should be able to read in all sorts of

file-based data as input, converting it seamlessly into our "Holy Data

Grail" format for all the components to use and pass around.  But the

data shouldn't be identifiable as having once been HDF or NetCDF, etc...

(i.e. it's important to read the data format, but not to use it internally)

 

JohnS: I hope not.

 

Wes: One observation is what seems to be a successful design pattern from the

DMF effort: let the HDF guys build the heavy lifting machinery, and focus

upon an abstraction layer that uses the machinery to move bytes.

 

Note:  Metadata:  Must also be propagated down the pipeline.

              Ignored by items that don't care, but recognized by pipeline components that do.

              -alternative is database at the reader, but seems to create painful connection mechanics.

              -and still have to figure out how to reference the proper component, even after going through data-structure transformations.

 

One powerful feature of both HDF and XML is the ability to ignore and pass-through unrecognized constructs/metadata.
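
A minimal sketch of that pass-through behavior, assuming a hypothetical Metadata bag carried alongside the data: a component reads the keys it understands and forwards everything else untouched:

    #include <map>
    #include <string>

    // Hypothetical metadata bag carried alongside the data down the pipeline.
    using Metadata = std::map<std::string, std::string>;

    // A component copies the whole bag to its output, touching only the keys
    // it actually understands; unrecognized entries pass through untouched.
    inline Metadata forwardMetadata(const Metadata& in)
    {
        Metadata out = in;                    // pass everything through
        auto it = out.find("units");          // example of a key this component knows
        if (it != out.end() && it->second == "K")
            it->second = "Kelvin";            // normalize, but never drop anything
        return out;
    }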

 

For the purpose of this survey, "data analysis" is defined broadly as all non-visual data processing done *after* the simulation code has finished and *before* "visual analysis".

        * Is there a clear dividing line between "data analysis" and "visual analysis" requirements?

 

Randy: Not in my opinion.

 

Pat: Your definition excludes concurrent analysis and steering from

"visualization".  Is this intentional?  I don't think there's a clear dividing

line here.

 

JohnC: I take issue with your definition of data analysis. Yes it is performed

after the simulation, but it is performed (or would be performed if viz

tools didn't suck) in *parallel* with visual analysis.  The two when

well integrated, which is rarely the case, can complement each other

tremendously. So-called "visual analysis" by itself, without good

quantitative capability, is pretty useless.

 

Well, text-based, programmable user interfaces are a must for "data

analysis", whereas a GUI is essential for visual.

 

Jim: NO.  There shouldn't be - these operations are tightly coupled, or even

symbiotic, and *should* all be incorporated into the same framework,

indistinguishable from each other.

 

Ilmi: Some components do purely data analysis, some do only visual, but there

will be calls to the data analysis component from the visual component during the

analysis.

 

JohnS: There shouldn't be.  However, people in the SRM community left me with the impression that they felt data analysis had been essentially abandoned by the vis community in favor of "visual analysis" methods.  We need to undo this.

 

Wes: Generally speaking, there's not much difference.

 

That said, some differences seem obvious to me:

 

1. Performance - visualization is most often an interactive process, but

has offline implementations. "Plain old" data analysis seems to be mostly

an offline activity with a few interactive implementations.

 

2. Scope - data analysis seems to be a subset of vis. Data analysis doesn't

have need for as rich a DS/DM infrastructure as vis.

 

Note:  Righteous indignation.  Well, that's good.  Why then do SDM people and domain scientists think we don't care?  They aren't smoking crack.  They have legitimate reasons to believe that we aren't being genuine when we say that we care about data analysis functionality.  Can I do data analysis with Vis5D?  Does VTK offer me a wide array of statistical methods?  Must keep this central as we design this system.  Do you agree with John Clyne's assertion that data analysis == text interface and visualization == GUI?

 

        * Can we (should we) incorporate data analysis functionality into this framework, or is it just focused on visual analysis.

 

Randy: Yes, and we should; particularly as the complexity and size of

data grow, we begin to rely more heavily on "data analysis"-based

visualization.

 

Pat: I think you would also want to include feature detection techniques.  For

large data analysis in particular, we don't want to assume that the scientist

will want to do the analysis by visually scanning through all the data.

 

JohnC: If visualization is ever going to live up to the claim made by so many

in the viz community of

it being an indispensable tool for analysis, tight integration with

statistical tools and data processing capabilities are a must. Otherwise

we'll just continue to make pretty pictures, put on dog and pony shows,

and wonder where the users are.

 

Jim: YES.

 

Ilmi: Not all data analysis, but there is a lot of data analysis being used for

visual analysis, and the more tools that are provided initially, the easier it

becomes to grow the user group. So, we can list candidates.

 

JohnS: Vis is bullshit without seamless integration with flexible data analysis methods.  The most flexible methods available are text-based.  The failure to integrate more powerful data analysis features into contemporary 3D vis tools has been a serious problem.

 

Wes: Ideally, the same machinery could be used in both domains.

 

Notes: Data analysis and feature detection support constitute more unique features for this framework.

              How do you then index your bag of features or have them properly refer back to the

              data that led to their generation?  Sometimes a detected feature is a discrete marker;

              other times, it is treated as a derived field.  The former case again seems to point to the

              need for a robust indexing method.

             

              JohnC's observation about data-analysis == text-based is very interesting!!!  Does everyone agree?

 

              Aspect is one of the few examples of providing a graphical workflow interface for traditionally procedural/text-based data analysis tools.

 

        * What kinds of data analysis typically needs to be done in your field?  Please give examples and how these functions are currently implemented.

 

Randy: Obviously basic statistics (e.g. moments, limits, etc).  Regression

and model driven analysis are common.  For example, comparison of

data/fields via comparison vs common distance maps.  Prediction of

activation "outliers" via general linear models applied on an

element by element basis, streaming through temporal data windows.

 

Pat: Around here there is interest in vector-field topology feature detection

techniques, for instance, vortex-core detection.

 

JohnC: Pretty much everything you can do with IDL or matlab.

 

Jim: Simple sampling, basic statistical averages/deviations, principal component

analysis (PCA, or EOF for climate folks), other dimension reduction.

Typically implemented as C/C++ code...  mostly slow serial...  :-Q

 

JohnS: This question is targeted at vis folks that have been focused on a particular scientific domain.  For general use, I think of IDL as being one of the most popular/powerful data analysis languages.  Python has become increasingly important -- especially with the Livermore numerical extensions and the PyGlobus software.  However, use of these scripting/data analysis languages has not made the transition to parallel/distributed-memory environments (except in a sort of data-parallel batch mode).

 

        * How do we incorporate powerful data analysis functionality into the framework?

 

Randy: Hard work :), include support for meta-data, consider support for

sparse data representations and include the necessary support for

"windowing" concepts.

 

Pat: Carefully :-)?  By striving not to make a closed system.

 

JohnC: I'd suggest exploring leveraging existing tools, numerical python for

example.

 

Jim: As components (duh)...  :-)

 

We should define some "standard" APIs for the desired analysis functions,

and then either wrap existing codes as components or shoehorn in existing

component implementations from systems like ASPECT.

 

JohnS: I'm very interested in work that Nagiza has proposed for a parallel implementation of the R statistics language.  The traditional approach for parallelizing scripting languages is to run them in a sort of MIMD mode of Nprocs identical scripts operating on different chunks of the same dataset.  This makes it difficult to have a commandline/interactive scripting environment.  I think Nagiza is proposing to have an interactive commandline environment that transparently manipulates distributed actions on the back-end.

 

There is a similar work in progress on parallel matlab at UC Berkeley.  Does anyone know of such an effort for Python?  (most of the parallel python hacks I know of are essentially MIMD which is not very useful).

 

 

2) Execution Model=======================

It will be necessary for us to agree on common execution semantics for our components.  Otherwise, we might have compatible data structures but incompatible execution requirements.  Execution semantics is akin to the function of a protocol in the context of network serialization of data structures.  The motivating questions are as follows:

*       How is the execution model affected by the kinds of algorithms/system-behaviors we want to implement?

 

Pat: In general I see choices where at one end of the spectrum we have

simple analysis techniques where most of the control responsibilities

are handled from the outside.  At the other end we could have more

elaborate techniques that may handle load balancing, memory

management, thread management, and so on.  Techniques towards

the latter end of the spectrum will inevitably be intertwined more

with the execution model.

 

Jim: Directly.  There are probably a few main exec models we want to cover.

I don't think the list is *that* long...

 

As such, we should anticipate building several distinct framework

environments that each exclusively support a given exec model.  Then

the trick is to "glue" these individual frameworks together so they can

interoperate (exchange data and invoke each others' component methods)

and be arbitrarily "bridged" together to form complex higher-level

pipelines or other local/remote topologies.

 

Ilmi: I guess we can make each component propagate/fire the execution of the next

component(s) in the network/pipeline. Each component can use its

own memory or shared memory to access the data in process. In such a case,

the algorithm of each component is not much affected by the other components

around it.

 

Wes: The "simple" execution model is for the framework to invoke a component, be

notified of its completion, then invoke the next component in the chain, etc.

Things get more interesting if you want to have a streaming processing model.

Related, progressive processing is somewhat akin to streaming, but more

stateful.

 

Note: Wes's model sounds like a good "baseline" model.  It does not allow for chains of invocation and therefore prevents us from getting locked in to complex issues of component-local execution semantics and deadlock prevention.  Can we make this a baseline component requirement?

 

*       How then will a given execution model affect data structure implementations?

 

Pat: Well, there's always thread-safety issues.

 

Jim: I don't think it should affect the data structure impls at all, per se.

Clearly, the access patterns will be different for various execution models,

but this shouldn't change the data impl.  Perhaps a better question is

how to indicate the expected access pattern to allow a given data impl

to optimize or properly prefetch/cache the accesses...

 

Note: Actually, that is the question.  How do we pass information about access patterns so that you can do the kind of temporal caching that John Clyne wants?  It's important not to do what VTK does, so that we don't have to un-do it again (as was the case for VisIt).

 


 

JohnS: There will need to be some way to support declarative execution semantics as well as data-driven and demand-driven semantics.  By declarative semantics, I mean support for environments that want to be in control of when the component "executes", or interactive scripting environments that wish to use the components much like subroutines.  This is separate from the demands of very interactive use-cases like view-dependent algorithms, where the execution semantics must be more automatic (or at least hidden from the developer who is composing the components into an application).  I think this is potentially relevant to the data model discussions because automatic execution semantics often impose some additional requirements on the data structures to hand off tokens to one another.  There are also issues involved with managing concurrent access to the data involved.  For instance, a demand-driven system, as demanded by progressive-update or view-dependent algorithms, will need to manage the interaction between the arrival of new data and asynchronous requests from the viewer to recompute existing data as the geometry is rotated.

 

(note: Wes provides a more succinct description of this execution semantics.)
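
A hedged sketch of what such a baseline declarative contract might look like: the component only exposes explicit execute-style calls, and any data-driven or demand-driven firing is layered on from the outside (Component, DataHandle, and execute are hypothetical names):

    #include <memory>

    class DataHandle;   // opaque data exchanged between components (defined elsewhere)

    // Baseline contract: the component never fires itself; a script, a central
    // executive, or a wrapper decides when execute() runs.
    class Component {
    public:
        virtual ~Component() = default;
        virtual void setInput(std::shared_ptr<const DataHandle> in) = 0;
        virtual void execute() = 0;                        // explicit, synchronous
        virtual std::shared_ptr<const DataHandle> output() const = 0;
    };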

 

Wes: We're back to the issue of needing a DS/DM that supports multiresolution

models from the git-go. The relationship between data analysis and vis data

models becomes more apparent here when we start thinking about multires

representations of unstructured data, like particle fields or point clouds.

 

Note: The fly in the ointment here is that highly interactive methods like multires models and view-dep algorithms are not well supported by completely simple/declarative semantics (unless you have an incredibly complex framework, but then the framework would require component-specific knowledge to schedule things properly).

 

*       How will the execution model be translated into execution semantics on the component level?  For example, will we need to implement special control-ports on our components to implement particular execution models, or will the semantics be implicit in the way we structure the method calls between components?

 

Pat: Not sure.

 

Jim: Components should be "dumb" and let other components or the framework invoke

them as needed for a given execution model.  The framework dictates the

control flow, not the component.  The API shouldn't change.

 

If you want multi-threaded components, then the framework better support

that, and the API for the component should take the possibility into account.

 

JohnS: I'm going to propose that we go after the declarative semantics first (no automatic execution of components) with hopes that you can wrap components that declare such an execution model with your own automatic execution semantics (whether it be a central executive or a distributed one).  This follows the paradigm that was employed for tools such as VisIt that wrapped each of the pieces of the VTK execution pipeline so that it could impose its own execution semantics on the pipeline rather than depending on the exec semantics that were predefined by VTK.  DiVA should follow this model, but start with the simplest possible execution model so that it doesn't need to be deconstructed if it fails to meet the application developer's needs (as was the case with VisIt).

 

We should have at least some discussion to ensure that the *baseline* declarative execution semantics imposes the fewest requirements for component development but can be wrapped in a very consistent/uniform/simple manner to support any of our planned pipeline execution scenarios.  This is an exercise in making things as simple as possible, but thinking ahead far enough about long-term goals to ensure that the baseline is "future proof" to some degree.
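
Continuing the sketch above, one of many possible ways to wrap that declarative baseline is a trivial central executive that owns all the firing decisions, without the components ever knowing about it; again, the names are hypothetical:

    #include <memory>
    #include <utility>
    #include <vector>

    class DataHandle;                         // opaque, as in the earlier sketch

    class Component {                         // baseline contract from above
    public:
        virtual ~Component() = default;
        virtual void setInput(std::shared_ptr<const DataHandle> in) = 0;
        virtual void execute() = 0;
        virtual std::shared_ptr<const DataHandle> output() const = 0;
    };

    // A trivial central executive: fires a linear chain of declarative
    // components in order, handing each stage's output to the next input.
    class PipelineExecutive {
    public:
        void append(std::shared_ptr<Component> c) { stages_.push_back(std::move(c)); }

        void run(std::shared_ptr<const DataHandle> source)
        {
            std::shared_ptr<const DataHandle> data = std::move(source);
            for (auto& stage : stages_) {
                stage->setInput(data);
                stage->execute();             // the executive owns all firing
                data = stage->output();
            }
        }

    private:
        std::vector<std::shared_ptr<Component>> stages_;
    };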

 

Wes: One thing that always SUPREMELY annoyed me about AVS was the absence of a

"stop" button on the modules. This issue concerns being able to interrupt

a module's processing when it was taking too long. Related, it might be nice

to have an execution model that uses the following paradigm: "OK, time's up,

give me what you have now."

 

Note: That is another way that the exec model is going to affect structures (or at least the API for accessing those structures).  We can refer to this as "concurrent access to data".  Do we have to incorporate locking semantics in the data accessors?  Do we have to incorporate firing protocol into the accessors (or at least hints as to firing constraints)?  Again, we don't want to get into a VisIt situation.

 

Automatic exec semantics do not give the framework enough control to address Wes' issue.  Certainly this is an issue with VTK as well.  How do we formulate this as a requirement?

 

What kinds of execution models should be supported by the distributed visualization architecture?

        * View dependent algorithms? (These were typically quite difficult to implement for dataflow visualization environments like AVS5).

 

Randy: I propose limited enforcement of fixed execution semantics. View/data/focus

dependent environments are common and need to be supported, however, they

are still tied very closely with data representations, hence will likely

need to be customized to application domains/functions.

 

Pat: Not used heavily here, but would be interesting.  A "want".

 

JohnC: These are neat research topics, but I've never been convinced that they

have much application beyond IEEEViz publications.  Mostly I believe

this because of the complexity they impose on the data model. Better to

simply offer progressive/multiresolution data access.

 

Jim: Want.

 

Ilmi: I like to say "must", but it is for improving usability and efficiency,

so people may live without it.

It will definitely improve the efficiency. If we want to support view

dependent algorithm, then we should consider it from the beginning of the

dataflow design, so it can be easily integrated into. View dependent or

image-based algorithm doesn't necessarily make much changes to existing data

flow design. View dependant or image-based algorithms are useful to

eliminate majority of data blocks from the rendering pipeline. Therefore, it

is good to provide capability to choose subset of data to be rendered from

the dataflow.

 

JohnS: Must be supported, but not as a baseline exec model.

 

Wes: Yes

 

 

Note: Is this a framework or component issue?  Or is it a job for hierarchical components?

 

        * Out-of-core algorithms

 

Randy: This has to be a feature, given the focus on large data.

 

Pat: A "must" for us.

 

JohnC: Seems like a must for large data. But is this a requirement or a design

issue?

 

Jim: Must.  This is a necessary evil of "big data".  You need some killer

caching infrastructure throughout the pipeline (e.g. like VizCache).

 

JohnS: Same deal.  We must work out what kinds of attributes are required of the data structures/data model to represent temporal decomposition of a dataset.  We should not encode the execution semantics as part of this (it should be outside of the component), but we must ensure that the data interfaces between components are capable of representing this kind of data decomposition/use-case.
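
A small illustration of keeping the decomposition in the data interface while leaving the firing policy outside it: the interface only describes which time steps and chunks exist and lets the caller pull them one at a time (ChunkedDataset and readChunk are hypothetical names):

    #include <cstddef>
    #include <vector>

    // Hypothetical interface describing a temporally/spatially decomposed
    // dataset.  It says nothing about *when* chunks are fetched; the
    // out-of-core or streaming policy lives outside the component.
    class ChunkedDataset {
    public:
        virtual ~ChunkedDataset() = default;
        virtual std::size_t numTimeSteps() const = 0;
        virtual std::size_t numChunks(std::size_t timeStep) const = 0;
        virtual std::vector<float> readChunk(std::size_t timeStep,
                                             std::size_t chunk) const = 0;
    };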

 

Wes: Yes

 

 

        * Progressive update and hierarchical/multiresolution algorithms?

 

Randy: Obviously, I have a bias here, particularly in the remote visualization

cases.  Remote implies fluctuations in effective data latency that make

progressive systems key.

 

Pat: A "want".

 

JohnC: This is the way to go (IMHO), the question is at what level to support

it.

 

Jim: Must

 

Ilmi: MUST!  for improving usability and efficiency. And can be used to support

view-dependent algorithm.

 

JohnS: Likewise, we should separate the execution semantics necessary to implement this from the requirements imposed on the data representation.  Data models in existing production data analysis/visualization systems often do not provide an explicit representation for such things as multiresolution hierarchies.  We have LevelOfDetail switches, but that seems to be only a weak form of representation for these hierarchical relationships and limits the effectiveness of algorithms that depend on this method of data representation.  Those requirements should not be co-mingled with the actual execution semantics for such components (it's just the execution interface).

 

Wes: Yes. All of the above. Go team!

 

        * Procedural execution from a single thread of control (i.e. using a commandline language like IDL to interactively control a dynamic or large parallel back-end)

 

Randy: Yep, I think this kind of control is key.

 

Pat: A "want".

 

JohnC: A must for data analysis and data manipulation (deriving new fields, etc)

 

Jim: This is not an execution model, it is a command/control interface issue.

You should be able to have a GUI, programmatic control, or scripting to

dictate interactive control (or "steering" as they call it... :-).  The

internal software organization shouldn't change, just the interface to

the outside (or inside) world...

 

Ilmi: Good to have

 

JohnS: This should be our primary initial target.  I do not have a good understanding of how best to support this, but it's clear that a commandline/interactive scripting language must be supported.  Current data-parallel scripting interfaces assume data-parallel, batch-mode execution of the scripting interpreters (this is a bad thing).

 

Wes: Historically, this approach has proven to be very useful.

 

 

        * Dataflow execution models?  What is the firing method that should be employed for a dataflow pipeline?  Do you need a central executive like AVS/OpenDX or, completely distributed firing mechanism like that of VTK, or some sort of abstraction that allows the modules to be used with either executive paradigm?

 

Randy: I think this should be an option as it can ease some connection

mechanisms, but it should not be the sole mechanism.  Personally,

I find a properly designed central executive making "global"

decisions coupled with demand/pull driven local "pipelets" that

allow high levels of abstraction more useful (see the VisIt model).

 

Pat: Preferably a design that does not lock us in to one execution model.

 

JohnC: We use a wavelet based approach similar to space filling curves. Both

approaches have merit and both should be supportable by the framework.

 

Jim: Must.  This should be an implementation issue in the "dataflow framework", and

should not affect the component-level APIs.

 

JohnS: This can probably be achieved by wrapping components that have explicit/declarative execution semantics in a "component-within-a-component" hierarchical manner.  It's an open question as to whether these execution models are a function of the component or of the framework that is used to compose the components into an application, though.

 

Wes: I get stuck thinking about the UI for this kind of thing rather than

the actual implementation. I'll defer to others for opinions.

 

Note: Strange... I would have tagged SFCs and wavelets as "researchy" things.

 

        * Support for novel data layouts like space-filling curves?

 

Randy: With the right accessors nothing special needs to be added for these.

 

Pat: Not a pressing need here, as of yet.

 

Jim: Must.  But this isn't an execution model either.  It's a data structure

or algorithmic detail...

 

JohnS: I don't understand enough about such techniques to know how to approach this.  However, it does point out that it is essential that we hand off data via accessors that keep the internal data structures opaque, rather than handing off complex data structures.

 

*       Are there special considerations for collaborative applications?

 

Jim: Surely.  The interoperability of distinct framework implementations

ties in with this...  but the components shouldn't be aware that they

are being run collaboratively/remotely...  definitely a framework issue.

 

Ilmi: Some locking mechanism for a subset of data, or dispatching of changes from

one client to multiple clients.

 

JohnS: Ugh.  I'm also hoping that collaborative applications only impose requirements for wrapping baseline components rather than imposing internal requirements on the interfaces that exchange data between the components.  So I hope we can have "accessors" or "multiplexor/demultiplexor" objects that connect to essentially non-collaboration-aware components in order to support such things.  Otherwise, I'm a bit daunted by the requirements imposed.

 

Note: The danger of pushing the collaborative functionality out to a "framework issue" is that we increasingly make the "framework" a heavyweight object. It creates a high cost of entry for any such feature or even minor modifications to such features. We learned from "Cactus" the importance of making the framework as slender as possible and moving as much functionality as possible into "optional components" that support feature X.  So it is important to ensure that we push these issues out of the framework as much as possible.

 

        * What else?

 

Randy: The kitchen sink? :)

 

Pat: Distributed control?  Fault tolerance?

 

Jim: Yeah Right.

 

Wes: Control data, performance data and framework response to and manipulation

of such data.

 

How will the execution model affect our implementation of data structures?

 

Jim: It shouldn't.  The execution model should be kept independent of the

data structures as much as possible.

 

If you want to build higher-level APIs for specific data access patterns

that's fine, but keep the underlying data consistent where possible.

 

Note: The description of this as affecting our "data structures" is an artifact of attempting to straddle the dual goals of addressing internal data structures and external accessors.  So perhaps this should be "how will it affect our accessors."

 

        * how do you decompose a data structure such that it is amenable to streaming in small chunks?

 

Randy: This is a major issue and relates to things like out-of-core/etc.

I definitely feel that "chunking" like mechanisms need to be in

the core interfaces.

 

Pat: Are we assuming streaming is a requirement?

 

How do you handle visualization algorithms where the access patterns

are not known a priori?  The predominant example: streamlines and streaklines.

Note the access patterns can be in both space and time.  How do you avoid

having each analysis technique need to know about each possible data

structure in order to negotiate a streaming protocol?  How do you add another

data structure in the future without having to go through all the analysis

techniques and put another case in their streaming negotiation code?

 

In FM the fine-grained data access ("accessors") is via a standard

interface.  The evaluation is all lazy.  This design means more

function calls, but it frees the analysis techniques from having to know

access patterns a priori and negotiate with the data objects.  In FM

the data access methods are virtual functions.  We find the overhead

not to be a problem, even with relatively large data.  In fact, the overhead

is less an issue with large data because the data are less likely to be

served up from a big array buffer in memory (think out-of-core, remote

out-of-core, time series, analytic meshes, derived fields, differential-

operator fields, transformed objects, etc., etc.).

 

The same access-through-an-interface approach could be done without

virtual functions, in order to squeeze out a little more performance, though

I'm not convinced it would be worth it.  To start with you'd probably end up

doing a lot more C++ templating.  Eliminating the virtual functions would

make it harder to compose things at run-time, though you might be able

to employ run-time compilation techniques a la SCIRun 2.
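
 

Note: to make the accessor discussion concrete, here is a minimal C++ sketch in the spirit of what Pat describes; the names (Field, value) and the signature are assumptions for illustration, not FM's actual API.  The time argument also anticipates the temporal-dependency question below: the same call serves static and time-varying data.

        // Hypothetical lazily-evaluated field interface: analysis code asks
        // for values through a virtual call and never sees the underlying
        // storage (in-core array, out-of-core file, derived field, analytic
        // mesh, ...).  Nothing is computed until value() is called.
        struct Point { double x, y, z; };

        class Field {
        public:
          virtual ~Field() {}
          // Same interface for static and time-varying data; a static field
          // simply ignores the time argument.
          virtual double value(const Point& p, double t) const = 0;
        };

        // A derived field composes other fields without negotiating access
        // patterns or a streaming protocol with them.
        class SumField : public Field {
        public:
          SumField(const Field* a, const Field* b) : a_(a), b_(b) {}
          double value(const Point& p, double t) const override {
            return a_->value(p, t) + b_->value(p, t);
          }
        private:
          const Field* a_;
          const Field* b_;
        };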

 

Jim: This sounds a lot like distributed data decompositions.  I suspect that

given a desired block/cycle size, you can organize/decompose data in all

sorts of useful ways, depending on the expected access pattern.

 

In conjunction with this, you could also reorganize static datasets

into filesystem databases, with appropriate naming conventions or

perhaps a special protocol for lining up the data blob files in the

desired order for streaming (in either time or space along any axis).

Meta-data in the files might be handy here, too, if it's indexed

efficiently for fast lookup/searching/selection.

 

JohnS: The recent SDM workshop pointed out that chunking/streaming interfaces are going to be essential for any data analysis system that deals with large data, but there was very little agreement on how the chunking should be expressed.  The chunking also potentially involves end-to-end requirements of the components that are assembled in a pipeline as you must somehow support uniformity in the passage of chunks through the system (ie. the decision you make about the size of one chunk will impose requirements for all other dependent streaming interfaces in the system).  We will need to walk through at least one use-case for chunking/streaming to get an idea of what the constraints are here.  It may be too tough an issue to tackle in this first meeting though.

 

Also, as Pat pointed out, when dealing with vis techniques like streamlines, you almost need to have demand-based fetching of data.  This implies some automatic propagation of requests through the pipeline.  This will be hard, and perhaps not supported by a baseline procedural model for execution.

 

Note: Again, it appears we need to have clear delineation between temporal and spatial dependencies.  To support streaming, one must also have dependent components be able to report back their constraints. 

 

Jim, how can we formulate a requirement that the execution model is independent of the data structures when we really don't have data structures per se?  Because we are using accessors, calling them will in turn cause a component to call other accessors.  If we do not have common execution semantics, then this will be a complete muddle even if we do agree on our port standards.  So can we really keep these things independent?

 

        * how do you represent temporal dependencies in that model?

 

Randy: I need to give this more thought, there are a lot of options.

 

Pat: In FM, data access arguments include a time value, so the field interface is

the same for both static and time-varying data.

 

Jim: Meta-data, or file naming conventions...

 

JohnS: Each item in a data structure, or as passed through via an accessor, needs to have some method of referring to its dependencies, both spatial (ie. interior boundaries caused by domain decomposition) and temporal.  It's important to make these dependencies explicit in the data structures in order to provide a framework the information necessary to organize parallelism in both the pipeline and data-parallel directions.  The implementation details of how to do so are not well formulated and perhaps out of scope for our discussions.  So this is a desired *requirement* that doesn't have a concrete implementation or design pattern involved.
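
 

Note: one hypothetical way to make such dependencies explicit is to attach a small descriptor to every chunk handed through an accessor; all of the names below are invented for this sketch.

        // Illustrative chunk descriptor: just enough metadata for a framework
        // to organize pipeline- and data-parallel work without looking inside
        // the (opaque) payload.
        #include <vector>

        struct Extent { double lo[3], hi[3]; };      // spatial bounding box

        struct ChunkDescriptor {
          int              chunkId;                  // unique within the dataset
          int              timeStep;                 // temporal position
          Extent           bounds;                   // spatial position
          std::vector<int> spatialDeps;              // chunks sharing interior boundaries
          std::vector<int> temporalDeps;             // e.g. earlier time steps required
        };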

 

Note: Given the importance of time-varying data to JohnC and Pat, it seems important to come up with a formal way to represent these things.

 

 

        * how do you minimize recomputation in order to regenerate data for view-dependent algorithms?

 

Randy: Framework invisible caching.  Not a major Framework issue.

 

Pat: Caching?  I don't have a lot of experience with view-dependent algorithms.

 

Jim: No clue.

 

JohnS: I don't know.  I'm hoping someone else responding to this survey has some ideas on this.  I'm uncertain how it will affect our data model requirements.

 

Note: Is caching a framework issue?  Or is it a component issue?

 

What are the execution semantics necessary to implement these execution models?

        * how does a component know when to compute new data? (what is the firing rule)

 

Randy: Explicit function calls with potential async operation.  A higher-level

wrapper can make this look like "dataflow".

 

Jim: There are really only 2 possibilities I can see - either a component is

directly invoked by another component or the framework, or else a method

must be triggered by some sort of dataflow dependency or stream-based

event mechanism.

 

JohnS: For declarative semantics, the firing rule is an explicit method call that is invoked externally.  Hopefully such objects can be *wrapped* to encode semantics that are more automatic (ie. the module itself decides when to fire depending on input conditions), but initially it should be explicit.
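
 

Note: the "wrapping" mentioned above might look roughly like the following sketch, where the baseline component exposes only an explicit execute() call and an optional wrapper layers dataflow-style firing on top; the names are hypothetical.

        // Baseline semantics: firing is an explicit, externally invoked call.
        class Component {
        public:
          virtual ~Component() {}
          virtual void execute() = 0;
        };

        // Optional wrapper adding automatic, dataflow-like semantics: it fires
        // the wrapped component whenever an input reports a change.  The
        // baseline component itself remains unaware of this policy.
        class AutoFiringWrapper {
        public:
          explicit AutoFiringWrapper(Component* c) : wrapped_(c) {}
          // Called by whatever delivers data to the wrapped component's inputs;
          // this is where an "automatic" firing rule lives, outside the component.
          void inputChanged() { wrapped_->execute(); }
        private:
          Component* wrapped_;
        };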

 

Wes: To review, the old AVS model said that a module would be executed if any

of its parameters changed, or if its input data changed. One thing that

was annoying was that you had to explicitly disable the flow executive if

you wanted to make changes to multiple parameters on a single module before

allowing it to execute. This type of thing came up when using a module with

a long execution time.

 

        * does coordination of the component execution require a central executive, or can it be implemented using only rules that are local to a particular component?

 

Randy: I think the central executive can be an optional component (again, see

VisIt).

 

Jim: This is a framework implementation detail.  No.  No.  Bad Dog.

The component doesn't know what's outside of it (in the rest of the

framework, or the outside world).  It only gets invoked, one way or

another.

 

JohnS: It can eventually be implemented using local semantics, but initially, we should design for explicit external control.

 

Wes: Not sure what this means.

 

Note: And it potentially invokes other components.  If a component invokes other components and thereby creates a chain of execution, then we have an execution semantics that is outside of the framework's control.  So, do we want to prevent this in our baseline requirements for component invocation? The central executive approach says that our "baseline" components may not invoke another component in response to their invocation.  This seems to be a component invocation semantics issue.

 

        * how elegantly can execution models be supported by the proposed execution semantics?  Are there some things, like loops or back-propagation of information that are difficult to implement using a particular execution semantics?

 

Randy: There will always be warts...

 

Pat: The execution models we have used have kept the control model in

each analysis technique pretty simple, relying on an external executive.

The one big exception is with multi-threading.  We've experimented with

more elaborate parallelism and load-balancing techniques, motivated in

part by latency hiding desires.

 

Jim: We need to keep the different execution models separate, as implementation

details of individual frameworks.  This separates the concerns here.

 

JohnS: It's all futureware at this point.  We want to first come up with clear rules for baseline component execution and then come up with some higher-level / automatic execution semantics that can be implemented by *wrapping* such components.  The "wrapper" would then take responsibility for imposing higher-level automatic semantics.

 

Wes: The dataflow thing doesn't lend itself well to things like view dependent

processing where the module at the end of the chain (renderer) sends view

parameters back upstream, thereby causing the network to execute again, etc.

The whole upstream data thing is a "wart on the ass of" AVS. (sorry)

 

How will security considerations affect the execution model?

 

Randy: Security issues tend to impact two areas: 1) effective bandwidth/latency

and 2) dynamic connection problems.  1) can be unavoidable, but will not

show up in most environments if we design properly.  2) is a real problem

with few silver bullets.

 

Pat: More libraries to link to?  More latency in network communication?

 

Jim: Ha ha ha ha...

They won't right away, except in collaboration scenarios.

Think "One MPI Per Framework" and do things the old fashioned way

locally, then do the "glue" for inter-framework connectivity with

proper authentication only as needed.  (No worse than Globus... :-)

 

JohnS: I don't know.  Please somebody tell me if this is going to be an issue.  I don't have a handle on the *requirements* for security.  But I do know that simply using a secure method to *launch* a component is considered insufficient by security people who would also require that connections between components be explicitly authenticated as well.  Most vis systems assume secure launching (via SSH or GRAM) is sufficient.  The question is perhaps whether security and authorization are a framework issue or a component issue.  I am hoping that it is the former (the role of the framework that is used to compose the components).

 

Note: Current DOE security policy basically dictates that we cannot deploy current distributed vis tool implementations because the connections are not authenticated.  Ensight is an exception because the server is always making an outgoing connection (which basically makes it an issue for the destination site) and requires an explicit "accept" of the connection.

 

3) Parallelism and load-balancing=================

Thus far, managing parallelism in visualization systems has been tedious and difficult at best.  Part of this stems from a lack of powerful abstractions for managing data-parallelism, load-balancing, and component control.

 

JohnS: If we are going to address inter-component data transfers to the exclusion of data structures/models internal to the component, then much of this section is moot.  The only question is how to properly represent data-parallel-to-data-parallel transfers and also the semantics for expressing temporal/pipeline parallelism and streaming.  Load-balancing becomes an issue that is out of scope because it is effectively something that happens inside of components (and we don't want to look inside of the components).

 

Please describe the kinds of parallel execution models that must be supported by a visualization component architecture.

        * data-parallel/dataflow pipelines?

 

Jim: Must

JohnS: Must

Wes: It would be nice if the whole scatter/gather thing could be marshaled

by the framework. That way, my SuperSlick[tm] renderer wouldn't contain

a bunch of icky network code that manages multiple socket connections

from an N-way parallel vis component. One interesting problem is how a

persistent tool, like a renderer, will be notified of changes in data

originating from external components. I want some infrastructure that

will make obsolete me having to write custom code like this for each

new project.

 

Note: Seriously.  Is it really a very useful paradigm to have the framework represent parallel components as one-component-per-processor?  It seems very "icky," as Wes says.

 

        * master/slave work-queues?

 

Randy: I tend to use small dataflow pipelines locally and higher-level

async streaming work-queue models globally.

 

Jim: Must

 

JohnS: Maybe, if we want to support progressive update or heterogeneous execution environments.  However, I usually don't consider this methodology scalable.

 

        * streaming update for management of pipeline parallelism?

 

Randy: Yes, we use this, but it often requires a global parallel filesystem to

be most effective.

 

Jim: Must

 

JohnS: Must

 

        * chunking mechanisms where the number of chunks may be different from the number of CPU's employed to process those chunks?

 

Randy: We use space-filling curves to reduce the overall expense of this

(common) operation (consider the compute/viz impedance mismatch

problem as well).  As a side effect, the codes gain cache coherency

as well.

 

Pat: We're pretty open here.  Mostly straight-forward work-queues.

 

Jim: This sounds the same as master/slave to me, as in "bag of tasks"...

 

JohnS: Absolutely.  Of course, this would possibly be implemented as a master/slave work-queue, but there are other methods.
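
 

Note: a minimal sketch of that self-scheduling idea, assuming a shared-memory setting; the number of chunks is decoupled from the number of workers, and every name here is invented for illustration.

        // Hypothetical self-scheduling work queue: numWorkers threads drain a
        // pool of numChunks chunks, where the two counts need not be equal.
        #include <atomic>
        #include <cstdio>
        #include <thread>
        #include <vector>

        // Stand-in for real work on one chunk of a decomposed dataset.
        void processChunk(int chunkId) { std::printf("processed chunk %d\n", chunkId); }

        void runChunks(int numChunks, int numWorkers) {
          std::atomic<int> next(0);
          std::vector<std::thread> workers;
          for (int w = 0; w < numWorkers; ++w)
            workers.emplace_back([&]() {
              // Each worker claims the next unprocessed chunk index atomically.
              for (int c = next++; c < numChunks; c = next++)
                processChunk(c);
            });
          for (std::thread& t : workers) t.join();
        }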

 

        * how should one manage parallelism for interactive scripting languages that have a single thread of control?  (eg. I'm using a commandline language like IDL that interactively drives an arbitrarily large set of parallel resources.  How can I make the parallel back-end available to a single-threaded interactive thread of control?)

 

Randy: Consider them as "scripting languages", and have most operations

run through an executive (note the executive would not be aware

of all component operations/interactions, it is a higher-level

executive).  Leave RPC style hooks for specific references.

 

Pat: I've used Python to control multiple execution threads.  The (C++)

data objects are thread safe, and the minimal provisions for thread-safe

objects in Python haven't been too much of a problem.

 

Jim: Broadcast, Baby...  Either you blast the commands out to everyone SIMD

style (unlikely) or else you talk to the Rank 0 task and the command

gets forwarded on a fast internal network.

 

JohnS: I think this is very important and a growing field of inquiry for data analysis environments.  Whatever agreements we come up with, I want to make sure that things like parallel R are not left out of these considerations.

 

Note: But CCA doesn't support broadcast.  This leads to a quandary here because we want to be able to adjust parameters for a component via the GUI or via a command from another component interchangeably.  So I agree with "broadcast baby", but I don't see that it is feasible to push this off as a "framework issue" as it may well need to be something the component interface description must support.
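
 

Note: a minimal sketch of the "talk to rank 0 and forward" pattern using plain MPI (this illustrates Jim's suggestion, not a CCA mechanism); the command encoding is a made-up convention.

        // Hypothetical command loop: a single interactive front end feeds rank 0,
        // and rank 0 broadcasts each scripted command to the parallel back end.
        #include <iostream>
        #include <mpi.h>
        #include <string>

        void commandLoop(MPI_Comm comm) {
          int rank;
          MPI_Comm_rank(comm, &rank);
          while (true) {
            std::string cmd;
            if (rank == 0) std::getline(std::cin, cmd);      // interactive front end
            int len = static_cast<int>(cmd.size());
            MPI_Bcast(&len, 1, MPI_INT, 0, comm);            // length first...
            cmd.resize(len);
            MPI_Bcast(&cmd[0], len, MPI_CHAR, 0, comm);      // ...then the text
            if (cmd == "quit") break;
            // every rank now applies cmd to its local portion of the data
          }
        }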

 

Please describe your vision of what kinds of software support / programming design patterns are needed to better support parallelism and load balancing.

        * What programming model should be employed to express parallelism.

(UPC, MPI, SMP/OpenMP, custom sockets?)

 

Randy: The programming model must transcend specific parallel APIs.

 

Jim: All but UPC will be necessary for various functionality.

 

JohnS: If we are working just on the outside of components, this question should be moot.  We must make sure the API is not affected by these choices though.

 

Wes: This discussion may follow the same path as the one about DS/DM for grids.

The answer seems to be "one size doesn't fit all, but there is no 'superset'

that makes everyone happy." That said, there is likely a set of common issues

wrt execution and DS/DM that underlie parallel components regardless of

implementation.

 

Note: Since we are talking about functionality outside of the component, this seems reasonable.  So this really requires clarification of where the parallelism is expressed.  Caffeine wants to express this as parallel sets of components.  However, this seems unreasonable for some of the communication patterns we deal with.  If we stated that such parallelism is inside of the component wrapper, then what?  (at minimum, we don't have to answer this question!)

 

        * Can you give some examples of frameworks or design patterns that you consider very promising for support of parallelism and load balancing.

(ie. PNNL Global Arrays or Sandia's Zoltan)

                http://www.cs.sandia.gov/Zoltan/

                http://www.emsl.pnl.gov/docs/global/ga.html

 

Randy: No, I cannot (I am not up to speed).

 

Jim: Nope, that covers my list of hopefuls.

 

JohnS: Also out of scope.  This would be something employed within a component, but if we are restricting discussions to what happens on the interface between components, then this is also a moot point.  At minimum, it will be important to ensure that such options will not be precluded by our component interfaces.

 

Wes: Maybe we should include "remote resource management" in this thread. I'm

thinking of the remote AVS module libraries. So, not only is there the issue

of launching parallel components, and load balancing (not sure how this will

play out), but also one of allowing a user to select, at run time, from

among a set of resources.

 

This problem becomes even more interesting when the pipeline optimization

starts to happen, and components are migrated across resources.

 

Note: This is now somewhat out-of-scope for discussions of inter-component communication.

 

        * Should we use novel software abstractions for expressing parallelism or should the implementation of parallelism simply be an opaque property of the component? (ie. should there be an abstract messaging layer or not)

 

Randy: I would vote no as it will allow known paradigms to work, but will

interfere with research and new direction integration.  I think some

kind of basic message abstraction (outside of the parallel data system)

is needed.

 

Jim: It's not our job to develop "novel" parallelism abstractions.  We should

just use existing abstractions like what the CCA is developing.

 

JohnS: Implementation of parallelism should be an opaque property of the component.  We want to have language independence.  We should also strive to support independence in the implementation of parallelism.  Creating a software abstraction layer for messaging and shmem is a horrible way to do it.

 

        * How does the MxN work fit into all of this?  Is it sufficiently differentiated from Zoltan's capabilities?

 

Randy: Unable to comment...

 

Pat: I don't have a strong opinion here.  I'm not familiar with Zoltan et al.

Our experience with parallelism tends to be more shared-memory than

distributed memory.

 

JohnC: Hmm. These all seem to be implementation issues. Too early to answer.

 

JohnS: I need a more concrete understanding of MxN.  I understand what it is supposed to do, but I'm not entirely sure what requirements it would impose on any given component interface implementation.  It seems like something our component data interfaces should support, but perhaps such redistribution could be hidden inside of an MxN component?  So should this kind of redistribution be supported by the inter-component interface or should there be components that explicitly effect such data redistributions?  Jim... Help!

 

Jim: I don't know what Zoltan can do specifically, but MxN is designed for

basic "parallel data redistribution".  This means it is good for doing

big parallel-to-parallel data movement/transformations among two disparate

parallel frameworks, or between two parallel components in the same

framework with different data decompositions.  MxN is also good for

"self-transpose" or other types of local data reorganization within a

given (parallel) component.

 

MxN doesn't do interpolation in space or time (yet, probably for a while),

and it won't wash your car (but it won't drink your beer either... :-).

If you need something fancier, or if you don't really need any data

reorganization between the source and destination of a transfer, then

MxN *isn't* for you...
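
 

Note: to give a flavor of what "parallel data redistribution" has to compute, here is a small, purely illustrative sketch for 1-D block decompositions; it is not the MxN component's API, just the index arithmetic that underlies any M-to-N block remapping.

        // For 'length' elements split into 'parts' nearly equal blocks, return
        // the half-open index range [begin, end) owned by 'rank'.
        #include <algorithm>
        #include <cstdio>

        void blockRange(int length, int parts, int rank, int* begin, int* end) {
          int base = length / parts, extra = length % parts;
          *begin = rank * base + std::min(rank, extra);
          *end   = *begin + base + (rank < extra ? 1 : 0);
        }

        // List which element ranges source rank m (of M) must ship to each
        // destination rank n (of N): the overlap of the two block ownerships.
        void printTransfers(int length, int M, int N, int m) {
          int sb, se;
          blockRange(length, M, m, &sb, &se);
          for (int n = 0; n < N; ++n) {
            int db, de;
            blockRange(length, N, n, &db, &de);
            int lo = std::max(sb, db), hi = std::min(se, de);
            if (lo < hi)
              std::printf("rank %d sends elements [%d,%d) to rank %d\n", m, lo, hi, n);
          }
        }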

 

 

 

===============End of Mandatory Section (the rest is voluntary)=============

 

4) Graphics and Rendering=================

What do you use for converting geometry and data into images (the rendering-engine).  Please comment on any/all of the following.

        * Should we build modules around declarative/streaming methods for rendering geometry like OpenGL, Chromium and DirectX or should we move to higher-level representations for graphics offered by scene graphs?

 

Randy: IMHO, the key is defining the boundary and interoperability constraints.

If these can be documented, then the question becomes moot; you can

use whatever works best for the job.

 

Ilmi: It is usually useful to have access to frame buffer so, I prefer OpenGL

style over VRML style.

In addition, I don't know how useful scene graphs are for visualization. I

guess scene graphs for visualization are relatively simple, so it is

possible to convert the scene graphs to a declarative form. So, mainly support

declarative methods, and then add support for scene graphs and

conversions to declarative methods.

 

JohnS: This all depends on the scope of the framework.  A priori, you can consider the rendering method separable and render this question moot.  However, this will make it quite difficult to provide very sophisticated support for progressive update, image-based methods, and view-dependent algorithms, because the rendering engine becomes intimately involved in such methods.  I'm concerned that this is where the component model might break down a bit.  Certainly the rendering component of traditional component-like systems like AVS or NAG Explorer is among the most heavyweight and complex components of the entire environment. Often, the implementation of the rendering component would impose certain requirements on components that had to interact with it closely (particularly in the case of NAG/Iris Explorer, where you were really directly exposed to the fact that the renderer was built atop OpenInventor).

 

So, we probably cannot take on the issue of renderers quite yet, but we are eventually going to need to define a big "component box" around OpenGL/Chromium/DirectX.  That box is going to have to be carefully built so as to keep from precluding any important functionality that each of those rendering engines can offer.  Again, I wonder if we would need to consider scene graphs if only to offer a persistent data structure to hand off to such an opaque rendering engine.  This isn't necessarily a good thing.

 

Wes: As a scene graph proponent, I would say that you don't build component

architectures around scene graphs. That concept doesn't make any sense to me.

Instead, what you do is have DS/DM representations/encapsulations for the

results of visualization. These are things like buckets-o-triangles, perhaps

at multiple resolutions. You also provide the means to send renderer information

to vis components to do view-dependent processing, or some other form of

selective processing.

 

Similarly, you don't make the output of visualization components in the form

of glBegin()/glEnd() pairs, either.

 

Note: It sounds like we need to look at Ilmi's work and ensure that whatever method we select to get the "drawables" to the "renderer" does not preclude her requirements.

 

What are the pitfalls of building our component architecture around scene graphs?

 

Randy: Data cloning, data locking and good support for streaming, view dependent,

progressive systems.

 

JohnC: Not so good for time varying data last time I checked.

 

Ilmi: might lose access to the frame buffer and pixel-level manipulation --

extremely difficult for view-dependent or image-based approaches

 

JohnS: It will add greatly to the complexity of this system.  It also may get in the way of novel rendering methods like Image-based methods.

 

Wes: Back to the scene graph issue - what you allow for is composition of streams

of data into a renderer. Since view position information is supported as a

first class DS/DM citizen (right?) it becomes possible to compose a

rendering session that is driven by an external source.

 

Nearly all renderers use scene graph concepts - resistance is futile! The

weak spot in this discussion concerns streaming. Since scene graph systems

presume some notion of static data, the streaming notion poses some problems.

They can be surmounted by adding some smarts to the rendering and the

data streaming - send over some bounding box info to start with, then allow

the streaming to happen at will. The renderer could either then not render

that tree branch until transmission is complete, or it could go ahead and

render whatever is in there at the time. Middle ground could be achieved

with progressive transmission, so long as there are "markers" that signal

the completion of a finished chunk of data to be rendered.

 

Some people's "complaints" about scene graphs stem from bad designs

and bad implementations. A "scene graph system" is supposed to be

an infrastructure for storing scene data and rendering. That ought to

include support for image-based methods, even though at first blush

it seems nonsensical to talk about buckets-o-triangles in the same

breath as normal maps. All interactive rendering systems are fundamentally

created equally in terms of intent & design. The implementation varies.

Among the top items in the "common" list is the need to store data, the

need to specify a viewpoint, and the need to propagate transformation

information. Beyond that, it's merely an implementation issue.

 

I caution against spending too much time worrying about how scene graphs

fit into DiVA because the issue is largely a red herring.
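
 

Note: Wes's bounding-box-first streaming idea could be sketched roughly as follows (invented names, not any particular scene graph API); the node draws a placeholder until geometry chunks arrive, and a marker flags completion.

        // Hypothetical streaming scene-graph node: the bounding box arrives
        // first so the renderer can cull or draw a placeholder, geometry
        // chunks are appended as they stream in, and markComplete() signals
        // that the finished chunk of data can be treated as final.
        #include <vector>

        struct Triangle { float v[9]; };
        struct Box      { float lo[3], hi[3]; };

        class StreamingGeometryNode {
        public:
          void setBounds(const Box& b) { bounds_ = b; haveBounds_ = true; }
          void appendChunk(const std::vector<Triangle>& tris) {
            triangles_.insert(triangles_.end(), tris.begin(), tris.end());
          }
          void markComplete() { complete_ = true; }
          bool isComplete() const { return complete_; }  // renderer may stop re-drawing

          void render() const {
            if (triangles_.empty() && haveBounds_)
              drawBox(bounds_);              // placeholder until data arrives
            else
              drawTriangles(triangles_);     // partial or final geometry
          }
        private:
          // Renderer hooks, stubbed out in this sketch.
          static void drawBox(const Box&) {}
          static void drawTriangles(const std::vector<Triangle>&) {}

          Box  bounds_ = {};
          bool haveBounds_ = false;
          bool complete_ = false;
          std::vector<Triangle> triangles_;
        };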

 

        * What about Postscript, PDF and other scale-free output methods for publication quality graphics?  Are pixmaps sufficient?

 

Randy: Gotta make nice graphs.  Pixmaps will not suffice.

 

JohnC: Well what are we trying to provide, an environment for analysis or

producing images for publications? The latter can be done as a post

process and should not, IMHO, be a focus of DIVA.

 

JohnS: Pixmaps are insufficient.  Our data analysis infrastructure has been moving rapidly away from scale-free methods and towards pixel-based methods.  I don't know how to stop this slide or if we are poised to address this issue as we look at this component model.

 

Wes: Gotta have vector graphics.

 

In a distributed environment, we need to create a rendering subsystem that can flexibly switch between drawing to a client application by sending images, sending geometry, or sending geometry fragments (image-based rendering).  How do we do that?

 

Randy: See the Chromium approach.  This is actually more easily done than

one might think.  Define an image "fragment" and augment the rendering

pipeline to handle it (ref: PICA and Chromium).

 

JohnC: Use Cr

 

Jim: I would think this could be achieved by a sophisticated data communication

protocol - one that encodes the type of data in the stream, say, using XML

or some such thingy.

 

Wes: Again, one size doesn't fit all. These seem to be logically different components.

 

Note: So we are going to define the OpenGL/Cr API as a "port" interface in CCA?  All of GL will go through RMI?

 

* Please describe some rendering models that you would like to see supported (ie. view-dependent update, progressive update) and how they would adjust dynamically to changing objective functions (optimize for fastest framerate, or fastest update on geometry change, or varying workloads and resource constraints).

 

Randy: See the TeraScale browser system.

 

JohnC: Not sold on view dependent update as worthwhile, but progressive updates

can be hugely helpful. Question is do you accomplish this by adding

support in the renderer or back it up the pipeline to the raw data?

 

JohnS: I see this as the role for the framework.  It also points to the need to have performance models and performance monitoring built in to every component so that the framework has sufficient information to make effective pipeline deployment decisions in response to performance constraints.  It also points to the fact that at some level in this component architecture, component placement decisions must be entirely abstract (but such a capability is futureware).

 

So in the short term it's important to design components with effective interfaces for collecting performance data and representing either analytic or history-based models of that data.  This is a necessary baseline to get to the point where a framework could use such data to make intelligent deployment/configuration decisions for a distributed visualization system.
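
 

Note: at a minimum, the performance-collection interface mentioned above might look something like this sketch (hypothetical names):

        // Minimal, hypothetical performance port: each component reports how
        // long its last execution took and how much data it moved, so a
        // framework could later make deployment/streaming decisions.
        struct PerformanceSample {
          double seconds;       // wall-clock time of the last execute()
          double bytesIn;       // data consumed
          double bytesOut;      // data produced
        };

        class PerformanceReporting {
        public:
          virtual ~PerformanceReporting() {}
          virtual PerformanceSample lastSample() const = 0;
        };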

 

Wes: The scene graph treatise (above) covers most of what I have to say for now.

 

Note: Randy gets to present the features of the TeraScale browser system

 

        * Are there any good examples of such a system?

 

Randy: None that are ideal :), but they are not difficult to build.

 

JohnC: Yes, Kitware's not-for-free volume renderer (volren?). It does a nice job

with handling progressive updates. This is mostly handled by the GUI but

places some obvious requirements on the underlying rendering/viz

component.

 

JohnS: No.  That's why we are here.

 

Wes: I know of a couple of good scene graphs that can form the basis for renderers.

 

What is the role of non-polygonal methods for rendering (ie. shaders)?

        * Are you using any of the latest gaming features of commodity cards in your visualization systems today?

 

JohnC: Yup, we've off loaded a couple of algorithms from the CPU.

We just have some very simple, one-off applications that off-load

computation from the cpu to gpu. For example, we have a 2D Image Based

Flow Visualization algorithm that exploits vertex programmability to do

white noise advection. Developing this type of application within

any Diva framework I've envisioned would really push the limits of

anything we've discussed.

 

JohnS: I'd like to know if anyone is using shader hardware.  I don't know much about it myself, but it points out that we need to plan for non-polygon-based visualization methods.  It's not clear to me how to approach this yet.

 

        * Do you see this changing in the future? (how?)

 

Randy: This is a big problem area.  Shaders are difficult to combine/pipeline.

We are using this stuff now and I do not see it getting much easier

(hlsl does not fix it).  At some point, I believe that non-polygon

methods will become more common than polygon methods (about 3-4 years?).

Polygons are a major bottleneck on current gfx cards as they limit

parallelism.  I'm not sure what the fix will be but it will still be

called OpenGL :).

 

JohnC: The biggest issue is portability, but things are looking up with OpenGL

2.0 efforts, etc.

 

Wes: We've invited Ilmi Yoon to the next workshop. She represents the IBR

community. I am very keen to see us take advantage of IBR techniques as well

as our traditional polygon engines, perhaps combining them in interesting

ways to realize powerful new systems.

 

 

Note: Do scene graphs somewhat address portability issues via further abstraction of the rendering procedure?

 

5) Presentation=========================

It will be necessary to separate the visualization back-end from the presentation interface.  For instance, you may want to have the same back-end driven by entirely different control-panels/GUIs and displayed on different display devices (a CAVE vs. a desktop machine).  Such separation is also useful when you want to provide different implementations of the user interface depending on the targeted user community.  For instance, visualization experts might desire a dataflow-like interface for composing visualization workflows, whereas a scientist might desire a domain-specific, dashboard-like interface that implements a specific workflow.  Both users should be able to share the same back-end components and implementation even though the user interface differs considerably.

 

How do different presentation devices affect the component model?

 

Jim: Not.  The display device only affects resolution or bandwidth required.

This could be parameterized in the component invocations APIs, but

should not otherwise change an individual component.

 

If you want a "multiplexer" to share a massive data stream with a powerwall

and a PDA, then the "multiplexer component" implementation handles that...

 

        * Do different display devices require completely different user interface paradigms?  If so, then we must define a clear separation between the GUI description and the components performing the back-end computations.  If not, then is there a common language to describe user interfaces that can be used across platforms?

 

Randy: I think they do (e.g. immersion).

 

Jim: No.  Different GUIs should all map to some common framework command/control

interface.  The same functions will ultimately get executed, just from buttons

with different labels or appl-specific short-cuts...  The UIs should all be

independent, but talk the same protocol to the framework.

 

Yuk (with regard to creating separation between GUI and component description)

 

JohnS: Systems that attempt to use the same GUI paradigm across different presentation media have always been terrible in my opinion.  I strongly believe that each presentation medium requires a GUI design that is specific to that particular medium.  This imposes a strong requirement that our compute pipeline for a given component architecture be strictly separated from the GUI that controls the parameters and presents the visual output of that pipeline.  OGSA/WSDL has been proposed as one way to define that interface, but it is extremely complex to use.  One could use CCA to represent the GUI handles, but that might be equally complex.  Others have simply customized ways to use XML descriptions of their external GUI interface handles for their components.  The latter seems much simpler to deal with, but is it general enough?

 

        * Do different display modalities require completely different component/algorithm implementations for the back-end compute engine?

(what do we do about that??)

 

Randy: They can (e.g. holography), but I do not see a big problem there.

Push the representation through an abstraction (not a layer).

 

Jim: Algorithm maybe, component no.  This could fall into the venue of the

different execution-model-specific frameworks and/or their bridging...

I dunno.

 

JohnS: I think there is a lot of opportunity to share the back-end compute engines across different display modalities.  There are some cases where a developer would be inclined to implement things like an isosurfacer differently for a CAVE environment just to keep the framerates high enough to maintain your sense of immersion.  However, I think of those as edge cases.

 

What presentation modalities do you feel are important, and which do you consider the most important?

        * Desktop graphics (native applications on Windows, on Macs)

 

Randy: #1 (by a fair margin)

JohnC: This is numero uno by a HUGE margin

Jim: MUST

JohnS: #1

Wes: Yes, most important, will never go away.

 

        * Graphics access via Virtual Machines like Java?

 

Randy: #5

JohnC: Not important

Jim: Ha ha ha ha...

JohnS: #5

Wes: If it works on desktops, it will work in these environments.

 

        * CAVEs, Immersadesks, and other VR devices

 

Randy: #4

JohnC: Not important

Jim: Must

JohnS: #4

Wes: Second to workstations. With evolution of Chromium, DMX and the nascent

PICA stuff, I would expect that desktop tools would port transparently

to these devices.

 

        * Ultra-high-res/Tiled display devices?

 

Randy: #3 - note that tiling applies to desktop systems as well, not

necessarily high-pixel count displays.

JohnC: Moderately important

Jim: Must

JohnS: #3 : the next tiled display may well be your next *desktop* display, but not quite yet.

 

        * Web-based applications?

 

Randy: #2

JohnC: Well, maybe.

Jim: Probably a good idea.  Someone always asks for this...  :-Q

JohnS: #2

 

What abstractions do you think should be employed to separate the presentation interface from the back-end compute engine?

 

Jim: Some sort of general protocol descriptor, like XML...?  Nuthin fancy.

 

        * Should we be using CCA to define the communication between GUI and compute engine or should we be using software infrastructure that was designed specifically for that space? (ie. WSDL, OGSA, or CORBA?)

 

Randy: No strong opinion.

 

Jim: The CCA doesn't do such communication per se.  Messaging between or in/out

of frameworks is always "out of band" relative to CCA port invocations.

 

If the specific framework impl wants to shove out data on some wire,

then it's hidden below the API level...

 

I would think that WSDL/SOAP would be O.K. for low-bandwidth uses.

 

JohnS: I think I addressed this earlier.  We can do this all in CCA, but is that the right thing to do?  I know this is an implementation issue, but it is a strong part of our agreement on methods to implement our components (or define component boundaries).

 

Wes: (I see this as similar to rendering in VMs like Java in many respects.)

Always sounds nice, but I have

yet to see much fruit in this area. The potential importance/relevance is great.

The browser makes a nice UI engine, but I wouldn't trust it to do "real"

rendering.

 

        * How do such control interfaces work with parallel applications?

Should the parallel application have a single process that manages the control interface and broadcasts to all nodes or should the control interface treat all application processes within a given component as peers?

 

Randy: Consider DMX, by default, single w/broadcast, but it supports

backend bypass...

 

Jim: I vote for the "single process that manages the control interface and

broadcasts to all nodes" (or the variation above, where one of the

parallel tasks forwards to the rest internally :-).  The latter is

not scalable.

 

BTW, you can't have "application processes within a... component".

What does that even mean?

 

Usually, an application "process" consists of a collection of one or

more components that have been composed with some specific connectivity...

 

JohnS: This requires more discussion, but reliable broadcast methods have many problems related to event skewing, and MPI-like point-to-point emulation of the broadcast suffers from scalability problems.  We need to collect design patterns for the control interface and either compete them against one another or find a way to support them all by design.  This is clearly an implementation issue, but will leak into our abstract component design decisions.   Clearly we want a single thread of control to efficiently deliver events to massively parallel back-end components. That is a *must* requirement.

 

Note: That paradigm (one component-chain per process) doesn't offer you much opportunity for encapsulating complex parallel communication patterns.

 

6) Basic Deployment/Development Environment Issues============

One of the goals of the distributed visualization architecture is seamless operation on the Grid -- distributed/heterogeneous collections of machines.  However, it is quite difficult to realize such a vision without some consideration of deployment/portability issues.  This question also touches on issues related to the development environment and what kinds of development methods should be supported.

 

What languages do you use for core vis algorithms and frameworks.

        * for the numerically intensive parts of vis algorithms

 

Randy: C/C++ (a tiny amount of Fortran)

JohnC: C/C++

Jim: C/C++...  Fortran/F90 for numerically intensive parts.

JohnS: C/C++/Fortran

Wes: C/C++

 

        * for the glue that connects your vis algorithms together into an application?

 

Randy: C/C++

JohnC: C/C++, Tcl, Python

Jim: C/C++

JohnS: C++/C/Java, but want to get into some Python (it is said to have better numerics than Java)

Wes: C/C++

 

        * How aggressively do you use language-specific features like C++ templates?

 

Randy: Not very, but they are used.

JohnC: Not at all. Too scary.

Jim: RUN AWAYYYY!!!  These are not consistent across o.s./arch/compiler yet.

Maybe someday...

JohnS: I avoid them due to portability and compiler maturity issues.

Wes: Beyond vanilla classes, not at all.

 

        * is Fortran important to you?  Is it important that a framework support it seamlessly?

 

Randy: Pretty important, but at least "standardly" enhanced F77 should be simple :).

JohnC: Nope

Jim: Fortran is crucial for many application scientists.  It is not directly

useful for the tools I build.

But if you want to ever integrate application code components directly

into a viz framework, then you better not preclude this... (or Babel...)

JohnS: Yes, absolutely.  It needn't be full fledged F90 support, but certainly f77 with some f90 extensions.

Wes: No. Fortran can be wrapped inside something sane.

 

Note:  It is perhaps incumbent on us to support Fortran.  We would eventually like buy-in from domain scientists to provide some analysis components that are interesting for them.  Lack of Fortran bindings for VTK was a major issue for some participants in the Vis Greenbook workshop.

 

        * Do you see other languages becoming important for visualization (ie. Python, UPC, or even BASIC?)

 

Randy: Python is big for us.

JohnC: Python, mostly because the direction of numerical python.

Jim: Nope.

JohnS: Python

 

What platforms are used for data analysis/visualization?

        * What do you and your target users depend on to display results? (ie. Windows, Linux, SGI, Sun etc..)

 

Randy: Linux, SGI, Sun, Windows, MacOS in that order

JohnC: All the above, primarily Lintel and windoze though.

Jim:  All of the above (not so much Sun anymore...)

JohnS: Linux, MacOS-X(BSD), Windows

Wes: For rendering, OpenGL engines.

 

        * What kinds of presentation devices are employed (desktops, portables, handhelds, CAVEs, Access Grids, WebPages/Collaboratories) and what is their relative importance to active users.

 

Randy: Remote desktops and laptops.  Very important

JohnC: desktops, tiled displays, AG

Jim: All but handhelds are important, mostly desktops, CAVEs/hi-res and AG,

in decreasing order.

JohnS: Desktop and laptops are most important.  Web, AG, and CAVE are of lesser importance (but still important).

Wes: Workstations are most important.

 

        * What is the relative importance of these various presentation methods from a research standpoint?

 

Randy: PowerPoint :)?

JohnC: The desktop is where the users live.

Jim: CAVEs/hi-res and AG are worthwhile research areas.  The rest can be

weaved in or incorporated more easily.

 

        * Do you see other up-and-coming visualization platforms in the future?

 

Randy: Tablets & set-top boxes.

JohnC: I don't see SMP graphics boxes going away as quickly as some might.

Jim: Yes, but I haven't figured out where exactly to stick the chip behind

my ear for the virtual holodeck equipment...  :)

JohnS: Tablet PCs and desktop-scale Tiled display devices.

 

Tell us how you deal with the issue of versioning and library dependencies for software deployment.

        * For source code distributions, do you bundle builds of all related libraries with each software release (ie. bundle HDF5 and FLTK source with each release).

 

Randy: For many libs, yes.

 

JohnC: Sometimes, depending on the stability of the libraries.

 

Jim: CVS for control of versioning.

For bundling of libraries: No, but provide web links or separate copies of dependent distributions

next to our software on the web site...

Too ugly to include everything in one big bundle, and not as efficient

as letting the user download just what they need.  (As long as everything

you need is centrally located or accessible...)

 

JohnS: Every time I fail to bundle dependent libraries, it has been a disaster.  So it seems that packaging dependent libraries with any software release is a *must*.

 

Wes: Oddly enough, I do bundling like this for some of my projects. I think people

appreciate it.

 

        * What methods are employed to support platform-independent builds (cmake, imake, autoconf)?  What are the benefits and problems with this approach?

 

Randy: gmake based makefiles.

 

JohnC: I've used all, developed my own, and like none. Maybe we can do better.

I think something based around gmake might have the best potential.

 

Jim: Mostly autoconf so far.  My student thinks automake and libtool are "cool"

but we haven't used them yet...

 

JohnS: I depend on conditional statements in gmake-based makefiles to auto-select between flags for different architectures.  This is not sufficiently sophisticated for most release engineering though.  I have dabbled with autoconf, but it is not a silver bullet (neither was imake).  I do not understand the practical benefits of 'cmake'.

 

Wes: I hate Imake, but used it extensively for a long time with LBL's AVS modules.

I think it still works. Nobody I know can figure out how autoconf works. I

personally tend to have different makefiles, particularly when doing code

that is supposed to build on Win32 as well as Unix/Linux systems.

 

        * For binaries, have you had issues with different versions of libraries (ie. GLIBC problems on Linux and different JVM implementations/versions for Java)?  Can you tell us about any sophisticated packaging methods that address some of these problems (RPM need not apply)

 

Randy: No real problems other than GLIBC problems.  We do tend to ship static

for several libs.  Motif used to be a problem on Linux (LessTif vs

OpenMotif).

 

Jim: Just say no.  Open Source is the way to go, with a small set of "common"

binaries just for yuks.  Most times the binaries won't work with the

specific run-time libs anyway...

 

JohnS: Building statically has been necessary in a lot of cases, but creates gigantic executables.  In the case of JVM's, the problems with the ever-changing Java platform have driven me away from employing Java as a development platform.

 

Wes: I tend to just do source, rather than binaries, to avoid this whole morass.

OTOH, as a consumer, I prefer RPMs so that I don't have to build it. I want

my ice toasted, please.

 

        * How do you handle multiplatform builds?

 

Randy: cron jobs on multiple platforms, directly from CVS repos.   Entire

environment can be built from CVS repo info (or cached).

 

JohnC: The brute force, not so smart way. The VTK model is worth looking at.

 

Jim: Autoconf, shared source tree, with arch-specific subdirs for object files,

libs and executables.

 

JohnS: * Conservative, lowest-common-denominator coding practices.

                 * Execute 'uname' at the top of a gnu makefile to select an appropriate set of build options for source-code building.  Inside of the code, we must use the CPP to code around platform dependencies.

 

How do you (or would you) provide abstractions that hide the locality of various components of your visualization/data analysis application?

 

Jim: I would use "proxy" components that use out-of-band communication to

forward invocations and data to the actual component implementation.

 

        * Does anyone have ample experience with CORBA, OGSA, DCOM, .NET, RPC?  Please comment on advantages/problems of these technologies.

Jim: Nope

JohnS: Nope

 

        * Do web/grid services come into play here?

 

Randy: Not usually an issue for us.

Jim: Yuck, I hope not...

JohnS: As these web-based scientific collaboratory efforts gather momentum, web-based data analysis tools have become increasingly important.  I think the motivation is largely driven by deployment issues when supporting a very heterogeneous/multi-institutional user base.  It reduces the deployment variables when your target is a specific web-server environment, but you pay a price in that the user interface is considerably less advanced.  This cost is mitigated somewhat if the data analysis performed is very domain-specific and customized for the particular collaboratory community.  So it's a poor choice for general-purpose visualization tools, but if the workflow is well established among the collaborators, then the weakness of the web-based user-interface options is not as much of a problem.

 

 

7) Collaboration ==========================

If you are interested in "collaborative applications" please define the term "collaborative".  Perhaps provide examples of collaborative application paradigms.

 

Randy: Meeting Maker? :) :)  (I'm getting tired).

 

Jim: "Collaborative" is 2 or more geographically/remote teams, sharing one

common viz environment, with shared control and full telepresence.

(Note: by this definition, "collaborative" does not yet exist... :-)

 

JohnS: Despite years of dabbling in "collaborative applications," I'm still not sure if I (or anyone) really knows what "collaborative" is in a strict sense.

 

Wes: The term "collaboration" is one of the most overused, misused and abused

terms in the English language. There is a huge disconnect between what

many users want/need, and what seems to be an overemphasis upon collaborative

technologies. For this particular project, collaboration (ought to) mean:

being able to share software components; and some level of confidence that

"DiVA-compliant" components in fact do interoperate. For the sake of

discussion, let's call this type of collaboration "interoperability."

 

For the other forms of "collaboration," care must be taken to define what

they are, whether they are useful, etc. If you're talking about multiple

persons seeing the same interactive renderer output, and each person being

able to do some interactive transformation, let's call that form of

collaboration "MI" (multiperson-interactive).

 

I recall hearing some discussion about the relationship between the AG

and DiVA. From my perspective, the AG ought to provide support to allow

any application to run in an "MI mode."  With this perspective,

there isn't really much to talk about in terms of fundamental DiVA

design wrt "MI."

 

Is collaboration a feature that exists at an application level or are there key requirements for collaborative applications that necessitate component-level support?

        * Should collaborative infrastructure be incorporated as a core feature of every component?

 

JohnC: Does it need to be incorporated in all components? What kind of collab

support is needed? Permitting session logging and geographically

separated, simultaneous users would go a long way to providing for

collab needs and would seem to only impact the GUI and perhaps renderer.

 

Jim: Collaboration should exist *above* the application level, either outside

the specific framework or as part of the framework "bridging" technology.

 

JohnS: No.  I hope that support for collaborative applications can be provided via supplemental components.

 

Wes: I don't know what "collaborative infrastructure" means. Given my position

(above), "MI" is more of a framework thing, and not a component thing.

This seems to be the most realistic approach to "MI."

 

Note:  I'm not sure how to interpret this answer.  Is this a "framework issue" or a "component issue" or is it totally outside of the application?  So we retrofit applications to be "collaborative" from the outside rather than designing apps or the frameworks that implement them to support collaboration "requirements" as a fundamental feature of the technology?

 

        * Can any conceivable collaborative requirement be satisfied using a separate set of modules that specifically manage distribution of events and data in collaborative applications?

 

Jim: I dunno, I doubt it.

 

JohnS: That is what I hope.

 

        * How is the collaborative application presented?  Does the application only need to be collaborative sometimes?

 

Jim: Yes, collaboration should be flexible and on demand as needed - like

dialing out on the speakerphone while in the middle of a meeting...

JohnS: This is probably true.  You probably want to be able to have tools that were effectively standalone that can join into a collaborative space on demand.

 

        * Where does performance come into play?  Does the visualization system or underlying libraries need to be performance-aware?

 (i.e. I'm doing a given task and I need a framerate of X for it to be useful using my current compute resources), network aware (i.e. the system is starving for data and must respond by adding an alternate stream or redeploying the pipeline).  Are these considerations implemented at the component level, framework level, or are they entirely out-of-scope for our consideration?

 

Jim: There likely will need to be "hooks" to specify performance requirements,

like "quality of service".  This should perhaps be incorporated as part

of the individual component APIs, or at least metered by the frameworks...

It would be wise to specify the frame rate requirement, perhaps interactively

depending on the venue...  e.g. in interactive collaboration scenarios you'd

rather drop some frames consistently than stall completely or in bursts...


This sounds like futureware to me - an intelligent network protocol layer...

beyond our scope for sure!

 

These issues should be dealt with mostly at the framework level, if at all.

I think they're mostly out-of-scope for the first incarnation...

 

JohnS: Yes.  The whole collaboration experience will fall apart if you cannot impose some constraints on quality of service or react appropriately to service limitations.  It's a big problem, but I hope the solution does not need to be a fundamental feature of the baseline component design.

 

Wes: The MI-aware framework collects and uses performance data generated by

components to make decisions about how to tune/optimize visualization

pipeline performance (the pipeline consists of a bunch of components).

 

If some of the other issues I've raised are addressed (e.g., time-limited

execution, partial processing, incremental processing, etc), then the

performance issues raised within the context of MI come "for free".

 

Note: The issue here is that if anyone thinks this should be done at anything other than the framework level (or even outside of the framework), then it could be very disruptive to our design process if we develop first for single-user operation and then later attempt to make "collaborative services" a requirement.  Implementing this at the "framework level" is again a high price of admission.  If there is any way to support this at the component level, it would enable people working on collaborative extensions to share better with people who have different aims for their framework.  I don't consider it a benefit to have one "framework" per use-case, as has been the practice in many aspects of CCA.  It will just continue the balkanization of our development efforts.