From: John Shalf <jshalf@lbl.gov>

Date: Wed Sep 10, 2003  11:53:29 AM US/Pacific

To: diva@lbl.gov

Subject: Re: DiVA Survey (Please return by Sept 10!)

 

 

OK,

here are my responses to the mandatory portion of the survey.

I'll send the voluntary section separately.

 

On Wednesday, August 27, 2003, at 03:33 PM, John Shalf wrote:

=============The Survey=========================

Please answer the attached survey with as much or as little verbosity as you please and return it to me by September 10.  The survey has 3 mandatory sections and 4 voluntary (bonus) sections.  The sections are as follows:

Mandatory;

         1) Data Structures

         2) Execution Model

         3) Parallelism and Load-Balancing

Voluntary;

         4) Graphics and Rendering

         5) Presentation

         6) Basic Deployment and Development Environment Issues

         7) Collaboration

We will spend this workshop focusing on the first 3 sections, but I think we will derive some useful/motivating information from any answers to questions in the voluntary sections.

 

I'll post my answers to this survey on the diva mailing list very soon.  You can post your answers publicly if you want to, but I am happy to regurgitate your answers as "anonymous contributors" if it will enable you to be more candid in your evaluation of available technologies.

 

1) Data Structures/Representations/Management==================

The center of every successful modular visualization architecture has been a flexible core set of data structures for representing data that is important to the targeted application domain.  Before we can begin working on algorithms, we must come to some agreement on common methods (either data structures or accessors/method  calls) for exchanging data between components of our vis framework.

 

There are two potentially disparate motivations for defining the data representation requirements.  In the coarse-grained case, we need to define standards for exchanging data between components in this framework (interoperability).  In the fine-grained case, we want to define some canonical data structures that can be used within a component -- one developed specifically for this framework.  These two use-cases may drive different sets of requirements and implementation issues.

         * Do you feel both of these use cases are equally important or should we focus exclusively on one or the other?

 

While I am very interested in design patterns, data structures, and services that could make the design of the interior of parallel/distributed components easier, it is clear that the interfaces between components are the central focus of this project.  So the definition of inter-component data exchanges is preeminent.

 

         * Do you feel the requirements for each of these use-cases are aligned or will they involve two separate development tracks?  For instance, using "accessors" (method calls that provide abstract access to essentially opaque data structures) will likely work fine for the coarse-grained data exchanges between components, but will lead to inefficiencies if used to implement algorithms within a particular component.

 

Given the focus on inter-component data exchange, I think accessors provide the most straightforward paradigm for data exchange.  The arguments to the data access methods can involve elemental data types rather than composite data structures (eg. we use scalars and arrays of basic machine data types rather than hierarchical structures).  Therefore we should look closely at FM's API organization as well as the accessors employed by SCIRun V1 (before they employed dynamic compilation).
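
To make that concrete, here is a rough C++ sketch of what an elemental-type accessor might look like.  The names are purely illustrative -- this is not FM's actual API -- but it shows the idea of exposing only scalars and flat arrays of basic machine types while keeping the component's internal composite structures opaque:

    // Hypothetical elemental-type accessor interface (names illustrative,
    // not FM's API).  Callers see only scalars and flat arrays of machine
    // types, never the component's internal composite structures.
    #include <cstddef>

    class FieldAccessor {
    public:
        virtual ~FieldAccessor() {}

        // Scalar queries return basic machine types.
        virtual int numPoints() const = 0;
        virtual int numComponents() const = 0;

        // Bulk access fills a caller-supplied flat array; the internal
        // layout (AoS, SoA, blocked, ...) stays opaque to the caller.
        virtual void getValues(std::size_t first, std::size_t count,
                               double* out) const = 0;
    };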

 

The accessor method works well for abstracting component location, but requires potentially redundant copying of data for components in the same memory space.  It may be necessary to use reference counting in order to reduce the need to recopy data arrays between co-located components, but I'd really like to avoid making ref counting a mandatory requirement if we can avoid it.  (does anyone know how to avoid redundant data copying between opaque components without employing reference counting?)
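
As a strawman for the co-located case, a reference-counted array handle could let components in the same address space share one buffer instead of copying it.  The sketch below uses std::shared_ptr purely as a stand-in for whatever counting scheme (if any) we might adopt; all the names are hypothetical:

    // Illustrative only: a reference-counted array handle so that
    // co-located components can share one buffer instead of copying it.
    #include <memory>
    #include <vector>

    struct DataArray {
        std::vector<double> values;   // the actual payload
    };

    using DataArrayHandle = std::shared_ptr<const DataArray>;

    // A producer hands out a handle; consumers in the same address space
    // keep the buffer alive without copying it.  A remote consumer would
    // instead pull the values through the accessor and serialize them.
    DataArrayHandle produce() {
        auto a = std::make_shared<DataArray>();
        a->values.assign(1000000, 0.0);
        return a;
    }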

 

What are the requirements for the data representations that must be supported by a common infrastructure?  We will start by answering Pat's questions about representation requirements and follow up with personal experiences involving particular domain scientists' requirements.

         Must: support for structured data

Must

         Must/Want: support for multi-block data?

Must

         Must/Want: support for various unstructured data representations? (which ones?)

Cell-based initially.  Arbitrary connectivity eventually, but not mandatory.

 

         Must/Want: support for adaptive grid standards?  Please be specific about which adaptive grid methods you are referring to.  Restricted block-structured AMR (aligned grids), general block-structured AMR (rotated grids), hierarchical unstructured AMR, or non-hierarchical adaptive structured/unstructured meshes.

 

If we can define the data models rigorously for the individual grid types (ie. structured and unstructured data), then adaptive grid standards really revolve around an infrastructure for indexing data items.  We normally think of indexing datasets by time and by data species.  However, we need more general indexing methods that can support concepts of spatial and temporal relationships.  Pervasive indexing structures are also important for supporting acceleration structures such as K-d trees, octrees, and other methods used to speed up graphics algorithms.  We really should consider how to pass such representations down the data analysis pipeline in a uniform manner because they are used so commonly.
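
To illustrate what I mean by a generic indexing method, here is a hypothetical sketch of an index interface that could sit in front of an AMR level hierarchy, an octree, or a K-d tree without the consumer needing to know which.  None of these names come from an existing system:

    // Hypothetical generic index interface: the same query mechanism
    // could front an AMR hierarchy, an octree, or a K-d tree.
    #include <vector>

    struct Region {            // axis-aligned query box
        double lo[3], hi[3];
    };

    class DatasetIndex {
    public:
        virtual ~DatasetIndex() {}
        // Return the block/patch ids whose extents overlap 'r' at a given
        // timestep and refinement level (level < 0 meaning "all levels").
        virtual std::vector<int> select(const Region& r,
                                        int timestep,
                                        int level = -1) const = 0;
    };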

 

         Must/Want: "vertex-centered" data, "cell-centered" data? other-centered?

Must understand all centering (particularly for structured grids where vis systems are typically lax in storing/representing this information).

 

         Must: support time-varying data, sequenced, streamed data?

Yes to all.  However, the concept of streamed data must be defined in more detail.  This is where the execution paradigm is going to affect the data structures.

 

         Must/Want: higher-order elements?

Not yet.

 

         Must/Want: Expression of material interface boundaries and other special-treatment of boundary conditions.

 

Yes, we must treat ghost zones specially or parallel vis algorithms will create significant artifacts.  I'm not sure what is required for combined air-ocean models.

 

         * For commonly understood datatypes like structured and unstructured, please focus on any features that are commonly overlooked in typical implementations.  For example, often data-centering is overlooked in structured data representations in vis systems and FEM researchers commonly criticize vis people for co-mingling geometry with topology for unstructured grid representations.  Few datastructures provide proper treatment of boundary conditions or material interfaces.  Please describe your personal experience on these matters.

 

There is little support for non-Cartesian coordinate systems in typical data structures.  We will need to have a discussion of how to support coordinate projections/conversions in a comprehensive manner.  This will be very important for applications relating to the National Virtual Observatory.

 

         * Please describe data representation requirements for novel data representations such as bioinformatics and terrestrial sensor datasets.  In particular, how should we handle more abstract data that is typically given the moniker "information visualization".

 

I simply don't know enough about this field to comment.

 

What do you consider the most elegant/comprehensive implementation for data representations that you believe could form the basis for a comprehensive visualization framework?

         * For instance, AVS uses entirely different datastructures for structured, unstructured, and geometry data.  VTK uses class inheritance to express the similarities between related structures.  Ensight treats unstructured data and geometry nearly interchangeably.  OpenDX uses more vector-bundle-like constructs to provide a more unified view of disparate data structures.  FM uses data-accessors (essentially keeping the data structures opaque).

 

Since I'm already on record as saying that opaque data accessors are essential for this project, it is clear that FM offers the most compelling implementation that satisfies this requirement.

 

         * Are there any of the requirements above that are not covered by the structure you propose?

 

We need to be able to express a wider variety of data layout conversions and have some design pattern that reduces the need to recopy data arrays for local components.  The FM model also needs to have additional API support for hierarchical indices to accelerate access to subsections of arrays or domains.

 

         * Is there information or characteristics of particular file format standards that must percolate up into the specific implementation of the in-memory data structures?

 

I hope not.

 

For the purpose of this survey, "data analysis" is defined broadly as all non-visual data processing done *after* the simulation code has finished and *before* "visual analysis".

         * Is there a clear dividing line between "data analysis" and "visual analysis" requirements?

 

There shouldn't be.  However, people at the SRM workshop left me with the impression that they felt data analysis had been essentially abandoned by the vis community in favor of "visual analysis" methods.  We need to undo this.

 

         * Can we (should we) incorporate data analysis functionality into this framework, or is it just focused on visual analysis.

 

Vis is bullshit without seamless integration with flexible data analysis methods.  The most flexible methods available are text-based.  The failure to integrate more powerful data analysis features into contemporary 3D vis tools has been a serious problem.

 

         * What kinds of data analysis typically needs to be done in your field?  Please give examples and how these functions are currently implemented.

 

This question is targeted at vis folks that have been focused on a particular scientific domain.  For general use, I think of IDL as being one of the most popular/powerful data analysis languages.  Python has become increasingly important -- especially with the Livermore numerical extensions and the PyGlobus software.  However, use of these scripting/data analysis languages has not made the transition to parallel/distributed-memory environments (except in a sort of data-parallel batch mode).

 

         * How do we incorporate powerful data analysis functionality into the framework?

 

I'm very interested in work that Nagiza Samatarova has proposed for a parallel implementation of the R statistics language.  The traditional approach for parallelizing scripting languages is to run them in a sort of MIMD mode of Nprocs identical scripts operating on different chunks of the same dataset.  This makes it difficult to have a commandline/interactive scripting environment.  I think Nagiza is proposing to have an interactive commandline environment that transparently manipulates distributed actions on the back-end.

 

There is similar work in progress on parallel Matlab at UC Berkeley.  Does anyone know of such an effort for Python?  (Most of the parallel Python hacks I know of are essentially MIMD, which is not very useful.)

 

2) Execution Model=======================

It will be necessary for us to agree on common execution semantics for our components.  Otherwise, we might have compatible data structures but incompatible execution requirements.  Execution semantics is akin to the function of protocol in the context of network serialization of data structures.  The motivating questions are as follows:

         * How is the execution model affected by the kinds of algorithms/system-behaviors we want to implement.

         * How then will a given execution model affect data structure implementations

 

There will need to be some way to support declarative execution semantics as well as data-driven and demand-driven semantics.  By declarative semantics, I mean support for environments that want to be in control of when the component "executes", or interactive scripting environments that wish to use the components much like subroutines.  This is separate from the demands of very interactive use-cases like view-dependent algorithms, where the execution semantics must be more automatic (or at least hidden from the developer who is composing the components into an application).  I think this is potentially relevant to data model discussions because automatic execution semantics often impose additional requirements on the data structures used to hand off tokens between components.  There are also issues involved in managing concurrent access to data.  For instance, a demand-driven system, as demanded by progressive-update or view-dependent algorithms, will need to manage the interaction between the arrival of new data and asynchronous requests from the viewer to recompute existing data as the geometry is rotated.

 

         * How will the execution model be translated into execution semantics on the component level.  For example will we need to implement special control-ports on our components to implement particular execution models or will the semantics be implicit in the way we structure the method calls between components.

 

I'm going to propose that we go after the declarative semantics first (no automatic execution of components) with hopes that you can wrap components that declare such an execution model with your own automatic execution semantics (whether it be a central executive or a distributed one).  This follows the paradigm that was employed for tools such as VisIt that wrapped each of the pieces of the VTK execution pipeline so that it could impose its own execution semantics on the pipeline rather than depending on the exec semantics that were predefined by VTK.  DiVA should follow this model, but start with the simplest possible execution model so that it doesn't need to be deconstructed if it fails to meet the application developer's needs (as was the case with VisIt).

 

We should have at least some discussion to ensure that the *baseline* declarative execution semantics imposes the fewest requirements for component development but can be wrapped in a very consistent/uniform/simple manner to support any of our planned pipeline execution scenarios.  This is an exercise in making things as simple as possible, but thinking far enough ahead about long-term goals to ensure that the baseline is "future proof" to some degree.
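
As a concrete strawman for that baseline, the sketch below shows a component with nothing but an explicit execute() call, plus a trivially simple executive that wraps such components to impose a demand-driven firing policy.  All of the names are illustrative, not taken from CCA, VTK, or VisIt:

    // "Declarative first, wrap later" sketch.  The baseline component
    // exposes an explicit execute(); an optional executive wraps such
    // components to impose its own firing policy.
    #include <vector>

    class Component {
    public:
        virtual ~Component() {}
        virtual void execute() = 0;              // explicit, externally invoked
        virtual bool inputsChanged() const = 0;
    };

    // One possible wrapper: a simple demand-driven executive that fires a
    // linear pipeline in order whenever something upstream has changed.
    class SimpleExecutive {
    public:
        void add(Component* c) { pipeline_.push_back(c); }
        void update() {
            bool dirty = false;
            for (Component* c : pipeline_) {
                dirty = dirty || c->inputsChanged();
                if (dirty) c->execute();
            }
        }
    private:
        std::vector<Component*> pipeline_;
    };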

 

What kinds of execution models should be supported by the distributed visualization architecture?

         * View dependent algorithms? (These were typically quite difficult to implement for dataflow visualization environments like AVS5).       

 

              Must be supported, but not as a baseline exec model.

 

         * Out-of-core algorithms

              Same deal.  We must work out what kinds of attributes are required of the data structures/data model to represent temporal decomposition of a dataset.  We should not encode the execution semantics as part of this (it should be outside of the component), but we must ensure that the data interfaces between components are capable of representing this kind of data decomposition/use-case.

 

         * Progressive update and hierarchical/multiresolution algorithms?

 

Likewise, we should separate the execution semantics necessary to implement this from the requirements imposed on the data representation.  Data models in existing production data analysis/visualization systems often do not provide an explicit representation for such things as multiresolution hierarchies.  We have LevelOfDetail switches, but that seems to be only a weak form of representation for these hierarchical relationships and limits the effectiveness of algorithms that depend on this method of data representation.  Those requirements should not be co-mingled with the actual execution semantics for such components (it's just the execution interface).

 

         * Procedural execution from a single thread of control (ie. using a commandline language like IDL to interactively control a dynamic or large parallel back-end)

 

This should be our primary initial target.  I do not have a good understanding of how best to support this, but it's clear that we must ensure that a commandline/interactive scripting language is supported.  Current data-parallel scripting interfaces assume data-parallel, batch-mode execution of the scripting interpreters (this is a bad thing).

 

         * Dataflow execution models?  What is the firing method that should be employed for a dataflow pipeline?  Do you need a central executive like AVS/OpenDX or, completely distributed firing mechanism like that of VTK, or some sort of abstraction that allows the modules to be used with either executive paradigm?

 

This can probably be achieved by wrapping components that have explicit/declarative execution semantics.  It's an open question, though, whether these execution models are a function of the component or of the framework that is used to compose the components into an application.

 

         * Support for novel data layouts like space-filling curves?

 

I don't understand enough about such techniques to know how to approach this.  However, it does point out that it is essential that we hand off data via accessors that keep the internal data structures opaque, rather than passing complex data structures directly.

 

         * Are there special considerations for collaborative applications?

         * What else?

 

Ugh.  I'm also hoping that collaborative applications only impose requirements for wrapping baseline components rather than imposing internal requirements on the interfaces that exchange data between the components.  So I hope we can have "accessors" or "multiplexor/demultiplexor" objects that connect to essentially non-collaboration-aware components in order to support such things.  Otherwise, I'm a bit daunted by the requirements imposed.

 

How will the execution model affect our implementation of data structures?

         * how do you decompose a data structure such that it is amenable to streaming in small chunks?

 

The recent SDM workshop pointed out that chunking/streaming interfaces are going to be essential for any data analysis system that deals with large data, but there was very little agreement on how the chunking should be expressed.  The chunking also potentially involves end-to-end requirements of the components that are assembled in a pipeline as you must somehow support uniformity in the passage of chunks through the system (ie. the decision you make about the size of one chunk will impose requirements for all other dependent streaming interfaces in the system).  We will need to walk through at least one use-case for chunking/streaming to get an idea of what the constraints are here.  It may be too tough an issue to tackle in this first meeting though.
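
For discussion purposes, a chunked access interface might look something like the sketch below, where downstream components pull one chunk at a time rather than the whole array.  How chunk boundaries are chosen and kept consistent end-to-end is exactly the open question; the names here are hypothetical:

    // Illustrative chunked/streaming access sketch: downstream components
    // pull one chunk at a time so a pipeline can stream.
    #include <cstddef>

    struct Chunk {
        std::size_t offset;   // position of this chunk in the global array
        std::size_t count;    // number of elements in this chunk
        const double* data;   // valid until the next call to nextChunk()
    };

    class ChunkedSource {
    public:
        virtual ~ChunkedSource() {}
        virtual bool nextChunk(Chunk& out) = 0;  // false when exhausted
        virtual void rewind() = 0;               // restart the stream
    };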

 

         * how do you represent temporal dependencies in that model?

 

Each item in a data structure needs some method of referring to its dependencies, both spatial (ie. interior boundaries caused by domain decomposition) and temporal.  It's important to make these dependencies explicit in the data structures so that a framework has the necessary information to organize parallelism in both the pipeline and data-parallel directions.  The implementation details of how to do so are not well formulated and perhaps out-of-scope for our discussions.  So this is a desired *requirement* that doesn't have a concrete implementation or design pattern involved.
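
Purely to make that requirement concrete (and not to propose a design), one could imagine attaching dependency records to each block's metadata so a framework can schedule pipeline- and data-parallel work from the metadata alone.  All names below are hypothetical:

    // Sketch only: explicit spatial/temporal dependency records per block.
    #include <vector>

    struct Dependency {
        int blockId;      // which other block this block depends on
        int timestep;     // which timestep of that block
        enum Kind { SpatialGhost, TemporalPredecessor } kind;
    };

    struct BlockMetadata {
        int blockId;
        int timestep;
        std::vector<Dependency> needs;   // must be satisfied before compute
    };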

 

         * how do you minimize recomputation in order to regenerate data for view-dependent algorithms.

 

I don't know.  I'm hoping someone else responding to this survey has some ideas on this.  I'm uncertain how it will affect our data model requirements.

 

What are the execution semantics necessary to implement these execution models?

         * how does a component know when to compute new data? (what is the firing rule)

 

For declarative semantics, the firing rule is an explicit method call that is invoked externally.  Hopefully such objects can be *wrapped* to encode semantics that are more automatic (ie. the module itself decides when to fire depending on input conditions), but initially it should be explicit.

 

         * does coordination of the component execution require a central executive or can it be implemented using only rules that are local to a particular component.

 

It can eventually be implemented using local semantics, but initially we should design for explicit external control.

 

         * how elegantly can execution models be supported by the proposed execution semantics?  Are there some things, like loops or back-propagation of information that are difficult to implement using a particular execution semantics?

 

It's all futureware at this point.  We want to first come up with clear rules for baseline component execution and then we can come up with some higher-level/automatic execution semantics that can be implemented by *wrapping* such components.  The "wrapper" would then take responsibility for imposing higher-level automatic semantics.

 

How will security considerations affect the execution model?

 

I don't know.  Please somebody tell me if this is going to be an issue.  I don't have a handle on the *requirements* for security.  But I do know that simply using a secure method to *launch* a component is considered insufficient by security people who would also require that connections between components be explicitly authenticated as well.  Most vis systems assume secure launching (via SSH or GRAM) is sufficient.  The question is perhaps whether security and authorization are a framework issue or a component issue.  I am hoping that it is the former (the role of the framework that is used to compose the components).

 

3) Parallelism and load-balancing=================

Thus far, managing parallelism in visualization systems has been tedious and difficult at best.  Part of this is due to a lack of powerful abstractions for managing data-parallelism, load-balancing, and component control.

 

If we are going to address inter-component data transfers to the exclusion of data structures/models internal to the component, then much of this section is moot.  The only question is how to properly represent data-parallel-to-data-parallel transfers and also the semantics for expressing temporal/pipeline parallelism and streaming semantics.  Load-balancing becomes an issue that is out-of-scope because it is effectively something that is inside of components (and we don't want to look inside of the components).

 

Please describe the kinds of parallel execution models that must be supported by a visualization component architecture.

         * data-parallel/dataflow pipelines?

 

Must

 

         * master/slave work-queues?

 

Maybe: If we want to support progressive update or heterogeneous execution environments.

 

         * streaming update for management of pipeline parallelism?

 

Must.

 

         * chunking mechanisms where the number of chunks may be different from the number of CPU's employed to process those chunks?

 

Absolutely.  Of course, this would possibly be implemented as a master/slave work-queue, but there are other methods.

 

         * how should one manage parallelism for interactive scripting languages that have a single thread of control?  (eg. I'm using a commandline language like IDL that interactively drives an arbitrarily large set of parallel resources.  How can I make the parallel back-end available to a single-threaded interactive thread of control?)

 

I think this is very important and a growing field of inquiry for data analysis environments.  Whatever agreements we come up with, I want to make sure that things like parallel R are not left out of these considerations.

 

Please describe your vision of what kinds of software support / programming design patterns are needed to better support parallelism and load balancing.

         * What programming model should be employed to express parallelism.  (UPC, MPI, SMP/OpenMP, custom sockets?)

 

If we are working just on the outside of components, this question should be moot.  We must make sure the API is not affected by these choices though.

 

         * Can you give some examples of frameworks or design patterns that you consider very promising for support of parallelism and load balancing.  (ie. PNNL Global Arrays or Sandia's Zoltan)

                       http://www.cs.sandia.gov/Zoltan/

                       http://www.emsl.pnl.gov/docs/global/ga.html

 

Also out of scope.  This would be something employed within a component, but if we are restricting discussions to what happens on the interface between components, then this is also a moot point.  At minimum, it will be important to ensure that such options will not be precluded by our component interfaces.

 

         * Should we use novel software abstractions for expressing parallelism or should the implementation of parallelism simply be an opaque property of the component? (ie. should there be an abstract messaging layer or not)

 

Yes.

 

         * How does the MxN work fit into all of this?  Is it sufficiently differentiated from Zoltan's capabilities?

 

I need a more concrete understanding of MxN.  I understand what it is supposed to do, but I'm not entirely sure what requirements it would impose on any given component interface implementation.  It seems like something our component data interfaces should support, but perhaps such redistribution could be hidden inside of an MxN component?  So should this kind of redistribution be supported by the inter-component interface or should there be components that explicitly effect such data redistributions?  Jim... Help!

 

 

===============End of Mandatory Section (the rest is voluntary)=============

 

4) Graphics and Rendering=================

What do you use for converting geometry and data into images (the rendering-engine).  Please comment on any/all of the following.

         * Should we build modules around declarative/streaming methods for rendering geometry like OpenGL, Chromium and DirectX or should we move to higher-level representations for graphics offered by scene graphs?

 

This all depends on the scope of the framework.  A priori, you can consider the rendering method separable and render this question moot.  However, this will make it quite difficult to provide very sophisticated support for progressive update, image-based methods, and view-dependent algorithms, because the rendering engine becomes intimately involved in such methods.  I'm concerned that this is where the component model might break down a bit.  Certainly the rendering component of traditional component-like systems like AVS or NAG Explorer is the most heavy-weight and complex component of the entire environment.  Often, the implementation of the rendering component would impose certain requirements on components that had to interact with it closely (particularly in the case of NAG/Iris Explorer, where you were directly exposed to the fact that the renderer was built atop OpenInventor).

 

So, we probably cannot take on the issue of renderers quite yet, but we are eventually going to need to define a big "component box" around OpenGL/Chromium/DirectX.  That box is going to have to be carefully built so as to keep from precluding any important functionality that each of those rendering engines can offer.  Again, I wonder if we would need to consider scene graphs if only to offer a persistent datastructure to hand off to such an opaque rendering engine.  This isn't necessarily a good thing.

 

What are the pitfalls of building our component architecture around scene graphs?

 

It will add greatly to the complexity of this system.  It also may get in the way of novel rendering methods like Image-based methods.

 

         * What about Postscript, PDF and other scale-free output methods for publication quality graphics?  Are pixmaps sufficient?

 

Pixmaps are insufficient.  Our data analysis infrastructure has been moving rapidly away from scale-free methods and rapidly towards pixel-based methods.  I don't know how to stop this slide or if we are poised to address this issue as we look at this component model.

 

In a distributed environment, we need to create a rendering subsystem that can flexibly switch between drawing to a client application by sending images, sending geometry, or sending geometry fragments (image-based rendering)?  How do we do that?

         * Please describe some rendering models that you would like to see supported (ie. view-dependent update, progressive update) and how they would adjust dynamically to changing objective functions (optimize for fastest framerate, or fastest update on geometry change, or varying workloads and resource constraints).

 

I see this as the role for the framework.  It also points to the need to have performance models and performance monitoring built in to every component so that the framework has sufficient information to make effective pipeline deployment decisions in response to performance constraints.  It also points to the fact that at some level in this component architecture, component placement decisions must be entirely abstract (but such a capability is futureware).

 

So in the short term it's important to design components with effective interfaces for collecting performance data and representing either analytic or history-based models of that data.  This is a necessary baseline to get to the point where a framework could use such data to make intelligent deployment/configuration decisions for a distributed visualization system.
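
As a strawman, the kind of per-component performance port I have in mind might look like the sketch below (all names hypothetical): the framework samples observed costs and asks for a predicted cost before making deployment decisions.

    // Hypothetical performance port a component might expose so that a
    // framework can make deployment/configuration decisions.
    class PerformancePort {
    public:
        virtual ~PerformancePort() {}
        // Most recent observed cost of one execute(), in seconds.
        virtual double lastExecuteSeconds() const = 0;
        // Bytes moved across the component's inputs during that execute().
        virtual double lastInputBytes() const = 0;
        // Analytic or history-based estimate for a hypothetical input size.
        virtual double predictSeconds(double inputBytes) const = 0;
    };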

 

         * Are there any good examples of such a system?

 

No.

 

What is the role of non-polygonal methods for rendering (ie. shaders)?

         * Are you using any of the latest gaming features of commodity cards in your visualization systems today?

         * Do you see this changing in the future? (how?)

 

I'd like to know if anyone is using shader hardware.  I don't know much about it myself, but it points out that we need to plan for non-polygon-based visualization methods.  It's not clear to me how to approach this yet.

 

5) Presentation=========================

It will be necessary to separate the visualization back-end from the presentation interface.  For instance, you may want to have the same back-end driven by entirely different control-panels/GUIs and displayed on different display devices (a CAVE vs. a desktop machine).  Such separation is also useful when you want to provide different implementations of the user interface depending on the targeted user community.  For instance, visualization experts might desire a dataflow-like interface for composing visualization workflows, whereas a scientist might desire a domain-specific, dashboard-like interface that implements a specific workflow.  Both users should be able to share the same back-end components and implementation even though the user interface differs considerably.

 

How do different presentation devices affect the component model?

         * Do different display devices require completely different user interface paradigms?  If so, then we must define a clear separation between the GUI description and the components performing the back-end computations.  If not, then is there a common language to describe user interfaces that can be used across platforms?

 

Systems that attempt to use the same GUI paradigm across different presentation media have always been terrible in my opinion.  I strongly believe that each presentation medium requires a GUI design that is specific to that particular medium.  This imposes a strong requirement that our compute pipeline for a given component architecture be strictly separated from the GUI that controls the parameters and presents the visual output of that pipeline.  OGSA/WSDL has been proposed as one way to define that interface, but it is extremely complex to use.  One could use CCA to represent the GUI handles, but that might be equally complex.  Others have simply customized ways to use XML descriptions of their external GUI interface handles for their components.  The latter seems much simpler to deal with, but is it general enough?

 

         * Do different display modalities require completely different component/algorithm implementations for the back-end compute engine?  (what do we do about that??)

 

I think there is a lot of opportunity to share the back-end compute engines across different display modalities.  There are some cases where a developer would be inclined to implement things like an isosurfacer differently for a CAVE environment just to keep the framerates high enough to maintain the sense of immersion.  However, I think of those as edge cases.

 

What presentation modalities do you feel are important, and which do you consider the most important?

         * Desktop graphics (native applications on Windows, on Macs)

#1

 

         * Graphics access via Virtual Machines like Java?

#4

 

         * CAVEs, Immersadesks, and other VR devices

#5

 

         * Ultra-high-res/Tiled display devices?

#3 : the next tiled display may well be your next *desktop* display, but not quite yet.

 

         * Web-based applications?

#2: If only because this is becoming an increasingly important component of collaboratories.

 

 

What abstractions do you think should be employed to separate the presentation interface from the back-end compute engine?

         * Should we be using CCA to define the communication between GUI and compute engine or should we be using software infrastructure that was designed specifically for that space? (ie. WSDL, OGSA, or CORBA?)

 

I think I addressed this earlier.  We can do this all in CCA, but is that the right thing to do?  I know this is an implementation issue, but it is a strong part of our agreement on methods to implement our components (or define component boundaries).

 

         * How do such control interfaces work with parallel applications?  Should the parallel application have a single process that manages the control interface and broadcasts to all nodes or should the control interface treat all application processes within a given component as peers?

 

This requires more discussion, but reliable broadcast methods have many problems related to event skewing, and MPI-like point-to-point emulation of the broadcast suffers from scalability problems.  We need to collect design patterns for the control interface and either compete them against one another or find a way to support them all by design.  This is clearly an implementation issue, but it will leak into our abstract component design decisions.  Clearly we want a single thread of control to efficiently deliver events to massively parallel back-end components.  That is a *must* requirement.
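
For what it's worth, the single-thread-of-control-plus-broadcast pattern can be sketched in a few lines of MPI.  This is only meant to show the shape of the control path, not to argue that MPI_Bcast is the right delivery mechanism; the event structure and function name are made up:

    // Minimal sketch: one control process broadcasts events to all
    // back-end processes so they act on the same event.
    #include <mpi.h>

    struct ControlEvent {
        int    opcode;     // e.g. 0 = set isovalue, -1 = shut down
        double value;
    };

    // Called collectively on every rank.  Rank 0 fills 'ev' from its GUI
    // or scripting front end; every other rank's 'ev' is overwritten with
    // rank 0's copy.
    ControlEvent broadcastEvent(ControlEvent ev) {
        MPI_Bcast(&ev, (int)sizeof(ev), MPI_BYTE, 0, MPI_COMM_WORLD);
        return ev;
    }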

 

6) Basic Deployment/Development Environment Issues============

One of the goals of the distributed visualization architecture is seamless operation on the Grid -- distributed/heterogeneous collections of machines.  However, it is quite difficult to realize such a vision without some consideration of deployment/portability issues.  This question also touches on issues related to the development environment and what kinds of development methods should be supported.

 

What languages do you use for core vis algorithms and frameworks.

         * for the numerically intensive parts of vis algorithms

Fortran/C/C++

 

         * for the glue that connects your vis algorithms together into an application?

C++/C/Java  but I want to get into some Python.

 

         * How aggressively do you use language-specific features like C++ templates?

 

I avoid them due to portability and compiler maturity issues.

 

         * is Fortran important to you?  Is it important that a framework support it seamlessly?

 

Yes, absolutely.  It needn't be full-fledged F90 support, but certainly f77 with some f90 extensions.

 

         * Do you see other languages becoming important for visualization (ie. Python, UPC, or even BASIC?)

 

Python.

 

What platforms are used for data analysis/visualization?

         * What do you and your target users depend on to display results? (ie. Windows, Linux, SGI, Sun etc..)

 

Linux, MacOS-X(BSD), Windows.

 

         * What kinds of presentation devices are employed (desktops, portables, handhelds, CAVEs, Access Grids, WebPages/Collaboratories) and what is their relative importance to active users.

 

Desktop and laptops are most important.  Web, AG, and CAVE are of lesser importance (but still important).

 

         * Do you see other up-and-coming visualization platforms in the future?

 

Tablet PCs and desktop-scale Tiled display devices.

 

Tell us how you deal with the issue of versioning and library dependencies for software deployment.

         * For source code distributions, do you bundle builds of all related libraries with each software release (ie. bundle HDF5 and FLTK source with each release).

 

Every time I fail to bundle dependent libraries, it has been a disaster.  So it seems that packaging dependent libraries with any software release is a *must*.

 

         * What methods are employed to support platform independent builds (cmake, imake, autoconf).  What are the benefits and problems with this approach.

 

I depend on conditional statements in gmake-based makefiles to auto-select between flags for different architectures.  This is not sufficiently sophisticated for most release engineering though.  I have dabbled with autoconf, but it is not a silver bullet (neither was imake).  I do not understand the practical benefits of 'cmake'.

 

         * For binaries, have you had issues with different versions of libraries (ie. GLIBC problems on Linux and different JVM implementations/versions for Java)?  Can you tell us about any sophisticated packaging methods that address some of these problems (RPM need not apply)

 

Building statically has been necessary in a lot of cases, but creates gigantic executables.  In the case of JVM's, the problems with the ever-changing Java platform have driven me away from employing Java as a development platform.

 

         * How do you handle multiplatform builds?

 

* Conservative, lowest-common-denominator coding practices.

* Execute 'uname' at the top of a GNU makefile to select an appropriate set of build options for source-code builds.  Inside the code, use the CPP to code around platform dependencies.

 

How do you (or would you) provide abstractions that hide the locality of various components of your visualization/data analysis application?

         * Does anyone have ample experience with CORBA, OGSA, DCOM, .NET, RPC?  Please comment on advantages/problems of these technologies.

 

Nope.

 

         * Do web/grid services come into play here?

 

As these web-based scientific collaboratory efforts gather momentum, web-based data analysis tools have become increasingly important.  I think the motivation is largely driven by deployment issues when supporting a very heterogeneous/multi-institutional user base.  It reduces the deployment variables when your target is a specific web-server environment, but you pay a price in that the user interface is considerably less advanced.  This cost is mitigated somewhat if the data analysis performed is very domain-specific and customized for the particular collaboratory community.  So it's a poor choice for general-purpose visualization tools, but if the workflow is well-established among the collaborators, then the weakness of the web-based user-interface options is not as much of a problem.

 

7) Collaboration ==========================

If you are interested in "collaborative applications", please define the term "collaborative".  Perhaps provide examples of collaborative application paradigms.

 

Is collaboration a feature that exists at an application level or are there key requirements for collaborative applications that necessitate component-level support?

         * Should collaborative infrastructure be incorporated as a core feature of every component?

 

No.  I hope that support for collaborative applications can be provided via supplemental components.

 

         * Can any conceivable collaborative requirement be satisfied using a separate set of modules that specifically manage distribution of events and data in collaborative applications?

 

That is what I hope.

 

         * How is the collaborative application presented?  Does the application only need to be collaborative sometimes?

 

This is probably true.  You probably want to have tools that are effectively standalone but can join a collaborative space on demand.

 

         * Where does performance come in to play?  Does the visualization system or underlying libraries need to be performance-aware?  (i.e. I'm doing a given task and I need a framerate of X for it to be useful using my current compute resources), network aware (i.e. the system is starving for data and must respond by adding an alternate stream or redeploying the pipeline).  Are these considerations implemented at the component level, framework level, or are they entirely out-of-scope for our consideration?

 

Yes.  The whole collaboration experience will fall apart if you cannot impose some constraints on quality of service or react appropriately to service limitations.  It's a big problem, but I hope the solution does not need to be a fundamental feature of the baseline component design.