Machine Learning & Analytics Group Software

Software Frameworks

BASTet

BASTet is a novel framework for shareable and reproducible data analysis that supports standardized data and analysis interfaces, integrated data storage, data provenance, workflow management, and a broad set of integrated tools. BASTet is motivated by the critical need to enable mass spectrometry imaging (MSI) researchers to share, reuse, reproduce, validate, interpret, and apply common and new analysis methods.

DAGR

DAGR is a scalable framework for implementing analysis pipelines using parallel design patterns. DAGR abstracts the pipeline concept into a state machine composed of connected algorithmic units. Each algorithmic unit is written to do a single task, resulting in highly modular, reusable code. DAGR provides the infrastructure for control, communication, and parallelism; you provide the kernels that implement your analyses. Written in modern C++ and designed to use MPI+threading for parallelism, DAGR can exploit the latest HPC hardware, including many-core architectures and GPUs. The framework supports a number of parallel design patterns, including distributed data, map-reduce, and task-based parallelism.
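
As an illustration of the modular-pipeline idea only, here is a minimal Python sketch (DAGR itself is a C++ framework; the stage names below are hypothetical and this is not DAGR's API):

    # Conceptual sketch only, not DAGR's API: a pipeline of single-purpose
    # algorithmic units wired together by a trivial driver. In DAGR the
    # framework, not this loop, supplies control, communication, and parallelism.
    def read_data(state):          # hypothetical stage
        state["data"] = [1, 2, 3]
        return state

    def detect_events(state):      # hypothetical stage
        state["events"] = [x for x in state["data"] if x > 1]
        return state

    def write_report(state):       # hypothetical stage
        print("events:", state["events"])
        return state

    pipeline = [read_data, detect_events, write_report]
    state = {}
    for stage in pipeline:
        state = stage(state)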

DIY

DIY is a block-parallel library for implementing scalable algorithms that can execute both in-core and out-of-core. The same program can be executed with one or more threads per MPI process, seamlessly combining distributed-memory message passing with shared-memory thread parallelism. The abstraction enabling these capabilities is block parallelism; blocks and their message queues are mapped onto processing elements (MPI processes or threads) and are migrated between memory and storage by the DIY runtime. Complex communication patterns, including neighbor exchange, merge reduction, swap reduction, and all-to-all exchange, are possible in- and out-of-core in DIY.
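
As a rough illustration of the block-parallel idea (this is plain mpi4py, not DIY's API, and it assumes one block per MPI process on a 1-D decomposition):

    # Each rank owns one block of a 1-D domain and exchanges ghost values with
    # its left and right neighbours; DIY generalizes this to blocks, message
    # queues, threads, and out-of-core execution.
    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()

    block = np.full(100, float(rank))      # this rank's block (hypothetical size)

    left, right = rank - 1, rank + 1
    ghost_left = ghost_right = None
    if left >= 0:
        ghost_left = comm.sendrecv(block[0], dest=left, source=left)
    if right < size:
        ghost_right = comm.sendrecv(block[-1], dest=right, source=right)
    print(rank, ghost_left, ghost_right)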

Henson

Henson uses coroutines and position-independent executables to enable cooperative multitasking between simulation and analysis, allowing the same executables to post-process simulation output as well as to process it on the fly, both in situ and in transit. This design differs significantly from existing frameworks and offers an efficient and robust approach to integrating multiple codes on modern supercomputers.
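
The cooperative hand-off can be pictured with ordinary Python generators (an analogy only; Henson couples compiled, position-independent executables via coroutines, and the names below are hypothetical):

    # Analogy: the simulation yields control after every step, and the
    # analysis runs on the in-memory data before handing control back.
    def simulation(steps=3):
        for t in range(steps):
            data = {"step": t, "field": [t, t + 1.0]}   # hypothetical payload
            yield data                                  # hand control to analysis

    def analysis(stream):
        for data in stream:
            print("analyzing step", data["step"], "sum =", sum(data["field"]))

    analysis(simulation())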

SENSEI Generic Data Interface

The SENSEI generic data interface provides a framework for science code teams and analysis algorithm developers to write code once and use it with any of the four major in situ analysis frameworks (ADIOS, GLEAN, ParaView/Catalyst, and VisIt/Libsim). Furthermore, since ParaView/Catalyst and VisIt/Libsim are both treated as analysis routines under SENSEI, these visualizations can be run in situ, or in transit via ADIOS or GLEAN, transparently.

Toolkit for Extreme Climate Analysis (TECA)

TECA is a collection of climate analysis algorithms geared toward extreme event detection and tracking, implemented in a scalable parallel framework. The core is written in modern C++ and uses MPI+threading for parallelism. The framework supports a number of parallel design patterns, including distributed data parallelism and map-reduce. Python bindings make the high-performance C++ code easy to use. TECA has been run at up to 750,000 cores.

I/O Libraries

BrainFormat

The LBNL BrainFormat library specifies a general data format standardization framework and implements a novel file format for the management and storage of neuroscience data. The library provides a number of core modules that can be used for the specification and implementation of scientific application formats in general. Based on these components, the library implements the LBNL BRAIN file format.

H5hut, H5Part

H5hut (formerly H5Part) is a very simple data storage schema that provides an API to simplify reading and writing data in the HDF5 file format. It is built on top of HDF5 (Hierarchical Data Format, version 5).
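
As an illustration of how simple the schema is, the sketch below writes an H5Part-style layout directly with h5py (the group and dataset names follow the usual H5Part convention of one "Step#N" group per time step, but they are assumptions here; production codes would use the H5hut API itself):

    # Illustrative only: an H5Part-style file written with h5py.
    import h5py
    import numpy as np

    with h5py.File("particles.h5part", "w") as f:
        for step in range(3):
            g = f.create_group(f"Step#{step}")     # one group per time step
            n = 1000                               # particles in this step
            for name in ("x", "y", "z"):           # 1-D per-particle datasets
                g.create_dataset(name, data=np.random.rand(n))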

PyNWB

Neurodata Without Borders: Neurophysiology (NWB:N) is more than just a file format: it defines an ecosystem of tools, methods, and standards for storing, sharing, and analyzing complex neurophysiology data. PyNWB is a Python package for working with NWB:N files. It provides a high-level API for efficiently working with neurodata stored in the NWB:N format. Beyond neurophysiology, PyNWB provides a general set of tools for the hierarchical organization of data and the creation of complex data standards.
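
A minimal usage sketch (exact arguments can vary between PyNWB versions, and the file contents here are placeholders):

    # Create an NWB file, add a regularly sampled acquisition, and write it.
    from datetime import datetime
    from dateutil.tz import tzlocal
    import numpy as np
    from pynwb import NWBFile, NWBHDF5IO, TimeSeries

    nwbfile = NWBFile(
        session_description="demo recording",        # free-text description
        identifier="demo-001",                       # unique identifier
        session_start_time=datetime.now(tzlocal()),  # timezone-aware start time
    )

    ts = TimeSeries(
        name="raw_signal",
        data=np.random.randn(1000),   # placeholder samples
        unit="volts",
        rate=1000.0,                  # sampling rate in Hz
        starting_time=0.0,
    )
    nwbfile.add_acquisition(ts)

    with NWBHDF5IO("demo.nwb", "w") as io:
        io.write(nwbfile)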

Visualization Tools

Brain Modulyzer

Brain Modulyzer is an interactive visual exploration tool for functional magnetic resonance imaging (fMRI) brain scans, aimed at analyzing the correlation between different brain regions when resting or when performing mental tasks. Integrated methods from graph theory and analysis, such as community detection and derived graph measures, make it possible to explore the modular and hierarchical organization of functional brain networks.
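
The kind of graph analysis involved can be sketched in a few lines of Python (a conceptual illustration only, not Brain Modulyzer itself; the region count, threshold, and random data are placeholders):

    # Build a functional connectivity graph from region time series and
    # detect communities (modules) with a standard modularity method.
    import numpy as np
    import networkx as nx
    from networkx.algorithms.community import greedy_modularity_communities

    signals = np.random.randn(20, 200)   # 20 regions x 200 time points (placeholder)
    corr = np.corrcoef(signals)          # region-by-region correlation matrix

    G = nx.Graph()
    for i in range(corr.shape[0]):
        for j in range(i + 1, corr.shape[0]):
            if abs(corr[i, j]) > 0.1:    # assumed correlation threshold
                G.add_edge(i, j, weight=abs(corr[i, j]))

    modules = greedy_modularity_communities(G)
    print([sorted(m) for m in modules])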

ECoG ClusterFlow

ECoG ClusterFlow is an interactive visual analysis tool for the exploration of high-resolution electrocorticography (ECoG) data. ECoG ClusterFlow detects and visualizes dynamic high-level structures, such as communities, using the time-varying spatial connectivity network derived from high-resolution ECoG data. ECoG ClusterFlow makes it possible 1) to compare the spatio-temporal evolution patterns for continuous and discontinuous time frames, 2) to aggregate data to compare and contrast temporal information at varying levels of granularity, and 3) to investigate the evolution of spatial patterns without occluding the spatial context information.

OpenMSI

OpenMSI is an advanced web-based gateway for the management and storage of mass spectrometry imaging (MSI) data, the visualization of its hyper-dimensional contents, and its statistical analysis.

OpenMSI Arrayed Analysis Toolkit (OMAAT)

The OpenMSI Arrayed Analysis Toolkit (OMAAT) is a software package for analyzing spatially defined samples in mass spectrometry imaging (MSI) data using OpenMSI and Jupyter.

PointCloudXplore

PointCloudXplore is the first visualization system specifically developed for the analysis of 3D gene expression data. PointCloudXplore is available for Linux, Mac, and Windows. For more information about 3D gene expression data, see also the webpage of the Berkeley Drosophila Transcription Network Project.

Visapult

Visapult is a pipelined-parallel volume rendering application capable of rendering extremely large volume data on a wide range of common platforms. It was featured in a paper in the SC 2000 Technical Program.

WarpIV

WarpIV is a Python application that enables efficient, parallel visualization and analysis of simulation data while it is being generated by the Warp simulation framework. WarpIV integrates state-of-the-art in situ visualization and analysis using VisIt with Warp, supports management and control of complex in situ visualization and analysis workflows, and implements integrated analytics to facilitate query- and feature-based data analytics and efficient large-scale data analysis.

semViewer

The semViewer software was developed as part of an LBNL LDRD project during the period 1999-2001. It is used to perform distance and angular measurements of perceived 3D objects present in pairs of images obtained from scanning electron microscopy.

Image Analysis

F3D

F3D is a Fiji plugin for high-resolution 3D image processing, written in OpenCL. The plugin achieves platform-portable parallelism on modern multi-core CPUs and many-core GPUs. The interface and the mechanisms for accessing the F3D-accelerated kernels are written in Java and are fully integrated with the other tools available within Fiji/ImageJ. F3D delivers several key image-processing algorithms needed to remove artifacts from micro-tomography data. These algorithms are implemented as data-parallel filters that process data out of core and scale efficiently across multiple accelerators; F3D streams data out of core to manage resources, such as memory, across complex execution sequences of filters. This has greatly expedited several scientific workflows dealing with high-resolution images. F3D performs two main types of 3D image-processing operations: non-linear filtering, such as bilateral and median filtering, and morphological operators with varying 3D structuring elements.
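
The classes of operations F3D accelerates can be illustrated with scipy.ndimage (a stand-in only; this is not F3D's OpenCL/Java interface, and the array is a placeholder for a micro-tomography volume):

    # Non-linear filtering and grayscale morphology on a 3D volume.
    import numpy as np
    from scipy import ndimage

    volume = np.random.rand(64, 64, 64)                       # placeholder volume

    denoised = ndimage.median_filter(volume, size=3)          # non-linear (median) filter
    opened = ndimage.grey_opening(denoised, size=(3, 3, 3))   # 3x3x3 structuring element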

FibriPy

Materials characterization using different imaging modalities, such as micro-computed tomography (microCT), scanning electron microscopy (SEM), and scanning transmission electron microscopy (STEM) tomography, has enabled the development of advanced composites that open up new opportunities to improve manufacturing. In order to deploy the targeted material, several structures must be detected and tracked during the experiment under varying parameter conditions. FibriPy is a software framework that provides tools for the detection and analysis of these structures through the recognition of key patterns in the images. FibriPy combines user-friendly dashboards with programmable functions to support the automated analysis of 3D image stacks. As scalability is one of the main issues in analyzing high-resolution images, FibriPy delivers process-based concurrency and scalable GPU-based visualization, with the portability benefits brought by well-established Python packages. FibriPy provides tools for image enhancement, automatic detection of fiber cross-sections, interactive tools to improve fiber detection, automatic fiber tracking, and 2D and 3D visualization. The main characteristics of FibriPy are: (a) advanced algorithms for image analysis and feature extraction, such as structure-based image enhancement using nonlinear filtering, adaptive feature-based matching and learning, and motion tracking based on cross-correlation; (b) a Python-centric multi-threading software architecture; and (c) GPU-accelerated visualization tools.
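
One of the named building blocks, cross-correlation-based detection, can be sketched with scikit-image (a conceptual illustration, not FibriPy's API; the slice and the disc template are placeholders):

    # Locate candidate fiber cross-sections in one slice by normalized
    # cross-correlation against a small disc-shaped template.
    import numpy as np
    from skimage.feature import match_template, peak_local_max

    slice2d = np.random.rand(256, 256)             # placeholder image slice

    yy, xx = np.mgrid[-7:8, -7:8]
    template = (xx**2 + yy**2 <= 36).astype(float) # disc of radius ~6 pixels

    score = match_template(slice2d, template, pad_input=True)   # correlation map
    peaks = peak_local_max(score, min_distance=10, threshold_abs=0.5)
    print("candidate fiber centres:", len(peaks))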

MSM-CAM

An important component of the 3D micro-CT pipeline is image partitioning (or image segmentation), a step used to separate the various phases or components in an image. Image partitioning schemes require field-specific rules, but a common strategy is to devise metrics that quantify performance and accuracy. MSM-CAM provides a set of protocols to systematically analyze and compare the results of unsupervised classification methods used for segmentation of synchrotron-based data. The proposed dataflow for Materials Segmentation & Metrics (MSM) provides 3D micro-tomography image segmentation algorithms, such as statistical region merging (SRM), k-means clustering, and parallel Markov random fields (PMRF), while offering different metrics to evaluate segmentation quality, confidence, and conformity with standards. Both experimental and synthetic data are assessed, illustrating quantitative results through the MSM dashboard, which can return sample information such as media porosity and permeability. The main contributions of this work are: (i) to deliver tools to improve material design and quality control; (ii) to provide datasets for benchmarking and reproducibility; and (iii) to yield good practices in the absence of standards or ground truth for ceramic composite analysis.
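
A toy version of one pathway through such a dataflow, k-means partitioning followed by a porosity estimate, might look like this in Python (a conceptual sketch only, not the MSM dashboard; the volume, cluster count, and phase assignment are assumptions):

    # Partition a grayscale volume into two phases with k-means, then
    # report the volume fraction of the darker (assumed pore) phase.
    import numpy as np
    from sklearn.cluster import KMeans

    volume = np.random.rand(32, 32, 32)             # placeholder micro-CT volume
    labels = KMeans(n_clusters=2, n_init=10).fit_predict(volume.reshape(-1, 1))
    labels = labels.reshape(volume.shape)

    means = [volume[labels == k].mean() for k in (0, 1)]
    pore = int(np.argmin(means))                    # assume darker cluster = pores
    porosity = float((labels == pore).mean())
    print("estimated porosity:", porosity)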

PMRF-IS

This software performs multi-label segmentation of 2D/3D images. It is highly accurate; its results rely on a Markov random field (MRF) formulation, and it runs in both shared- and distributed-memory parallel modes. MRF algorithms are powerful tools in image analysis for exploiting the contextual information in data. However, applying these methods to large data means that alternative approaches must be found to circumvent the NP-hard complexity of the MRF optimization. PMRF-IS overcomes this issue by using graph partitioning: the computational complexity is decreased considerably because optimization and parameter estimation are executed on small subgraphs. PMRF-IS is a C++ implementation of this optimization approach and offers several parallel back ends: C++11 threads, MPI, and OpenMP.
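
The partition-then-optimize idea can be sketched with a toy binary Potts model and greedy ICM (a conceptual illustration only, not PMRF-IS's C++ implementation; the image size, tile size, and energy weights are assumptions):

    # Split a noisy label image into tiles (subgraphs) and optimize each
    # independently; in PMRF-IS the subproblems run in parallel.
    import numpy as np

    def icm_tile(obs, beta=1.0, iters=3):
        lab = obs.copy()
        for _ in range(iters):
            for i in range(lab.shape[0]):
                for j in range(lab.shape[1]):
                    energies = []
                    for cand in (0, 1):
                        e = float(cand != obs[i, j])                    # data term
                        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                            ni, nj = i + di, j + dj
                            if 0 <= ni < lab.shape[0] and 0 <= nj < lab.shape[1]:
                                e += beta * float(cand != lab[ni, nj])  # smoothness
                        energies.append(e)
                    lab[i, j] = int(np.argmin(energies))
        return lab

    noisy = (np.random.rand(64, 64) > 0.5).astype(int)   # placeholder labels
    tiles = [noisy[r:r + 32, c:c + 32] for r in (0, 32) for c in (0, 32)]
    smoothed = [icm_tile(t) for t in tiles]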

SHARP

SHARP is used to reconstruct images from ptychography experiments in real time, using parallelization and GPU computing to achieve the necessary speed. It reconstructs an image from a set of far-field diffraction patterns, recorded at known sample translations, in an X-ray ptychography experiment. For further information, please contact the SHARP development team (sharp-access@lists.lbl.gov).

Xi-CAM

Xi-CAM is a versatile interface for visualization and data analysis, providing workflows for local and remote computing, data management, and seamless integration of plugins. Xi-CAM is a continuing development project in an early beta stage. If you are interested in collaborative development or would like to receive development beta releases, please contact Ron Pandolfi (ronpandolfi@lbl.gov) and Alex Hexemer (ahexemer@lbl.gov).

Miscellaneous

zorder-lib

zorder-lib is a C-language library, callable from C or C++ programs, that provides a simple-to-use API for an alternative to the traditional row-major in-memory array layout: one based on a Morton-order space-filling curve (SFC), specifically a Z-order variant of the Morton curve. After a simple initialization step, the library lets programmers convert a multidimensional array from row-major to Z-order layout and then use a single, generic API call to access data at any arbitrary (i,j,k) location within the array, whether it is stored in row-major or Z-order format.
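
The underlying index transformation is simply bit interleaving; the sketch below shows it in Python for illustration only (zorder-lib itself is a C API, and this is not its interface):

    # Interleave the bits of (i, j, k) into one Morton/Z-order index.
    def morton3d(i, j, k, bits=10):
        index = 0
        for b in range(bits):
            index |= ((i >> b) & 1) << (3 * b)       # bit b of i
            index |= ((j >> b) & 1) << (3 * b + 1)   # bit b of j
            index |= ((k >> b) & 1) << (3 * b + 2)   # bit b of k
        return index

    # Nearby (i, j, k) cells map to nearby Z-order offsets, which is what
    # improves locality relative to row-major layout for many access patterns.
    print(morton3d(1, 0, 0), morton3d(0, 1, 0), morton3d(1, 1, 1))   # 1 2 7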