(The author wishes to thank the National Science Foundation for support for much of this work.)
svn co https://dmtcp.svn.sourceforge.net/svnroot/dmtcp dmtcp

This is a collaboration among many students and myself. Some of the contributors include: Jason Ansel, Kapil Arya, Michael Rieker, Artem Polyakov, Preveen Solanki, and Ana Maria Visan. It works as simply as:
dmtcp_checkpoint a.out              # Run a.out with the DMTCP checkpoint library
dmtcp_command --checkpoint          # Create a checkpoint image
dmtcp_restart ckpt_a.out_*.dmtcp    # Restart from the checkpoint image

The software transparently tracks the spawning of threads, the forking of child processes, and the use of ssh to start remote processes. All parts of the distributed computation (all threads and processes) are checkpointed, with one checkpoint image per process. Many more options are offered and can be found in the documentation.
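For example, the same workflow extends to distributed and unattended runs. The two command lines below are only illustrative; they assume a DMTCP release whose dmtcp_checkpoint accepts an -i (checkpoint-interval) option, so check dmtcp_checkpoint --help for the exact flags.

dmtcp_checkpoint mpirun -np 16 a.out   # Checkpoint an ssh-launched MPI job: one image per process
dmtcp_checkpoint -i 600 a.out          # Request an automatic checkpoint every 600 seconds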
TOP-C has three fundamental concepts: the task, the public shared data, and the action chosen after a task is completed. Communication between processes occurs only through these three mechanisms. The design goals of TOP-C are that it should follow a simple, easy-to-use programmer's model and that it should have high latency tolerance. This allows for economical parallel computing on commodity hardware. The simple parallel model turned out to be surprisingly adaptable to parallelizing legacy software, as was demonstrated by the parallelization of Geant4, a 1,000,000-line C++ program for simulation of high-energy particle showers (see the description below).
TOP-C is invoked by linking the application program with a TOP-C library. You can choose to link with a library customized for your hardware architecture: distributed memory, sequential (for debugging), or shared memory (experimental). Because the application code remains identical in all cases, it is planned to also provide a TOP-C library for a hierarchy of distributed memory over shared memory, as one might see in a Beowulf cluster of PCs, each with dual processors.
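As a concrete illustration of "same application code, different library," the link lines might look as follows. The library names here are placeholders invented for this example, not the names shipped in the distribution; the README gives the actual build instructions.

cc -o myapp myapp.c -ltopc-mpi      # hypothetical name: distributed-memory (MPI) version
cc -o myapp myapp.c -ltopc-seq      # hypothetical name: sequential version, for debugging
cc -o myapp myapp.c -ltopc-pthread  # hypothetical name: shared-memory version (experimental)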
A TOP-C application is built around a single library call: TOPC_master_slave(). When TOPC_master_slave() is called, it reads a file, procgroup, from the local directory, which specifies which slave machines to start. Typical applications have the slave processes execute the same binary as the master, thus running in SPMD mode. TOPC_master_slave() takes four arguments, the four application-defined procedures:
generate_task_input() -> input,
do_task(input) -> output,
check_task_result(input, output) -> action,
update_shared_data(input, output).
An action is one of NO_ACTION, REDO, UPDATE_ENVIRONMENT, or CONTINUATION(new_input).
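To make the model concrete, here is a minimal sketch of a complete TOP-C application that sums the squares of 0 through 9 in parallel, using the four callbacks above. TOPC_init(), TOPC_finalize(), TOPC_is_master(), the TOPC_BUF and TOPC_ACTION types, TOPC_MSG(), NOTASK, and the exact spelling of UPDATE_ENVIRONMENT are written from memory of the TOP-C 2.x header; take the authoritative spellings from topc.h and the bundled example.c rather than from this sketch.

/* Minimal sketch of a TOP-C application: sum of squares of 0..9.
 * Check names such as TOPC_BUF, TOPC_MSG, NOTASK, UPDATE_ENVIRONMENT
 * against the topc.h shipped with your TOP-C release. */
#include <stdio.h>
#include "topc.h"

static long sum_of_squares = 0;   /* the public shared data, replicated everywhere */
static int  next_task = 0;        /* next task number to hand out (master only)    */
static int  task_input;           /* buffer holding the current task input         */

TOPC_BUF generate_task_input(void) {
  if (next_task >= 10) return NOTASK;          /* no more tasks to hand out  */
  task_input = next_task++;
  return TOPC_MSG(&task_input, sizeof(task_input));
}

TOPC_BUF do_task(void *input) {                /* runs on an idle slave      */
  static long square;
  int i = *(int *)input;
  square = (long)i * i;
  return TOPC_MSG(&square, sizeof(square));    /* task output for the master */
}

TOPC_ACTION check_task_result(void *input, void *output) {
  /* Accept every result and fold it into the shared data everywhere. */
  return UPDATE_ENVIRONMENT;
}

void update_shared_data(void *input, void *output) {
  sum_of_squares += *(long *)output;           /* executed on master and all slaves */
}

int main(int argc, char *argv[]) {
  TOPC_init(&argc, &argv);                     /* starts slaves per procgroup */
  TOPC_master_slave(generate_task_input, do_task,
                    check_task_result, update_shared_data);
  if (TOPC_is_master())
    printf("sum of squares 0..9 = %ld\n", sum_of_squares);
  TOPC_finalize();
  return 0;
}

The point of the model is that do_task() runs on whichever slave is free, while update_shared_data() is executed on every process, so the public shared data stays identical everywhere without any explicit message passing in the application.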
TOP-C version 2.x is now available. You can find it here. To find out more about the TOP-C model, a good place to start is the README file and example.c (a sample TOP-C application).
Certain interesting issues of user interface became immediately apparent in this project. These issues had to be addressed before the package could be refereed as part of GAP's formal process for refereeing share packages. For example, when the user types ^C (interrupt) in a parallel application, which processes should be interrupted? What if a slave process goes into an infinite loop? Which UNIX signals should be ignored, or given a special handler? After an interrupt, what should happen to pending messages in the network buffer or in transit in the network? Since I decided to expose the underlying MPI calls by providing bindings to GAP commands that could be called interactively, which arguments should be defaulted? Resolving these issues while doing the least damage to a large software package originally designed for sequential computation was an interesting exercise.
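To illustrate one of these design questions, the sketch below shows one possible answer (not the actual ParGAP implementation) to the ^C question for an MPI-based master/slave setup: slaves ignore SIGINT, while the master catches it, drains any replies still in the network, and then tells each slave to abandon its current task. The tags and message format are invented for the example.

/* Illustrative sketch only, not the ParGAP implementation: one way a master
 * process can handle ^C in an MPI master/slave computation.  The signal
 * handler only sets a flag; the MPI work happens in the master's main loop. */
#include <signal.h>
#include <mpi.h>

#define TAG_RESULT    1   /* invented tag for normal slave replies        */
#define TAG_INTERRUPT 2   /* invented tag telling a slave to reset itself */

static volatile sig_atomic_t got_sigint = 0;

static void sigint_handler(int sig) { (void)sig; got_sigint = 1; }

/* Called from the master's dispatch loop whenever got_sigint is set. */
static void master_handle_interrupt(void) {
  char scratch[4096];              /* large enough for this example's replies */
  int nprocs, flag, rank;
  MPI_Status status;
  MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

  /* Drain replies already queued or in transit, so no stale result is
   * mistaken for the answer to a later task. */
  do {
    MPI_Iprobe(MPI_ANY_SOURCE, TAG_RESULT, MPI_COMM_WORLD, &flag, &status);
    if (flag)
      MPI_Recv(scratch, sizeof(scratch), MPI_BYTE, status.MPI_SOURCE,
               TAG_RESULT, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
  } while (flag);

  /* Tell each slave to abandon its current task and return to its
   * read-eval loop; a slave stuck in an infinite loop would instead
   * need a UNIX signal of its own. */
  for (rank = 1; rank < nprocs; rank++)
    MPI_Send(NULL, 0, MPI_BYTE, rank, TAG_INTERRUPT, MPI_COMM_WORLD);

  got_sigint = 0;
}

int main(int argc, char **argv) {
  int my_rank;
  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
  if (my_rank == 0) {                     /* master reacts to ^C            */
    signal(SIGINT, sigint_handler);
    /* ... dispatch loop would go here ... */
    if (got_sigint) master_handle_interrupt();
  } else {                                /* slaves ignore ^C; master decides */
    signal(SIGINT, SIG_IGN);
    /* ... slave read-eval loop would go here ... */
  }
  MPI_Finalize();
  return 0;
}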
Finally, in Fall 1999, a graduate student, Predrag Petkovic, produced a better implementation for a course project. His implementation includes almost all of the layers of MPI (but not topology, for example), and is surprisingly professional for software done in such a short time. You can find both versions in the mpinu directory.
If one is designing an array of 1,000 detectors, it is better to test design tradeoffs in a software simulation before fixing the design. Parallel simulation allows one to iterate the design process more easily, since the results of each simulation are available many times sooner, supporting more rapid design development.
I have parallelized this code using TOP-C. The parallelization appeared in Geant4 version 4.1. Read about the latest version here.