Chapter 3: OmniComp

OmniComp is an intuitive system, easy to install and use, that accelerates Monte Carlo programs through distributed execution on workstation clusters or PC farms as well as multi-processor machines. It is based on omniORB, a high-performance, open-source CORBA implementation from AT&T Laboratories Cambridge. Notably, it does not require thread-safe integrand implementations.

3.1: Using OmniComp

Usage examples:

host3> ssh -n host1 /path1/omnicomp_prog -w &
host3> ssh -n host2 /path2/omnicomp_prog -w &
host3> ssh -n host2 /path2/omnicomp_prog -w &
host3> /path3/omnicomp_prog
master process started (includes 1 worker)
3 additional worker(s) found
work done so far ...
0%  0%  0%  0%

Here, host1 and the local host are assumed to be single-processor machines, while host2 is assumed to be a dual-processor system. The working directory with the executable file omnicomp_prog is assumed to be shared between host1, host2 and the local host. This is convenient but not necessary (see the -n option below). If ssh asks for a password on the command line, try ssh -f instead of ssh -n.
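
With -f, ssh puts the session in the background itself after authentication, so the trailing & is not needed. The first worker command above would then read, for instance:

host3> ssh -f host1 /path1/omnicomp_prog -w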

Since the master process contains its own worker, one can also start just a single instance of the executable with no options, and it will run like a regular, non-distributed program:

host> /path/omnicomp_prog
master process started (includes 1 worker)
no additional workers found
work done so far ...
0%

If different executables are required on different hosts, for example because they run different operating systems, the executables can be distinguished with a ".<key>" postfix. In this case one needs to use the -d option, which causes the postfix to be disregarded when the *.workers file name is determined:

host3> ssh -n host1 /path1/omnicomp_prog.os1 -w -d &
host3> ssh -n host2 /path2/omnicomp_prog.os2 -w -d &
host3> ssh -n host2 /path2/omnicomp_prog.os2 -w -d &
host3> /path3/omnicomp_prog.os3 -d

One can disable the collocated worker in the local master process with the -m option. In this mode the master process controls the other workers, monitors progress and collects results, but does not participate in the computation itself.
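
For example, with the three workers from the first example already running, the master can be started as a pure controller:

host3> /path3/omnicomp_prog -m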

If no shared directory mounted on all machines is available, the distributed computation can be bootstrapped using the naming service (-n option). After setting up the environment for the naming service as described in section 3.2.2, the server needs to be started on the host specified in omniORB.cfg by executing omniNames. Then, for example with tcsh:

host4> ssh -n host1 "tcsh -c '/path1/omnicomp_prog -w -n'" &
host4> ssh -n host2 "tcsh -c '/path2/omnicomp_prog -w -n'" &
host4> ssh -n host3 "tcsh -c '/path3/omnicomp_prog -w -n'" &
host4> /path4/omnicomp_prog -n

If your remote shell account is not set up to access the naming service you can also include the information on the command line:

host4> ssh -n host1 /path1/omnicomp_prog -w -n \
       -ORBInitRef NameService=corbaname::names.example.edu &
host4> ssh -n host2 /path2/omnicomp_prog -w -n \
       -ORBInitRef NameService=corbaname::names.example.edu &
host4> ssh -n host3 /path3/omnicomp_prog -w -n \
       -ORBInitRef NameService=corbaname::names.example.edu &
host4> /path4/omnicomp_prog -n

Here, names.example.edu is the hostname of the system that provides the naming service.

It is important to use the same set of OmniComp options (except for -w and -m) in all commands, or errors will likely occur. If that happens, the naming service can be cleaned up with nameclt, an omniORB client program to inspect and modify the naming service registry.
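
For instance, one can list the registered entries and then remove a stale one (a sketch: <name> stands for whatever stale entry nameclt list reveals, since the exact name OmniComp registers is not fixed here):

host4> nameclt list
host4> nameclt unbind <name>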

3.2: Getting started with OmniComp

3.2.1: omniORB headers and libraries

In order to build OmniComp executables one needs to link against the omniORB libraries. These have been pre-built for a number of common platforms, including Intel/Linux, and can be downloaded for free at
http://www.uk.research.att.com/omniORB/omniORBForm.html

If no pre-built libraries are available for your platform, they can easily be built from source in a few steps:

  1. Find the configuration that best matches your platform (and compiler!) in ./mk/platforms/ and uncomment the corresponding line in ./config/config.mk, for example: platform = alpha_osf1_5.0
  2. In the selected configuration file in ./mk/platforms/ edit the line that sets PYTHON and insert the path to your python interpreter, e.g. /usr/local/bin/python. If Python 1.5.2 or higher is not installed on your system follow the instructions in the file.
  3. Change to ./src and run make export (ignore the warnings). This step requires about 90 MB of disk space and takes about 40 minutes on a PentiumII/333MHz.
  4. Run make clean.
You're done. The built libraries and binaries consume about 20 MB of disk space. The whole procedure is condensed below.
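
As a quick reference, steps 1-4 look like this (a sketch, assuming the omniORB source tree unpacks into ./omni as in the example paths of section 3.2.2, the alpha_osf1_5.0 platform file name from step 1, and vi as the editor):

host> cd omni
host> vi config/config.mk
host> vi mk/platforms/alpha_osf1_5.0.mk
host> cd src
host> make export
host> make clean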

3.2.2: Environment variables

Example commands for shell initialization (assuming tcsh and Linux):
# omniorb libraries and executables 
setenv OMNIORB_TOPDIR ${HOME}/omniorb/omni
setenv LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:${OMNIORB_TOPDIR}/lib/i586_linux_2.0_glibc2.1
setenv PATH ${PATH}:${OMNIORB_TOPDIR}/bin/i586_linux_2.0_glibc2.1

If your omnicomp programs will be located in a shared directory that is mounted on all computers, no further steps are necessary.

Otherwise the computation has to be bootstrapped with omniNames (option -n), and one also needs:

# omniorb naming service (omniNames)
setenv OMNINAMES_LOGDIR ${HOME}/omniorb/names_log
setenv OMNIORB_CONFIG ${HOME}/omniorb/omniORB.cfg

OMNINAMES_LOGDIR specifies the log directory for omniNames and is only required in the shell that is used to start omniNames. The log directory and files are created when omniNames is started for the first time.
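
A minimal sketch of starting the service on names.example.edu (the -start option with the port number is an assumption based on omniORB 3 conventions; it is needed only for the very first run, which creates the log, after which omniNames is started without arguments):

names> omniNames -start 2809 &
names> omniNames &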

The file omniORB.cfg indicates on which computer omniNames will be run. Example with the default port number:

ORBInitialHost names.example.edu
ORBInitialPort 2809