NAME

masterslave - distributed batch processing using a master/slave paradigm.


SYNOPSIS

masterslave -r rdbfilename [-d directory] [-m resubmit] [-h] [-v] [-w] [-x]

masterslave -c batchid

masterslave -K batchid

masterslave -J batchid

masterslave -k batchid host1 host2 ...

masterslave -j batchid host1 host2 ...


OPTIONS

The options are:

-c batchid

Perform database cleanups for the batch of jobs indicated by batchid. This is needed only when the jobs were submitted without the -w option (see -w below).

-d directory

The directory where the error message file and temporary files generated by this script are to be written. The default is the directory where masterslave is run.

-k batchid

Suspend the jobs with the given batchid on the hosts listed after batchid on the command line (see SYNOPSIS). Jobs that are already running will finish, but no new jobs will be started on those hosts.

-j batchid

Restart the jobs with the given batchid on the hosts listed after batchid on the command line (see SYNOPSIS).

-K batchid

Suspend the jobs with the given batchid. Current jobs will finish, but no new jobs will be started.

-J batchid

Restart the jobs with the given batchid that were previously suspended with the -K option.

-h

Print this help information.

-m max_number_to_resubmit_failed_job

The maximum number of times a failed job is resubmitted.

-r rdbfilename

An rdb table describing the jobs to run. The required columns are described under DESCRIPTION.

-v

Verbose option. Programs will print diagnostic messages.

-w

Wait for the application to complete before returning to the prompt. If the -w option is not given, the user must perform some cleanup manually after the jobs have completed, by entering the command

     masterslave -c batchid

where batchid is the identifier that masterslave issues for the submitted jobs. masterslave prints a reminder of this after it starts the jobs.

-x

Start a client/server package that emulates hardware LEDs.


DESCRIPTION

masterslave is the master script of a suite of programs for managing batch processing. The script populates the batchjob table of the Postgres masterslave database, selects workstations from the class table, and runs the Message Passing Interface (MPI) based program on a Local Area Multicomputer (LAM). Error messages from the programs executed by the slaves are written to files named jobid.stderr, where jobid is the user-supplied identifier of the corresponding row of the job table.

The user must supply an rdb table to run the masterslave script. The table must contain four columns:

id

a unique identification label (user generated) for each row of the table.

cmd

the command to be submitted.

class

the type of workstation on which the user wishes to run the job. It may be a single type or a comma-separated list of types, each of which must be an entry in the class table of the masterslave database. It may also be any, to indicate that any machine is acceptable.

opt

an optional entry that is not currently used.
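
A table with these four columns can be generated by a short shell function. This is a minimal sketch, assuming the usual tab-separated rdb layout (optional # comment lines, a header line of column names, a definition line with S for each string column, as tbl2lst shows in EXAMPLE, then one tab-separated row per job). The function name make_jobs_rdb and its arguments are hypothetical and not part of masterslave; the cmd it writes follows the cd-then-command pattern from EXAMPLE.

```shell
#!/bin/sh
# make_jobs_rdb: hypothetical helper, not part of masterslave.
# Writes an rdb job table (assumed tab-separated layout) with the four
# required columns id, cmd, class and opt.
# Usage: make_jobs_rdb OUTFILE WORKDIR CLASS ID...
make_jobs_rdb() {
    out=$1
    workdir=$2
    class=$3
    shift 3
    {
        printf '# jobs generated by make_jobs_rdb\n'
        # Header line (column names) and definition line (S = string).
        printf 'id\tcmd\tclass\topt\n'
        printf 'S\tS\tS\tS\n'
        for id in "$@"; do
            # cmd: cd to WORKDIR first so output lands there, not in $HOME.
            printf '%s\tcd %s; testscript.ksh %s foo\t%s\t%s\n' \
                "$id" "$workdir" "$id" "$class" "$id"
        done
    } > "$out"
}
```

For example, make_jobs_rdb jobs.rdb /data/dumbo1/dtn/simul/src/c/masterslave ultra2 item1 item2 item3 writes three ultra2 jobs, which could then be submitted with masterslave -r jobs.rdb -w.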

The commands executed by the slaves are run from the user's home directory, so any output files generated by the program or script executed by the slaves are written to the user's home directory. To write them elsewhere, the user can write a wrapper script that changes directory before executing the desired command. Alternatively, the user can enter, in the cmd column of the rdb file, one command consisting of two instructions separated by a semicolon (;): 1) a change to the directory where the output files are to be written, and 2) the command that executes the program or script. See EXAMPLE below for details.
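
For the wrapper-script approach, a minimal POSIX sh sketch follows. The name run_in_dir and its argument order (directory first, then the command and its arguments) are hypothetical. The function body is a subshell, so the cd does not affect the caller, and the command is not run at all if the directory cannot be entered, rather than silently writing its output to the home directory.

```shell
#!/bin/sh
# run_in_dir: hypothetical wrapper, not part of masterslave.
# Usage: run_in_dir DIRECTORY COMMAND [ARG...]
# The body is a subshell ( ... ), so the cd does not leak to the caller.
run_in_dir() (
    dir=$1
    shift
    # Refuse to run if the directory is missing, so output never falls
    # back to the home directory.
    cd "$dir" || exit 1
    # Run the command; its exit status becomes the function's status.
    "$@"
)
```

Saved as an executable script whose last line is run_in_dir "$@", it could be named in the cmd column as, for example, run_in_dir.sh /data/dumbo1/dtn/simul/src/c/masterslave testscript.ksh item0 foo (give the script's full path if it is not on the slaves' PATH).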

The lamhost file lamhost.seq, where seq is a unique sequence number generated by the masterslave script, is written to the directory specified by the -d option and contains the names of the workstations on which the jobs are run. The lamhost table in the masterslave database records which workstations are used for each job submitted by the masterslave script.
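
To check which workstations a run used, the host names can be extracted from a lamhost file. This sketch assumes the file follows the usual LAM boot-schema layout (host name first on each line, optionally followed by key=value fields such as cpu=2, with blank lines and # comments ignored); the function name lamhost_names is hypothetical.

```shell
#!/bin/sh
# lamhost_names: hypothetical helper, not part of masterslave.
# Prints one workstation name per line from a lamhost file, assuming the
# LAM boot-schema layout: host name first, optional key=value fields after
# it, blank lines and lines starting with # ignored.
# Usage: lamhost_names LAMHOST_FILE
lamhost_names() {
    awk '!/^[ \t]*#/ && NF > 0 { print $1 }' "$1"
}
```

Because the output is a plain list of names, it can also be used to build the host1 host2 ... arguments of -k or -j.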


ENVIRONMENT VARIABLES

The environment variable LAMHOME must be defined to be /proj/axaf/pkgs/lam.
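
In sh or ksh, the variable can be set and exported as follows (csh users would instead use setenv LAMHOME /proj/axaf/pkgs/lam). The directory check is an illustrative addition, not something masterslave is documented to perform.

```shell
#!/bin/sh
# Set LAMHOME to the required value and export it to child processes,
# such as masterslave when it is started from this shell.
LAMHOME=/proj/axaf/pkgs/lam
export LAMHOME

# Optional sanity check (illustrative only): warn if the directory is
# missing on this machine.
if [ ! -d "$LAMHOME" ]; then
    echo "warning: LAMHOME=$LAMHOME is not a directory on this host" >&2
fi
```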


EXAMPLE

  dumbo-266: head -15 process96_ultra2.rdb | tbl2lst

  #
  # User must provide an /rdb table with four columns: id, cmd, class, opt
  #
  # The column id contains a tag to identify  the job to be executed.
  # The column cmd contains the Unix command to run the program or script.
  # The column class contains the class of workstations to run the job
  # on, any is an acceptable option for this column.
  # The column opt is not currently being used

             id | S
            cmd | S
          class | S
            opt | S

             id | item0
            cmd | cd /data/dumbo1/dtn/simul/src/c/masterslave;
                  testscript.ksh item0 foo
          class | any
            opt | item0

             id | item1
            cmd | cd /data/dumbo1/dtn/simul/src/c/masterslave;
                  testscript.ksh item1 foo
          class | ultra2
            opt | item1

             id | item2
            cmd | cd /data/dumbo1/dtn/simul/src/c/masterslave;
                  testscript.ksh item2 foo
          class | ultra2
            opt | item2

             id | item3
            cmd | cd /data/dumbo1/dtn/simul/src/c/masterslave;
                  testscript.ksh item3 foo
          class | ultra2
            opt | item3

  dumbo-267: masterslave -r process96_ultra2.rdb -w