NAME

divconquer - A no-frills script to submit jobs to workstations on the network.


SYNOPSIS

divconquer [-d stdoutstderrdir] [-v] [-n level] -s host1[,host2,...] -r job_db

divconquer [-d stdoutstderrdir] [-v] [-n level] -f host_db -r job_db

divconquer -h


OPTIONS

-d stdoutstderrdir
Sets the directory to which each job's standard output and standard error files are written. The default is the current directory. Example:
        divconquer -d /pool1 -r tst.rdb -s pelf,ymir,marple

-n nicelevel
Sets the nice level at which the jobs are run. nicelevel must lie within [4,20]; the default is 8. Example:
        divconquer -n 4 -r tst.rdb -s pelf,ymir,marple

-r cmdfilename.rdb
This option is required. This specifies the file containing the list of jobs to be run. See Specifying Jobs for more information.

-f hostfile
hostfile is a file listing the hosts on which to run jobs, one host per line; lines beginning with a # character are ignored. A host may be listed more than once, in which case that many jobs run on it simultaneously. See Specifying Hosts.

-s mach0,mach1,...,machN
The names of the host workstations to which the work is submitted. Host names are separated by commas or by whitespace (in which case the entire string must be enclosed in quotes). A machine may appear more than once within the argument; the drawback is that the machine will then run that many jobs simultaneously. This option may be specified more than once. See Specifying Hosts.

-v
If specified, divconquer prints its progress. The default is to run silent, run deep.

-h
Print out help, and exit.
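The option rules above can be summarized as a small command-line parser. This is a minimal sketch using Python's argparse, not divconquer's own source. It assumes that -s and -f may be combined and that at least one of them is required. The synopsis suggests this but does not state it outright.

```python
import argparse


def build_parser():
    # Sketch of the documented options; not divconquer's actual source.
    p = argparse.ArgumentParser(prog="divconquer")  # -h is provided by argparse
    p.add_argument("-d", dest="outdir", default=".",
                   help="directory for the id.stdout/id.stderr files")
    p.add_argument("-n", dest="nice", type=int, default=8,
                   help="nice level, within [4,20]")
    p.add_argument("-r", dest="jobfile", required=True,
                   help="RDB file listing the jobs")
    p.add_argument("-f", dest="hostfile",
                   help="file with one host per line")
    # -s may be repeated, so every occurrence is collected in a list.
    p.add_argument("-s", dest="hosts", action="append",
                   help="comma- or whitespace-separated host names")
    p.add_argument("-v", dest="verbose", action="store_true",
                   help="print progress")
    return p


def parse_options(argv):
    p = build_parser()
    args = p.parse_args(argv)
    if not 4 <= args.nice <= 20:
        p.error("nicelevel must be within [4,20]")  # exits with status 2
    if not args.hosts and args.hostfile is None:
        # Assumption: at least one of -s / -f is needed.
        p.error("give hosts with -s and/or -f")
    return args
```

For example, `parse_options(["-r", "tst.rdb", "-s", "pelf,ymir,marple"])` yields a nice level of 8 and an output directory of `.`. The host string is split into individual names in a later step.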


DESCRIPTION

divconquer is a simple script for submitting jobs to workstations on the network. To stay simple, it does no load balancing. The script returns only once all the jobs have finished. The user must supply two things: the command file, an RDB table (-r) listing the job ids and the commands to be executed, and the workstations to which the jobs are submitted (-s or -f).
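The script returns only once all the jobs are finished, and the Examples show a "children=..." line. Together these suggest one child process per host, followed by a wait on all of them. The sketch below shows that pattern. It is a minimal sketch that uses local shell commands as stand-ins for the per-host submissions, because this page does not describe how divconquer launches work on a remote host. The name submit_all is hypothetical.

```python
import subprocess


def submit_all(commands):
    """Start one child per command, then wait for every one of them."""
    # Stand-in: each string runs locally through /bin/sh. The real script
    # submits a 'nice -N conquer -r <file>' job to a remote host instead.
    children = [subprocess.Popen(cmd, shell=True) for cmd in commands]
    print("children=" + " ".join(str(c.pid) for c in children))
    # Returning only after the last wait() mirrors "returns only once
    # all the jobs are finished".
    return [c.wait() for c in children]
```

The return value lists each child's exit status in submission order. The call blocks for as long as the slowest child runs.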

Specifying Hosts

The hosts upon which to run are specified with either the -s or -f options. A host may be specified multiple times, causing divconquer to submit that many simultaneous jobs.
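Here is how the two ways of naming hosts might expand into one list of job slots. This is a minimal sketch with hypothetical function names. A -s argument is split on commas and/or whitespace. A hostfile has one name per line, and lines starting with # are skipped. Duplicates are kept, since each repeat means one more simultaneous job on that host. Skipping blank lines is an added assumption.

```python
import re


def hosts_from_s(values):
    """Expand the -s arguments: split on commas and/or whitespace, keep duplicates."""
    hosts = []
    for value in values:
        hosts.extend(h for h in re.split(r"[,\s]+", value) if h)
    return hosts


def hosts_from_lines(lines):
    """Parse hostfile lines: one host per line, lines starting with '#' ignored."""
    hosts = []
    for line in lines:
        name = line.strip()
        # Blank lines are skipped too (an assumption; the page does not say).
        if name and not line.startswith("#"):
            hosts.append(name)
    return hosts
```

A hostfile would be read with `with open(path) as fh: hosts_from_lines(fh)`. Listing pelf twice yields two pelf entries, and therefore two concurrent jobs there.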

To minimize the idle time of the host workstations, it is recommended that the number of jobs be divisible by the number of hosts. For example, with 12 jobs it is best to submit to 2, 3, 4, 6 or 12 hosts.

The script tests whether a host machine can be reached by pinging it. The user may want to run this test manually.

For example, to check whether marple can serve as a host workstation (it can):

  ennui-804: ping marple
  marple is alive

To check whether dumbo1 can serve as a host workstation (it cannot):

  ennui-803: ping dumbo1
  ping: unknown host dumbo1
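The advice to make the number of jobs divisible by the number of hosts follows from a static split with no load balancing. The sketch below assumes that divconquer gives each host one contiguous block of ceil(N/M) jobs. That assumption is consistent with the "32 files ... 3 Machines. 11 records/machine" line in the Examples, but the actual splitting rule is not documented. The function names are hypothetical.

```python
import math


def split_jobs(jobs, hosts):
    """Assumed static split: host i gets jobs [i*k, (i+1)*k) with k = ceil(N/M)."""
    k = math.ceil(len(jobs) / len(hosts))
    return [(host, jobs[i * k:(i + 1) * k]) for i, host in enumerate(hosts)]


def busiest_load(jobs, hosts):
    """Jobs on the most loaded host; with equal-length jobs this sets the wall time."""
    return max(len(chunk) for _, chunk in split_jobs(jobs, hosts))
```

Under this assumed split, 32 jobs on pelf, ymir and marple give blocks of 11, 11 and 10. With 12 jobs, moving from four hosts to five does not help: ceil(12/5) = 3, so the busiest host still runs 3 jobs and the fifth host receives none.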

Specifying Jobs

Jobs are listed in the file specified by the required -r flag. The file is an RDB table with at least two columns, id and cmd, and must not contain comments. The id column holds the unique job id and the cmd column holds the command to be executed on the host; additional columns, such as junk and tmp below, may be present. A sample command file is given below:

        ennui-771: cat tst.rdb
        id      cmd     junk    tmp
        --      ---     ----    ---
        job1    ps -aux now is  the
        job2    date    junk    a
        job3    who     a       a
        job4    ls      a       a

Note that several commands, separated by `;', may be entered in a single cmd entry.
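Here is one way to read the id and cmd columns from such a table. This is a minimal sketch that assumes the usual RDB layout: tab-separated columns, a header row of column names, a dashed definition row, then the data rows. The name read_jobs is hypothetical. A `;`-separated cmd entry stays a single string, which a shell would later run as several commands.

```python
def read_jobs(lines):
    """Return (id, cmd) pairs from an RDB table given as lines of text."""
    # Assumes the usual RDB layout: tab-separated columns, a header row of
    # column names, then a definition row of dashes, then the data rows.
    rows = [line.rstrip("\n").split("\t") for line in lines if line.strip()]
    header, data = rows[0], rows[2:]
    id_col, cmd_col = header.index("id"), header.index("cmd")
    jobs = [(row[id_col], row[cmd_col]) for row in data]
    ids = [job_id for job_id, _ in jobs]
    if len(ids) != len(set(ids)):
        raise ValueError("job ids in the id column must be unique")
    return jobs
```

On the first rows of tst.rdb, this returns pairs such as ("job1", "ps -aux"). The junk and tmp columns are ignored.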

Job output

The standard output and standard error streams of each job are written to files named id.stdout and id.stderr, where id is the job id given in the command file. The files are placed in the directory given by -d (the current directory by default). Either file is removed if it is empty.
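Here is what happens to a single job's output on a host. This is a minimal sketch under stated assumptions: the job runs through /bin/sh, so `;`-separated commands work, and the nice level is applied as an increment, as `nice -8` does. The sketch runs the job locally and does not reproduce how conquer is started on a remote host. The name run_job is hypothetical.

```python
import os
import subprocess


def run_job(job_id, cmd, outdir=".", nice_level=8):
    """Run one job, keeping its output in <outdir>/<id>.stdout and <id>.stderr."""
    out_path = os.path.join(outdir, job_id + ".stdout")
    err_path = os.path.join(outdir, job_id + ".stderr")
    with open(out_path, "w") as out, open(err_path, "w") as err:
        # shell=True lets a cmd entry such as "date; who" run several commands;
        # os.nice() in the child lowers its priority (Unix only).
        status = subprocess.run(cmd, shell=True, stdout=out, stderr=err,
                                preexec_fn=lambda: os.nice(nice_level)).returncode
    # As documented, an output file that ended up empty is removed.
    for path in (out_path, err_path):
        if os.path.getsize(path) == 0:
            os.remove(path)
    return status
```

For example, `run_job("job2", "date")` leaves only job2.stdout behind, because nothing was written to standard error.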

Examples

        1) A sample command file containing the commands to be executed:
        futile-1629: cat tst.rdb 
        id      cmd     junk    tmp
        --      ---     ----    ---
        job1    ps -aux now is  the
        job2    date    junk    a
        job3    who     a       a
        job4    ls      a       a
        ...     ..      .       .
        job31   who     a       a
        job32   ls      a       a
        2) To submit the jobs to pelf, ymir and marple, use the following
           command:
        futile-1630: ./divconquer -r tst.rdb -s pelf,ymir,marple
        32 files will be processed on 3 Machines.  11 records/machine
        [futile] submit 'nice -8 conquer  -r tmp_tst.0.rdb to pelf'
        [futile] submit 'nice -8 conquer  -r tmp_tst.1.rdb to ymir'
        [futile] submit 'nice -8 conquer  -r tmp_tst.2.rdb to marple'
        [futile]: children=6186 6188 6190
        3) The messages written to stdout and stderr are captured in the files:
        futile-1631: ls *.stdout *.stderr
        job1.stderr        job17.stderr ....
        job1.stdout        job17.stdout ....
        ............       .................
        job16.stderr       job23.stderr ....
        job16.stdout       job23.stdout ....
        4) The file basename.pid.hosts.txt contains a summary
           of the jobs submitted, where basename is the basename
           of the file cmdfilename.rdb and pid is the process id
           of the divconquer script.
        futile-1633: cat tmp_tst.hosts.txt 
        Start time at: Tue Nov 10 11:45:04 EST 1998
        Submit file tmp_tst.0.rdb to pelf at Tue Nov 10 11:45:05 EST 1998 
        Submit file tmp_tst.1.rdb to ymir at Tue Nov 10 11:45:05 EST 1998 
        Submit file tmp_tst.2.rdb to marple at Tue Nov 10 11:45:05 EST 1998 
        Finished processing at: Tue Nov 10 11:45:12 EST 1998