divconquer - A (no frills) script to submit jobs on various workstations on the network.
divconquer [-d stdoutstderrdir] [-v] [-n level] -s host1[,host2] -r job_db
divconquer [-d stdoutstderrdir] [-v] [-n level] -f host_db -r job_db
divconquer -h
divconquer -d /pool1 -r tst.rdb -s pelf,ymir,marple
divconquer -n 4 -r tst.rdb -s pelf,ymir,marple
# character are
ignored. Multiple invocations of the same host are permitted; this
will lead to multiple simultaneous jobs running on that host. See
Specifying Hosts.
divconquer is a simple script that submits jobs to workstations on the network. It is deliberately minimal: it does no load balancing, and it returns only after all the jobs have finished. The user must specify the machines to which the jobs are submitted. The script requires two arguments: the command file and the workstations to which to submit the jobs. The command file is an RDB table containing the job ids and the commands to be executed.
The hosts upon which to run are specified with either the -s or
-f options. A host may be specified multiple times, causing
divconquer to submit that many simultaneous jobs.
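
As an illustration of how repeated host names translate into simultaneous jobs, the following Python sketch counts the job slots per host from a comma-separated list like the one given to -s. The function name host_slots is hypothetical; this is not the script's own code.

```python
from collections import Counter

def host_slots(host_list):
    """Count how many simultaneous jobs each host would receive.

    host_list is a comma-separated string in the style of the -s argument.
    (Hypothetical helper, for illustration only.)
    """
    hosts = [h.strip() for h in host_list.split(",") if h.strip()]
    return Counter(hosts)

# pelf is named twice, so it would get two simultaneous jobs.
for host, n in host_slots("pelf,pelf,ymir,marple").items():
    print(f"{host}: {n} simultaneous job(s)")
```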
To minimize the idle time of the host workstations, it is recommended that the number of jobs be divisible by the number of workstations. For example, if there are 12 jobs to execute, it is best to use 2, 3, 4, 6, or 12 hosts. The script tests whether a host machine can be reached by pinging it. The user may want to test this manually.
For example, to see if marple can be one of the host workstations (yes):
ennui-804: ping marple
marple is alive
To see if dumbo1 can be one of the host workstations (no):
ennui-803: ping dumbo1
ping: unknown host dumbo1
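
The recommendation above about matching the job count to the host count comes down to simple arithmetic. The Python sketch below assumes the jobs are split as evenly as possible, with any remainder going to the first hosts. That splitting rule and the function names are assumptions made for illustration, not a description of what divconquer does internally.

```python
def chunk_sizes(n_jobs, n_hosts):
    """Split n_jobs as evenly as possible over n_hosts (assumed rule)."""
    base, extra = divmod(n_jobs, n_hosts)
    # The first `extra` hosts take one more job than the rest.
    return [base + 1 if i < extra else base for i in range(n_hosts)]

def even_host_counts(n_jobs):
    """Host counts for which every host gets the same number of jobs."""
    return [h for h in range(1, n_jobs + 1) if n_jobs % h == 0]

print(even_host_counts(12))  # [1, 2, 3, 4, 6, 12]
print(chunk_sizes(12, 4))    # [3, 3, 3, 3]
print(chunk_sizes(12, 5))    # [3, 3, 2, 2, 2]
```

A single host also divides the jobs evenly but gives no parallelism. With 5 hosts, three of them finish one job early and sit idle while the other two complete their third job.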
Jobs are listed in the file specified by the required -r flag. The file is an RDB file containing at least two columns, id and cmd (do not include comments in the file). The id and cmd columns contain, respectively, the unique job id and the command to be executed on the machine. A sample command file is given below:
ennui-771: cat tst.rdb
id cmd junk tmp
-- --- ---- ---
job1 ps -aux now is the
job2 date junk a
job3 who a a
job4 ls a a
Note that multiple commands, separated by `;', may be entered in the cmd column.
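
To make the expected layout of the command file concrete, here is a minimal Python sketch that reads the id and cmd columns from RDB-style text. It assumes tab-separated columns with a header line followed by a dashed line. The function name read_jobs and the sample values are illustrative and do not come from divconquer itself.

```python
def read_jobs(rdb_text):
    """Return a list of (id, cmd) pairs from tab-separated RDB text.

    Assumes line 1 holds the column names and line 2 the dashed ruler.
    """
    lines = [ln for ln in rdb_text.splitlines() if ln.strip()]
    header = lines[0].split("\t")
    id_col, cmd_col = header.index("id"), header.index("cmd")

    jobs, seen = [], set()
    for line in lines[2:]:  # skip the header and the dashed line
        fields = line.split("\t")
        job_id, cmd = fields[id_col], fields[cmd_col]
        if job_id in seen:
            raise ValueError(f"duplicate job id: {job_id}")
        seen.add(job_id)
        jobs.append((job_id, cmd))
    return jobs

# Illustrative table; extra columns such as junk and tmp are ignored.
sample = (
    "id\tcmd\tjunk\ttmp\n"
    "--\t---\t----\t---\n"
    "job1\tps -aux\tx\ty\n"
    "job2\tdate; who\tx\ty\n"
)
for job_id, cmd in read_jobs(sample):
    print(job_id, "->", cmd)
```

The duplicate check reflects the requirement that job ids be unique, since the ids are also used to name the output files.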
The standard output and error streams for each job are written to files named id.stdout and id.stderr, where id is the job id as specified in the command file. Either file is removed if it is empty.
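
The output-file behaviour described above can be sketched in Python as follows. This is a stand-in written for illustration, not the script's implementation: the function name run_job is hypothetical, and the outdir parameter is only assumed to play a role similar to the directory given with -d.

```python
import os
import subprocess
import tempfile

def run_job(job_id, cmd, outdir="."):
    """Run cmd through the shell, writing <id>.stdout and <id>.stderr."""
    out_path = os.path.join(outdir, f"{job_id}.stdout")
    err_path = os.path.join(outdir, f"{job_id}.stderr")
    with open(out_path, "w") as out, open(err_path, "w") as err:
        status = subprocess.call(cmd, shell=True, stdout=out, stderr=err)
    # Remove whichever stream file ended up empty.
    for path in (out_path, err_path):
        if os.path.getsize(path) == 0:
            os.remove(path)
    return status

with tempfile.TemporaryDirectory() as d:
    run_job("job1", "echo hello", outdir=d)
    # echo writes nothing to stderr, so only job1.stdout remains.
    print(sorted(os.listdir(d)))
```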
1) A sample command file containing the commands to be executed:
futile-1629: cat tst.rdb
id cmd junk tmp
-- --- ---- ---
job1 ps -aux now is the
job2 date junk a
job3 who a a
job4 ls a a
... .. . .
job31 who a a
job32 ls a a
2) To submit the jobs on pelf, ymir, and marple, use the following
command:
futile-1630: ./divconquer -r tst.rdb -s pelf,ymir,marple
32 files will be processed on 3 Machines. 11 records/machine
[futile] submit 'nice -8 conquer -r tmp_tst.0.rdb to pelf'
[futile] submit 'nice -8 conquer -r tmp_tst.1.rdb to ymir'
[futile] submit 'nice -8 conquer -r tmp_tst.2.rdb to marple'
[futile]: children=6186 6188 6190
3) The messages to stdout and stderr are trapped in the files:
futile-1631: ls *.stdout *.stderr
job1.stderr job17.stderr ....
job1.stdout job17.stdout ....
............ .................
job16.stderr job23.stderr ....
job16.stdout job23.stdout ....
4) The file basename.pid.hosts.txt contains a summary of the
submitted jobs, where basename is the basename of the file
cmdfilename.rdb and pid is the process id of the divconquer script.
futile-1633: cat tmp_tst.hosts.txt
Start time at: Tue Nov 10 11:45:04 EST 1998
Submit file tmp_tst.0.rdb to pelf at Tue Nov 10 11:45:05 EST 1998
Submit file tmp_tst.1.rdb to ymir at Tue Nov 10 11:45:05 EST 1998
Submit file tmp_tst.2.rdb to marple at Tue Nov 10 11:45:05 EST 1998
Finished processing at: Tue Nov 10 11:45:12 EST 1998