Queuing Systems: OpenMosix
Short user guide:
Log on to ehtpx-pc5.
To submit a job, just run it as normal and OpenMosix will automatically balance the load.
Two machines in pool: ehtpx-pc5 and ehtpx-pc4. Each has 4 CPUs.
Some commands: mtop - similar to top, but covers all machines in the pool; mosmon - simple graphical interface displaying the load on the machines.
MPI jobs: run lamboot on ehtpx-pc5 and then submit jobs as usual. OM will migrate the jobs to the other box if needed.
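The MPI steps above can be sketched as a short script. This is only an illustration: the program name ./my_mpi_prog and the process count are placeholders, not part of the original setup, and the script does nothing if LAM/MPI or the program is missing.

```shell
#!/bin/sh
# Hypothetical values: replace with your own MPI executable and process count.
PROG=./my_mpi_prog
NPROCS=4

if command -v lamboot >/dev/null 2>&1 && [ -x "$PROG" ]; then
    lamboot                        # start the LAM run-time on the local node (ehtpx-pc5)
    mpirun -np "$NPROCS" "$PROG"   # launch as usual; OpenMosix may migrate the processes
    lamhalt                        # shut the LAM run-time down when the job is finished
else
    echo "LAM/MPI or $PROG not found; nothing was run"
fi
```

Because lamboot is run without a host file, LAM only knows about the local machine; any spreading of the processes to ehtpx-pc4 is left to OpenMosix.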
Queuing Systems: Condor
Heterogeneous queuing system.
Support for Globus.
Architecture: Central manager, execution machines and submit only machines.
Central Manager: manages the whole system, collecting info and negotiating with all machines.
Execution Machine: controlling daemons, runs submitted jobs.
Submit Only Machine: controlling daemons, runs shadow process of executable.
Can be obtained from: condor site.
Firstly, create a home directory for a user "condor" on your machine.
The installation script will create 3 directories in the condor home directory:
spool - holds queued jobs, including executables, etc.
log - contains the log files.
execute - working directory for running jobs: output files, executable, core files, etc.
After downloading, just run the condor_install script and follow the step-by-step instructions. This should be done as root.
What type of condor installation do you want? - answer 'full-install' or 'submit-only' if you just want to be able to submit jobs to the system and not have your machine as one of the executing nodes.
How many machines are you setting up this way? - because we don't have a shared file system just answer 'no' to this.
Install the Condor release directory - choose a directory to put the condor stuff in.
E-mail setting: set this to whatever you like for now.
File system and UID domains.
Java Universe support - answer 'Yes' if you can. Might be worth a look later.
Where should public programs be installed? - ignore.
Which machine will be the central manager? - use ehtpx-pc2.dl.ac.uk.
There might be a couple of other questions but they should be straightforward to answer.
To start the condor daemons, run the command: (release directory)/sbin/condor_master.
The following processes should now be running:
As soon as it is working properly you can add the condor_master call to your startup script so that it will start at boot time.
To check the status of condor run the command condor_status.
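Starting the daemons and checking the pool might look like the sketch below. RELEASE_DIR is an assumption: set it to whatever release directory you chose during condor_install. The script only prints a message if condor_master is not found there.

```shell
#!/bin/sh
# Assumption: /usr/local/condor is a placeholder for your own release directory.
RELEASE_DIR=${RELEASE_DIR:-/usr/local/condor}

if [ -x "$RELEASE_DIR/sbin/condor_master" ]; then
    "$RELEASE_DIR/sbin/condor_master"    # the master starts the other condor daemons
    sleep 5                              # give the daemons a moment to come up
    ps -ef | grep '[c]ondor_'            # list the condor processes now running
    "$RELEASE_DIR/bin/condor_status"     # show the machines known to the pool
else
    echo "condor_master not found under $RELEASE_DIR/sbin"
fi
```

A newly started machine can take a short while to appear in the condor_status output, so an empty list straight after startup is not necessarily an error.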
There is an online Manual at http://www.cs.wisc.edu/condor/manual/v6.5.2/
Jobs are submitted using the condor_submit command. This takes as an argument the name of a file called a submit description file.
The submit description file contains information such as the name of the executable, the initial working directory and command line arguments.
Condor creates a "ClassAd" based on this information.
One submit file can hold specifications for many jobs, but each job must use the same executable.
A submit file must have one executable command and at least one queue command.
A simple example of a submission script:
# Example 1
# Simple Condor job submission script
Executable = exec
Universe = vanilla
input = input.data
output = output.out
Log = log.file
Queue
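As a sketch of putting several jobs in one submit file, the script below writes a description file that queues two runs of the same executable with different input and output files, then submits it if Condor is available. The file name multi.submit and the input/output names are placeholders chosen for illustration.

```shell
#!/bin/sh
# Write a submit description file with one executable and two queue commands.
# "exec", the data file names and "multi.submit" are placeholders.
cat > multi.submit <<'EOF'
# Two jobs, one executable, one Queue command per job
Executable = exec
Universe   = vanilla
Log        = log.file

input  = input1.data
output = output1.out
Queue

input  = input2.data
output = output2.out
Queue
EOF

if command -v condor_submit >/dev/null 2>&1; then
    condor_submit multi.submit   # Condor builds one ClassAd per Queue command
    condor_q                     # list the jobs now waiting in the queue
else
    echo "condor_submit not found; wrote multi.submit only"
fi
```

Settings given before the first Queue apply to both jobs; the input and output lines are redefined before the second Queue so that the two runs do not overwrite each other's results.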
Choosing a Condor Universe. These define the execution environment. Seven Types: