Distributed Computing


Queuing Systems: OpenMosix

Short user guide:

  • Log on to ehtpx-pc5.
  • To submit a job, just run it as normal; OpenMosix will automatically balance the load.
  • There are two machines in the pool, ehtpx-pc5 and ehtpx-pc4, each with 4 CPUs.
  • Some commands: mtop - similar to top, but covering all machines in the pool; mosmon - a simple graphical interface displaying the load on each machine.
  • MPI jobs: run lamboot on ehtpx-pc5 and then submit jobs as usual. OpenMosix will migrate the jobs to the other machine if needed.
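Because OpenMosix balances at the process level, a job needs no special code to be migrated. As a minimal sketch (the worker task and its size are hypothetical), the Python script below starts eight independent CPU-bound processes at once, one per CPU in the pool. While it runs on ehtpx-pc5, you would expect mtop or mosmon to show some of them moving to ehtpx-pc4. Separate processes that share no memory are used deliberately, as that is the simplest case for migration.

```python
import subprocess
import sys
from typing import List

# Hypothetical CPU-bound task: sum of squares below n, printed to stdout.
WORKER = "import sys; n = int(sys.argv[1]); print(sum(i * i for i in range(n)))"

def launch_workers(count: int, n: int) -> List[int]:
    """Start `count` independent worker processes at once, then collect results."""
    procs = [
        subprocess.Popen([sys.executable, "-c", WORKER, str(n)],
                         stdout=subprocess.PIPE, text=True)
        for _ in range(count)
    ]
    results = []
    for proc in procs:
        out, _ = proc.communicate()  # wait for this worker and read its output
        results.append(int(out.strip()))
    return results

if __name__ == "__main__":
    # 8 workers: one per CPU across ehtpx-pc5 and ehtpx-pc4 (4 CPUs each).
    print(launch_workers(8, 20_000_000))
```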

Queuing Systems: Condor

Introduction:

  • Heterogeneous queuing system.
  • Support for Globus.
  • Free.
  • Architecture: Central manager, execution machines and submit only machines.
  • Central Manager: manages the whole system, collecting information from and negotiating with all machines.
  • Execution Machine: runs the controlling daemons and the submitted jobs.
  • Submit Only Machine: runs the controlling daemons and the shadow process for each submitted executable.
  • Can be obtained from the Condor site.
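The central manager's negotiation step pairs queued jobs with execution machines. The toy Python model below illustrates the idea only and is not Condor's ClassAd language: each machine advertises a dictionary of attributes (OpSys and Memory, loosely modelled on real machine ClassAd attributes), each job supplies a requirements predicate, and a greedy negotiator gives every job the first free machine that satisfies it. The machine names and memory sizes are hypothetical. Real Condor also lets machines state their own requirements and lets both sides rank possible matches.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Machine:
    """An execution machine and the attributes it advertises."""
    name: str
    attrs: Dict[str, object]

@dataclass
class Job:
    """A queued job and a predicate over machine attributes."""
    job_id: int
    requirements: Callable[[Dict[str, object]], bool]

def negotiate(jobs: List[Job], machines: List[Machine]) -> Dict[int, str]:
    """Greedy toy negotiator: each job gets the first free machine it accepts."""
    free = list(machines)
    matches: Dict[int, str] = {}
    for job in jobs:
        for machine in free:
            if job.requirements(machine.attrs):
                matches[job.job_id] = machine.name
                free.remove(machine)  # safe: we stop iterating right away
                break
    return matches

if __name__ == "__main__":
    pool = [Machine("node-a", {"OpSys": "LINUX", "Memory": 512}),
            Machine("node-b", {"OpSys": "LINUX", "Memory": 1024})]
    queue = [Job(1, lambda m: m["Memory"] >= 1000),
             Job(2, lambda m: m["OpSys"] == "LINUX"),
             Job(3, lambda m: m["Memory"] >= 1000)]
    # Job 3 finds no free machine and stays idle.
    print(negotiate(queue, pool))  # {1: 'node-b', 2: 'node-a'}
```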
Installation:

  • Firstly, create a home directory for a user "condor" on your machine.
  • The installation script will create 3 directories in the condor home directory:

    spool - holds queued jobs, including executables, etc.
    log - contains the log files
    execute - working directory for jobs: output files, executables, core files, etc.

  • After downloading, run the condor_install script and follow the step-by-step instructions. This should be done as root.
  • What type of Condor installation do you want? Answer 'full-install', or 'submit-only' if you only want to submit jobs to the system without your machine being one of the execution nodes.
  • How many machines are you setting up this way? Because we don't have a shared file system, just answer 'no' to this.
  • Install the Condor release directory: choose a directory to hold the Condor release files.
  • E-mail setting: set this to whatever you like for now.
  • File system and UID domains.
  • Java Universe support: answer 'yes' if you can; it might be worth a look later.
  • Where should public programs be installed? Ignore this.
  • Which machine will be the central manager? Use ehtpx-pc2.dl.ac.uk.
  • There might be a couple of other questions, but they should be straightforward to answer.
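The installer's answers end up as settings in Condor's configuration file. As a hedged sketch, the Python below renders a few of them in condor_config's NAME = value form. CONDOR_HOST (the central manager), RELEASE_DIR, CONDOR_ADMIN and DAEMON_LIST are macros we believe the Condor manual documents; the release path and e-mail address are hypothetical placeholders. Treat the output as a way to read the file the installer writes, not as a replacement for it.

```python
from typing import Dict

def condor_settings(central_manager: str, release_dir: str, admin_email: str,
                    submit_only: bool = False,
                    is_central_manager: bool = False) -> Dict[str, str]:
    """Map a few installer answers onto condor_config macro names."""
    daemons = ["MASTER", "SCHEDD"]          # every machine can submit jobs
    if not submit_only:
        daemons.append("STARTD")            # execution machines also run jobs
    if is_central_manager:
        daemons += ["COLLECTOR", "NEGOTIATOR"]
    return {
        "CONDOR_HOST": central_manager,
        "RELEASE_DIR": release_dir,
        "CONDOR_ADMIN": admin_email,
        "DAEMON_LIST": ", ".join(daemons),
    }

def render(settings: Dict[str, str]) -> str:
    """Format settings as condor_config-style 'NAME = value' lines."""
    return "".join(f"{name} = {value}\n" for name, value in settings.items())

if __name__ == "__main__":
    print(render(condor_settings("ehtpx-pc2.dl.ac.uk",
                                 "/opt/condor",                # hypothetical
                                 "condor-admin@example.org")))  # hypothetical
```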
Running Condor:

  • To start the Condor daemons, run: (release directory)/sbin/condor_master.
  • The following processes should now be running:

    condor_master
    condor_startd
    condor_schedd

  • Once it is working properly, you can add the condor_master call to your startup script so that Condor starts at boot time.
  • To check the status of Condor, run condor_status.
  • There is an online manual at http://www.cs.wisc.edu/condor/manual/v6.5.2/
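Besides condor_status, a quick local check is whether the three daemons listed above are actually running. The Python sketch below assumes a Linux system with a reasonably recent kernel that provides /proc/<pid>/comm; on a submit-only machine you would presumably pass expected=("condor_master", "condor_schedd") instead, since no condor_startd runs there.

```python
import os
from typing import Iterable, List, Set

# The three daemons this guide expects on a full (submit + execute) install.
EXPECTED = ("condor_master", "condor_startd", "condor_schedd")

def running_names(proc_root: str = "/proc") -> Set[str]:
    """Process names read from /proc/<pid>/comm (Linux-specific)."""
    names: Set[str] = set()
    if not os.path.isdir(proc_root):
        return names  # not Linux, or /proc not mounted
    for pid in os.listdir(proc_root):
        if not pid.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, pid, "comm")) as f:
                names.add(f.read().strip())
        except OSError:
            continue  # process exited meanwhile, or no permission
    return names

def missing_daemons(running: Iterable[str], expected=EXPECTED) -> List[str]:
    """Expected daemon names that do not appear among the running processes."""
    found = set(running)
    return [name for name in expected if name not in found]

if __name__ == "__main__":
    missing = missing_daemons(running_names())
    print("all daemons running" if not missing else f"missing: {missing}")
```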
Submitting Jobs:

  • Jobs are submitted using the condor_submit command, which takes as its argument the name of a submit description file.
  • The submit description file contains information such as the name of the executable, the initial working directory and the command-line arguments.
  • Condor creates a "ClassAd" based on this information.
  • A single submit file can contain specifications for many jobs, but they must all use the same executable.
  • A submit file must have one executable command and at least one queue command.
  • A simple example of a submission script:

    #####################
    #
    # Example 1
    # Simple Condor job submission script
    #
    ####################

    Executable = exec
    Universe = vanilla
    input = input.data
    output = output.out
    Log = log.file
    Queue
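The two submit-file rules stated above are easy to check mechanically. The Python sketch below checks them, builds a multi-job file for one executable with one Queue per argument set, and calls condor_submit only if it is on the PATH. The helper names, the job.submit file name and the argument values are hypothetical; $(Process) is Condor's per-job number within a cluster, used here to keep each job's output file separate.

```python
import shutil
import subprocess
from pathlib import Path

# Example 1 from above, without its comment header.
EXAMPLE_1 = """\
Executable = exec
Universe = vanilla
input = input.data
output = output.out
Log = log.file
Queue
"""

def check_submit_text(text: str) -> bool:
    """True if there is exactly one executable command and at least one queue."""
    executables = queues = 0
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments
        # Submit commands are case-insensitive, e.g. "Log" vs "log".
        words = line.split("=", 1)[0].lower().split()
        if words == ["executable"]:
            executables += 1
        elif words and words[0] == "queue":  # "Queue" or "Queue 10"
            queues += 1
    return executables == 1 and queues >= 1

def many_jobs(executable: str, arg_sets, universe: str = "vanilla") -> str:
    """One submit file, one executable, one Queue per argument set."""
    lines = [f"Executable = {executable}", f"Universe = {universe}",
             "Log = log.file", "output = output.$(Process).out"]
    for args in arg_sets:
        lines += [f"Arguments = {args}", "Queue"]
    return "\n".join(lines) + "\n"

def submit(text: str, path: str = "job.submit"):
    """Write the file and run condor_submit on it if Condor is installed."""
    if not check_submit_text(text):
        raise ValueError("need one executable command and at least one queue")
    Path(path).write_text(text)
    if shutil.which("condor_submit") is None:
        return None  # Condor not on PATH; the file is only written.
    done = subprocess.run(["condor_submit", path],
                          capture_output=True, text=True, check=True)
    return done.stdout

if __name__ == "__main__":
    print(submit(many_jobs("exec", ["1", "2", "3"])))
```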

  • Choosing a Condor universe: universes define the execution environment. There are seven types:

    Standard
    Vanilla
    PVM
    MPI
    Globus
    Java
    Scheduler