Running jobs on the NGS with Globus
This tutorial walks through submitting programs (called jobs) to the NGS and retrieving their output. It does not require any knowledge of programming, and you can download the programs used in it. All the jobs in this tutorial will be sent to the Oxford node (ngs.oerc.ox.ac.uk) to run. All commands (highlighted like this) should be entered in a terminal window on a Linux machine with Globus installed and configured (or using the GSI-SSH terminal to access such a machine). Throughout this document examples of output have been inserted in purple. Please be aware that these example outputs may show jobs running on a different site.
- The first task is to just connect to the head node at Oxford using gsissh. This will ensure that your account is working as expected. Access the node by typing in the following command on a Linux machine with Globus installed:
gsissh -p 2222 ngs.oerc.ox.ac.uk
If you have logged on successfully then log off for now.
- The simplest command for job submission is globus-job-run. The minimum parameters used by this command are where to send the job and what the program to run is. Submit your first job with the command
globus-job-run ngs.oerc.ox.ac.uk /bin/hostname -f
grid-compute.oesc.ox.ac.uk is the head node (a node users can directly access) at Oxford.
[gcw@lab-07 gram]$ globus-job-run ngs.oerc.ox.ac.uk /bin/hostname -f
ngs.oerc.ox.ac.uk
Since you directly told globus-job-run to run on this node, the result should hardly surprise you.
- The next step is therefore to use a system that gives access to all the nodes at Oxford. To do this, submit the job to a job queue running on the head node. When a suitable node becomes available, your job will be sent to it from this queue. A single site can have multiple queues, so when you tell globus-job-run where to run you also provide the name of the queue. This time run the command
globus-job-run ngs.oerc.ox.ac.uk/jobmanager-pbs /bin/hostname -f
where jobmanager-pbs is the name of the queue.
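The two destinations used so far differ only in whether a queue name follows the host name. As a minimal sketch, the helper below (a hypothetical name, not part of Globus) builds either form so that the same script can target the head node or a queue:

```shell
#!/bin/sh
# Build the "where to run" argument used by globus-job-run.
# gram_contact is a hypothetical helper name, not a Globus command.
#   gram_contact HOST          -> HOST        (runs on the head node itself)
#   gram_contact HOST QUEUE    -> HOST/QUEUE  (runs via the named queue)
gram_contact() {
    host=$1
    queue=${2:-}
    if [ -n "$queue" ]; then
        printf '%s/%s\n' "$host" "$queue"
    else
        printf '%s\n' "$host"
    fi
}

# Example use (needs the Globus client tools and a working account):
#   globus-job-run "$(gram_contact ngs.oerc.ox.ac.uk jobmanager-pbs)" /bin/hostname -f
```

Keeping the host and the queue as separate values makes it easy to try the same job against a different queue later.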
[gcw@lab-07 gram]$ globus-job-run ngs.oerc.ox.ac.uk/jobmanager-pbs /bin/hostname -f
node25.beowulf.cluster
- You may have noticed that globus-job-run waits for your job to complete before exiting. The standard output of the job is sent directly to your terminal. Whilst this is not a problem for very short and simple jobs, this is not a good system for long jobs. Long jobs need to be submitted and occasionally checked, and then the output may be retrieved, possibly many hours or days later. The solution to this is to use globus-job-submit, a command very similar to globus-job-run except that it outputs a unique identity (uid) string for your job and then exits. The job is still running and other commands exist for checking on the job, retrieving the job's output and cleaning up after the job. Enter the command
globus-job-submit ngs.oerc.ox.ac.uk/jobmanager-pbs /bin/hostname -f
The command prints the uid of your job. All subsequent commands depend on this uid to identify the job. If you are not familiar with Linux, use <Ctrl>-<Insert> to copy highlighted text and <Shift>-<Insert> to paste. In the following commands replace <uid> with the uid of your job. To check on your job and find its status use the command
globus-job-status <uid>
Repeat this command every few seconds until your job has achieved the status of "Done". In this context "Done" means your job has finished as far as the grid middleware is concerned; it does not necessarily mean your job did what you expected it to do. Next retrieve the standard output with
globus-job-get-output <uid>
Finally, clean up any temporary files created by your job with
globus-job-clean <uid>
Answer "Y" when asked if you are sure.
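Repeating the status command by hand soon becomes tedious for longer jobs. The sketch below wraps the check in a loop. The function name, the 10-second default interval and the handling of a FAILED status are assumptions made for illustration; only the DONE status appears in this tutorial's example output.

```shell
#!/bin/sh
# Poll a job until the middleware reports it as finished.
# wait_for_job is a hypothetical helper, not a Globus command.
#   $1 = the uid printed by globus-job-submit
#   $2 = seconds between checks (assumed default: 10)
# Returns 0 on DONE, 1 on FAILED (assumed status name), 2 if the
# status command itself could not be run.
wait_for_job() {
    uid=$1
    interval=${2:-10}
    while :; do
        status=$(globus-job-status "$uid") || return 2
        case $status in
            DONE)   return 0 ;;
            FAILED) echo "job failed: $uid" >&2; return 1 ;;
            *)      sleep "$interval" ;;   # any other status: keep waiting
        esac
    done
}

# Example use (needs the Globus client tools and a working account):
#   uid=$(globus-job-submit ngs.oerc.ox.ac.uk/jobmanager-pbs /bin/hostname -f)
#   wait_for_job "$uid" && globus-job-get-output "$uid"
#   globus-job-clean "$uid"
```

As noted above, a return value of 0 only means the middleware considers the job finished, so you should still inspect the output.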
[gcw@lab-07 gram]$ globus-job-submit ngs.oerc.ox.ac.uk/jobmanager-pbs /bin/hostname -f
https://ngs.oerc.ox.ac.uk:64001/1415/1110129853/
[gcw@lab-07 gram]$ globus-job-status https://ngs.oerc.ox.ac.uk:64001/1415/1110129853/
DONE
[gcw@lab-07 gram]$ globus-job-get-output https://ngs.oerc.ox.ac.uk:64001/1415/1110129853/
node25.beowulf.cluster
[gcw@lab-07 gram]$ globus-job-clean https://ngs.oerc.ox.ac.uk:64001/1415/1110129853/
WARNING: Cleaning a job means:
- Kill the job if it still running, and
- Remove the cached output on the remote resource
Are you sure you want to cleanup the job now (Y/N) ? Y
Cleanup successful.
- The examples so far have involved running a standard system program (hostname). The next stage is therefore to submit a simple custom program. The first program to use is "myjob.sh" which is included in the programs you downloaded (at the beginning of the tutorial). Install this program in your home directory on the Linux machine. This program just prints the present working directory, the hostname again and also all the environment variables currently set. Run the following commands to view and run this script on your local machine
cd ~/gram
cat myjob.sh
./myjob.sh
[gcw@lab-07 gram]$ cat myjob.sh
#!/bin/sh
echo $PWD
hostname -f
env
[gcw@lab-07 gram]$ ./myjob.sh
/home/gcw/gram
lab-07.nesc.ed.ac.uk
MANPATH=/opt/globus/man::/opt/edg/share/man:/opt/lcg/share/man:/opt/edg/man
HOSTNAME=lab-07.nesc.ed.ac.uk
GRID_PROXY_FILE=/tmp/x509up_u501
LCG_LOCATION_VAR=/opt/lcg/var
TERM=xterm
By default globus-job-submit (and globus-job-run) assume that the program is local to the node the job will run on. The script "myjob.sh" only exists on the local machine, and hence it is necessary to send "myjob.sh" to the execute node as part of the job. This process is known as "staging". Run the job with the command (note the extra -s):
globus-job-submit ngs.oerc.ox.ac.uk/jobmanager-pbs -s ./myjob.sh
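Because a staged program is read from the local machine, a mistyped path only shows up once the submission is attempted. A small wrapper can check the file first. This is a sketch only; submit_staged is a hypothetical name, and the local check is an addition that the tutorial itself does not require.

```shell
#!/bin/sh
# Submit a local program with staging, after checking that it exists.
# submit_staged is a hypothetical helper, not a Globus command.
#   $1 = where to run, e.g. ngs.oerc.ox.ac.uk/jobmanager-pbs
#   $2 = path to the local program, e.g. ./myjob.sh
# On success it prints whatever globus-job-submit prints (the job uid).
submit_staged() {
    contact=$1
    program=$2
    if [ ! -f "$program" ]; then
        echo "no such local file: $program" >&2
        return 1
    fi
    # -s tells globus-job-submit to stage the local file to the execute node
    globus-job-submit "$contact" -s "$program"
}

# Example use (needs the Globus client tools and a working account):
#   uid=$(submit_staged ngs.oerc.ox.ac.uk/jobmanager-pbs ./myjob.sh)
```

The uid captured this way can then be passed to the status, output and clean-up commands shown earlier.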
- The previous programs have only used standard output for displaying their results. Many programs will output to files as well as to standard output, and hence the question is what happens to these files. Inspect the file myjob2.sh from the downloaded file gram_srb_practical_files.tgz. You will see that the environment variables are now saved in a file called myenv.txt. Copy this file over to your home directory, run this job and get its output (and clean up).
You will notice you still got the present working directory and hostname on the standard output. The file myenv.txt is actually written to your home directory on the head node. Rather than view this file by logging on to the head node, copy the file to your current directory instead:
gsiscp -P 2222 ngs.oerc.ox.ac.uk:myenv.txt .
You can now view the file myenv.txt.
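If your jobs write several files, the copy step can be wrapped so that the host and port are typed only once. The sketch below assumes the Oxford head node and port 2222 used throughout this tutorial; fetch_from_head_node is a hypothetical name.

```shell
#!/bin/sh
# Copy a file from your home directory on the head node to this machine.
# fetch_from_head_node is a hypothetical helper, not a Globus command.
#   $1 = file name relative to your remote home directory, e.g. myenv.txt
#   $2 = local destination (defaults to the current directory)
fetch_from_head_node() {
    remote_file=$1
    dest=${2:-.}
    gsiscp -P 2222 "ngs.oerc.ox.ac.uk:$remote_file" "$dest"
}

# Example use (needs the Globus client tools and a working account):
#   fetch_from_head_node myenv.txt
#   cat myenv.txt
```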
- Returning to the topic of staging, if you are running the same job multiple times it is better if the program is stored somewhere accessible by a node running this job. Your account on the head node is the ideal place for this. Copy the file "myhostname.c" (which is a simple piece of C code that prints out the hostname) from the downloaded files gram_srb_practical_files.tgz to your home directory. Compile this code and initially submit this program as a job using staging. To compile this code use the command:
gcc myhostname.c -o myhostname
Now upload myhostname.c to the head node using gsiscp and compile (and test) it there. You will now be able to run the job without using staging. Since the program is sitting in the top level of your account on the head node you will not need to provide a path to the executable.
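The upload, compile and run steps can also be chained from the local machine. The following is a sketch under two assumptions: that gsissh accepts a command to run remotely in the same way as ssh, and that the source file sits in your current directory. deploy_and_run is a hypothetical name.

```shell
#!/bin/sh
# Upload a C source file to the head node, compile it there, then run
# the resulting program through the PBS queue without staging.
# deploy_and_run is a hypothetical helper, not a Globus command.
HEAD_NODE=ngs.oerc.ox.ac.uk
PORT=2222

deploy_and_run() {
    source_file=$1               # e.g. myhostname.c (in the current directory)
    program=${source_file%.c}    # e.g. myhostname
    # Copy the source to the top level of the remote account
    gsiscp -P "$PORT" "$source_file" "$HEAD_NODE:" || return 1
    # Assumption: gsissh runs a trailing command remotely, like ssh does
    gsissh -p "$PORT" "$HEAD_NODE" "gcc $source_file -o $program" || return 1
    # No -s and no path: the program already sits in the remote home directory
    globus-job-run "$HEAD_NODE/jobmanager-pbs" "$program"
}

# Example use (needs the Globus client tools and a working account):
#   deploy_and_run myhostname.c
```

Logging in interactively, as described above, remains the better choice when you want to test the program on the head node before submitting it.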
[gcw@lab-07 gram]$ gsiscp -P 2222 myhostname.c ngs.oerc.ox.ac.uk:
[gcw@lab-07 gram]$ gsissh -p 2222 ngs.oerc.ox.ac.uk
Last login: Sun Mar 6 18:18:11 2005 from lab-07.nesc.ed.ac.uk
ClusterVision Red Hat Enterprise 3.0 on Intel distribution v0.9
...
[ngs0249@grid-compute ngs0249]$ gcc myhostname.c -o myhostname
[ngs0249@grid-compute ngs0249]$ ./myhostname
host is ngs.oerc.ox.ac.uk
[ngs0249@grid-compute ngs0249]$ exit
logout
Connection to ngs.oerc.ox.ac.uk closed.
- Try creating some jobs of your own. Alternatively you could explore the other options supported by the globus commands. Each globus command prints a detailed description of its usage when run with -help as its only command line parameter.
